* [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
@ 2020-09-11 10:30 ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Clean up the test code, add some new basic subtests, then unblock
unbind test variants.

Trybot has reported no incompletes or aborts, and no issues with
subsequently run tests.  The hotrebind-lateclose subtest fails on a so
far unidentified driver sysfs issue, but the device is fully recovered
and left in a usable state.  A perceived Haswell/Broadwell issue with
audio power management has been worked around, and its potential
occurrence is reported as an IGT warning.

Series changelog:
v2: New patch "Un-blocklist *bind* subtests" added.
v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
    from subtest failures".
  - a new patch "Clean up device open error handling" added, an old
    patch "Fix missing newline" obsoleted by the new one dropped,
  - other new patches added:
    - "Let the driver time out essential sysfs operations",
    - "More thorough i915 healthcheck and recovery",
  - a patch "Add 'lateclose before restore' variants" from another
    series included.
v4: Optional patch "Duplicate debug messages in dmesg" from another
    series included.
v5: New patch added with Haswell audio related kernel warning worked
    around and replaced with an IGT warning to preserve visibility of
    the issue.
v6: New patch added for also checking health of render device nodes,
  - new patch added with proper handling of health check before late
    close,
  - inclusion of unbind-rebind scenario to BAT scope proposed.

@Michał: Since some patch updates are trivial, I've preserved your
v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
marked your R-b as v1/v2 applicable.  Please have a look and confirm if
you are still OK with them.

@Tvrtko: As I already asked before, please support my attempt to remove
the unbind test variants from the blocklist.

@Petri, @Martin: Assuming CI results will be as good as those obtained
on Trybot, please give me your green light for merging this series if
you have no objections.

Thanks,
Janusz

Janusz Krzysztofik (24):
  tests/core_hotunplug: Use igt_assert_fd()
  tests/core_hotunplug: Constify dev_bus_addr string
  tests/core_hotunplug: Clean up device open error handling
  tests/core_hotunplug: Consolidate duplicated debug messages
  tests/core_hotunplug: Assert successful device filter application
  tests/core_hotunplug: Maintain a single data structure instance
  tests/core_hotunplug: Pass errors via a data structure field
  tests/core_hotunplug: Handle device close errors
  tests/core_hotunplug: Prepare invariant data once per test run
  tests/core_hotunplug: Skip selectively on sysfs close errors
  tests/core_hotunplug: Recover from subtest failures
  tests/core_hotunplug: Fail subtests on device close errors
  tests/core_hotunplug: Let the driver time out essential sysfs
    operations
  tests/core_hotunplug: Process return values of sysfs operations
  tests/core_hotunplug: Assert expected device presence/absence
  tests/core_hotunplug: Explicitly ignore unused return values
  tests/core_hotunplug: Also check health of render device node
  tests/core_hotunplug: More thorough i915 healthcheck and recovery
  tests/core_hotunplug: Add 'lateclose before restore' variants
  tests/core_hotunplug: Check health both before and after late close
  tests/core_hotunplug: HSW/BDW audio issue workaround
  tests/core_hotunplug: Duplicate debug messages in dmesg
  tests/core_hotunplug: Un-blocklist *bind* subtests
  tests/core_hotunplug: Add unbind-rebind subtest to BAT scope

 tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
 tests/intel-ci/blacklist.txt          |   2 +-
 tests/intel-ci/fast-feedback.testlist |   1 +
 3 files changed, 431 insertions(+), 132 deletions(-)

-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t v6 01/24] tests/core_hotunplug: Use igt_assert_fd()
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
  (?)
@ 2020-09-11 10:30 ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

There is a new library helper that asserts validity of open file
descriptors.  Use it instead of open coding.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index e03f3b945..7431346b1 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -57,7 +57,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 
 	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
 				    O_DIRECTORY);
-	igt_assert(priv->fd.sysfs_drv >= 0);
+	igt_assert_fd(priv->fd.sysfs_drv);
 
 	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
 	buf[len] = '\0';
@@ -72,10 +72,10 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
 	igt_debug("opening device\n");
 	priv->fd.drm = __drm_open_driver(DRIVER_ANY);
-	igt_assert(priv->fd.drm >= 0);
+	igt_assert_fd(priv->fd.drm);
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
-	igt_assert(priv->fd.sysfs_dev >= 0);
+	igt_assert_fd(priv->fd.sysfs_dev);
 
 	if (buf) {
 		prepare_for_unbind(priv, buf, buflen);
@@ -83,7 +83,7 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 		/* prepare for bus rescan */
 		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
 					    "device/subsystem", O_DIRECTORY);
-		igt_assert(priv->fd.sysfs_bus >= 0);
+		igt_assert_fd(priv->fd.sysfs_bus);
 	}
 }
 
@@ -261,7 +261,7 @@ igt_main
 		 * a device file descriptor open for exit handler use.
 		 */
 		fd_drm = __drm_open_driver(DRIVER_ANY);
-		igt_assert(fd_drm >= 0);
+		igt_assert_fd(fd_drm);
 
 		if (is_i915_device(fd_drm))
 			igt_require_gem(fd_drm);
-- 
2.21.1
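
For illustration only, the idea behind replacing open-coded
"fd >= 0" checks with a dedicated helper can be sketched in stand-alone
C roughly as below.  The sketch does not use the IGT library;
sketch_assert_fd() and sketch_open_path() are hypothetical names and
are not claimed to match the real igt_assert_fd() implementation.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a descriptor-asserting helper (not the IGT
 * macro): report which expression was invalid, then fail the assertion.
 */
#define sketch_assert_fd(fd) \
	do { \
		if ((fd) < 0) \
			fprintf(stderr, "file descriptor " #fd " is invalid\n"); \
		assert((fd) >= 0); \
	} while (0)

/* Open a path read-only and assert that the descriptor is usable. */
static int sketch_open_path(const char *path)
{
	int fd = open(path, O_RDONLY);

	sketch_assert_fd(fd);
	return fd;
}
```

A single helper keeps the failure message uniform across call sites and
names the offending descriptor expression in the output.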

* [Intel-gfx] [PATCH i-g-t v6 02/24] tests/core_hotunplug: Constify dev_bus_addr string
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The device bus address structure field is always initialized with a
pointer to a substring of the device sysfs path and is never used to
modify that string.  Declare it as a constant string.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7431346b1..a4071f51e 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -44,7 +44,7 @@ struct hotunplug {
 		int sysfs_bus;
 		int sysfs_drv;
 	} fd;
-	char *dev_bus_addr;
+	const char *dev_bus_addr;
 };
 
 /* Helpers */
-- 
2.21.1
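
As a rough illustration of why a const pointer fits here, the sketch
below returns a pointer into an existing path string without copying or
modifying it.  It assumes the bus address is the last path component,
which this patch does not show; sketch_bus_addr() is a hypothetical
helper, not code from the test.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a read-only view of the last component
 * of a sysfs path.  The result points into the caller's buffer, so it
 * is declared const to document that it must not be written through.
 */
static const char *sketch_bus_addr(const char *sysfs_path)
{
	const char *slash = strrchr(sysfs_path, '/');

	return slash ? slash + 1 : sysfs_path;
}
```

Because the returned pointer aliases the original buffer, it stays
valid only as long as that buffer does.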

* [Intel-gfx] [PATCH i-g-t v6 03/24] tests/core_hotunplug: Clean up device open error handling
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

We don't use drm_open_driver() since, in case of an i915 device, it keeps
an extra file descriptor of the exercised device open for exit handler
use, while we would like to be able to close the device completely
before running certain test operations.  Instead, we call
__drm_open_driver() and handle its result ourselves.  Unlike
drm_open_driver(), which skips on device open errors, we always fail or
abort the test in such a case.  Moreover, we don't ensure that the i915
driver is idle before starting subtests, as drm_open_driver() does.

Skip instead of failing on an initial device open error.  Also, call
gem_quiescent_gpu() if an i915 device is detected.  For subsequent
device opens, define a local helper that fails on error and use it.  If
we think we need to abort the test execution on a device open error, set
our failure marker first to trigger the abort from a follow-up
igt_fixture section.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index a4071f51e..e576a6c6c 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -49,6 +49,21 @@ struct hotunplug {
 
 /* Helpers */
 
+/**
+ * Subtests must be able to close examined devices completely.  Don't
+ * use drm_open_driver() since in case of an i915 device it opens it
+ * twice and keeps a second file descriptor open for exit handler use.
+ */
+static int local_drm_open_driver(void)
+{
+	int fd_drm;
+
+	fd_drm = __drm_open_driver(DRIVER_ANY);
+	igt_assert_fd(fd_drm);
+
+	return fd_drm;
+}
+
 static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 {
 	int len;
@@ -71,8 +86,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
 	igt_debug("opening device\n");
-	priv->fd.drm = __drm_open_driver(DRIVER_ANY);
-	igt_assert_fd(priv->fd.drm);
+	priv->fd.drm = local_drm_open_driver();
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -145,8 +159,9 @@ static void healthcheck(void)
 	igt_devices_scan(true);
 
 	igt_debug("reopening the device\n");
-	fd_drm = __drm_open_driver(DRIVER_ANY);
-	igt_abort_on_f(fd_drm < 0, "Device reopen failure");
+	failure = "Device reopen failure!";
+	fd_drm = local_drm_open_driver();
+	failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
 		failure = "GEM failure";
@@ -255,16 +270,13 @@ igt_main
 	igt_fixture {
 		int fd_drm;
 
-		/**
-		 * As subtests must be able to close examined devices
-		 * completely, don't use drm_open_driver() as it keeps
-		 * a device file descriptor open for exit handler use.
-		 */
 		fd_drm = __drm_open_driver(DRIVER_ANY);
-		igt_assert_fd(fd_drm);
+		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
 
-		if (is_i915_device(fd_drm))
+		if (is_i915_device(fd_drm)) {
+			gem_quiescent_gpu(fd_drm);
 			igt_require_gem(fd_drm);
+		}
 
 		/* Make sure subtests always reopen the same device */
 		set_filter_from_device(fd_drm);
-- 
2.21.1
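
The failure marker pattern described in the commit message can be
sketched in stand-alone C as below.  This is a simplification under
stated assumptions: in the test, a failed assertion leaves the subtest
early, whereas here the step just returns an error code.  All sketch_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-NULL while a critical step is pending or after it has failed. */
static const char *sketch_failure;

/*
 * Set the marker, run the step, and clear the marker only on success.
 * A failing step therefore leaves the marker set for a later check.
 */
static int sketch_guarded_step(const char *what, int (*step)(void))
{
	sketch_failure = what;
	if (step() < 0)
		return -1;
	sketch_failure = NULL;
	return 0;
}

/* A follow-up section would abort the run if the marker is still set. */
static int sketch_should_abort(void)
{
	return sketch_failure != NULL;
}

/* Stand-ins for a device open that succeeds or fails. */
static int sketch_open_ok(void) { return 0; }
static int sketch_open_fail(void) { return -1; }
```

Setting the marker before the risky call, rather than after a failure
is detected, means the marker survives even when the failing code never
returns to its caller.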

* [Intel-gfx] [PATCH i-g-t v6 04/24] tests/core_hotunplug: Consolidate duplicated debug messages
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Some debug messages that designate specific test operations, or at
least their greater parts, always read the same no matter which subtest
the operations are called from.  Emit them, possibly updated with
subtest-specified modifiers, from inside the respective helpers instead
of duplicating them in subtest bodies.

v2: Rebase only.
v3: Refresh and extend over new case (local_drm_open_driver),
  - allow callers to specify a message suffix as well where applicable.
v4: Rename prefix/suffix string arguments to more meaningful when/why.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index e576a6c6c..fc239324a 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -54,10 +54,12 @@ struct hotunplug {
  * use drm_open_driver() since in case of an i915 device it opens it
  * twice and keeps a second file descriptor open for exit handler use.
  */
-static int local_drm_open_driver(void)
+static int local_drm_open_driver(const char *when, const char *why)
 {
 	int fd_drm;
 
+	igt_debug("%sopening device%s\n", when, why);
+
 	fd_drm = __drm_open_driver(DRIVER_ANY);
 	igt_assert_fd(fd_drm);
 
@@ -85,8 +87,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
-	igt_debug("opening device\n");
-	priv->fd.drm = local_drm_open_driver();
+	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -104,8 +105,11 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 static const char *failure;
 
 /* Unbind the driver from the device */
-static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
+static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr,
+			  const char *prefix)
 {
+	igt_debug("%sunbinding the driver from the device\n", prefix);
+
 	failure = "Driver unbind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
@@ -118,6 +122,8 @@ static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
 /* Re-bind the driver to the device */
 static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 {
+	igt_debug("rebinding the driver to the device\n");
+
 	failure = "Driver re-bind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
@@ -128,8 +134,10 @@ static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(int fd_sysfs_dev)
+static void device_unplug(int fd_sysfs_dev, const char *prefix)
 {
+	igt_debug("%sunplugging the device\n", prefix);
+
 	failure = "Device unplug timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
@@ -142,6 +150,8 @@ static void device_unplug(int fd_sysfs_dev)
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(int fd_sysfs_bus)
 {
+	igt_debug("rediscovering the device\n");
+
 	failure = "Bus rescan timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
@@ -158,9 +168,8 @@ static void healthcheck(void)
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	igt_debug("reopening the device\n");
 	failure = "Device reopen failure!";
-	fd_drm = local_drm_open_driver();
+	fd_drm = local_drm_open_driver("re", " for health check");
 	failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
@@ -199,10 +208,8 @@ static void unbind_rebind(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	healthcheck();
@@ -217,10 +224,8 @@ static void unplug_rescan(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	healthcheck();
@@ -233,10 +238,8 @@ static void hotunbind_lateclose(void)
 
 	prepare(&priv, buf, sizeof(buf));
 
-	igt_debug("hot unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "hot ");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	igt_debug("late closing the unbound device instance\n");
@@ -251,10 +254,8 @@ static void hotunplug_lateclose(void)
 
 	prepare(&priv, NULL, 0);
 
-	igt_debug("hot unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "hot ");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	igt_debug("late closing the removed device instance\n");
-- 
2.21.1
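
To show how the when/why arguments compose into one message, here is a
minimal stand-alone sketch that reuses the "%sopening device%s" format
string from the patch.  It writes into a buffer instead of calling
igt_debug(); sketch_open_msg() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: build the debug message from an optional
 * prefix ("when") and an optional suffix ("why") supplied by the caller.
 * Returns the snprintf() result, i.e. the untruncated message length.
 */
static int sketch_open_msg(char *out, size_t len,
			   const char *when, const char *why)
{
	return snprintf(out, len, "%sopening device%s", when, why);
}
```

With the two call sites from the patch, ("", " for subtest") yields
"opening device for subtest" and ("re", " for health check") yields
"reopening device for health check".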

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 04/24] tests/core_hotunplug: Consolidate duplicated debug messages
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Some debug messages which designate specific test operations, or their
greater parts at least, sound always the same, no matter which subtest
they are called from.  Emit them, possibly updated with subtest
specified modifiers, from inside respective helpers instead of
duplicating them in subtest bodies.

v2: Rebase only.
v3: Refresh and extend over new case (local_drm_open_driver),
  - allow callers to specify a message suffix as well where applicable.
v4: Rename prefix/suffix string arguments to more meaningful when/why.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index e576a6c6c..fc239324a 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -54,10 +54,12 @@ struct hotunplug {
  * use drm_open_driver() since in case of an i915 device it opens it
  * twice and keeps a second file descriptor open for exit handler use.
  */
-static int local_drm_open_driver(void)
+static int local_drm_open_driver(const char *when, const char *why)
 {
 	int fd_drm;
 
+	igt_debug("%sopening device%s\n", when, why);
+
 	fd_drm = __drm_open_driver(DRIVER_ANY);
 	igt_assert_fd(fd_drm);
 
@@ -85,8 +87,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
-	igt_debug("opening device\n");
-	priv->fd.drm = local_drm_open_driver();
+	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -104,8 +105,11 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 static const char *failure;
 
 /* Unbind the driver from the device */
-static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
+static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr,
+			  const char *prefix)
 {
+	igt_debug("%sunbinding the driver from the device\n", prefix);
+
 	failure = "Driver unbind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
@@ -118,6 +122,8 @@ static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
 /* Re-bind the driver to the device */
 static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 {
+	igt_debug("rebinding the driver to the device\n");
+
 	failure = "Driver re-bind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
@@ -128,8 +134,10 @@ static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(int fd_sysfs_dev)
+static void device_unplug(int fd_sysfs_dev, const char *prefix)
 {
+	igt_debug("%sunplugging the device\n", prefix);
+
 	failure = "Device unplug timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
@@ -142,6 +150,8 @@ static void device_unplug(int fd_sysfs_dev)
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(int fd_sysfs_bus)
 {
+	igt_debug("rediscovering the device\n");
+
 	failure = "Bus rescan timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
@@ -158,9 +168,8 @@ static void healthcheck(void)
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	igt_debug("reopening the device\n");
 	failure = "Device reopen failure!";
-	fd_drm = local_drm_open_driver();
+	fd_drm = local_drm_open_driver("re", " for health check");
 	failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
@@ -199,10 +208,8 @@ static void unbind_rebind(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	healthcheck();
@@ -217,10 +224,8 @@ static void unplug_rescan(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	healthcheck();
@@ -233,10 +238,8 @@ static void hotunbind_lateclose(void)
 
 	prepare(&priv, buf, sizeof(buf));
 
-	igt_debug("hot unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "hot ");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	igt_debug("late closing the unbound device instance\n");
@@ -251,10 +254,8 @@ static void hotunplug_lateclose(void)
 
 	prepare(&priv, NULL, 0);
 
-	igt_debug("hot unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "hot ");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	igt_debug("late closing the removed device instance\n");
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

* [Intel-gfx] [PATCH i-g-t v6 05/24] tests/core_hotunplug: Assert successful device filter application
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The return value of igt_device_filter_add(), representing the number of
successfully installed device filters, is currently ignored.  Fail if it
is not 1.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index fc239324a..2f7031094 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -193,7 +193,7 @@ static void set_filter_from_device(int fd)
 	igt_assert(realpath(path, dst));
 
 	igt_device_filter_free_all();
-	igt_device_filter_add(filter);
+	igt_assert_eq(igt_device_filter_add(filter), 1);
 }
 
 /* Subtests */
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t v6 06/24] tests/core_hotunplug: Maintain a single data structure instance
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The following changes to the test are planned:
- avoid global variables if possible,
- prepare invariant data only once per test run,
- skip subsequent subtests after device close errors,
- allow subtests to fail on errors and try to recover from those
  failures in follow-up igt_fixture sections instead of aborting.
For that to be possible, maintain a single instance of the hotunplug
structure at igt_main level and pass it down to subtests.
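
As a rough illustration of the idea, the sketch below keeps one context
structure in the caller and hands a pointer to each step, so that state
left behind by one step is visible to the next.  It is plain C with
hypothetical names (struct ctx, step_open, step_close, may_continue); it
does not use the IGT API and is not the test's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hotunplug; not the real layout. */
struct ctx {
	int fd;			/* -1 means "nothing open" */
	const char *failure;	/* non-NULL while a risky step is pending */
};

/* A step that "opens" something and records it in the shared context. */
static void step_open(struct ctx *priv, int fake_fd)
{
	priv->fd = fake_fd;
}

/* A step that releases what an earlier step recorded. */
static void step_close(struct ctx *priv)
{
	priv->fd = -1;
}

/* A follow-up section decides from the shared state whether to go on. */
static bool may_continue(const struct ctx *priv)
{
	return priv->failure == NULL && priv->fd == -1;
}
```

A caller would declare "struct ctx priv = { .fd = -1 };" once and pass
&priv to every step, instead of each step declaring its own local copy.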

v2: Commit description refreshed.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 56 ++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 2f7031094..cb5f10474 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -198,68 +198,62 @@ static void set_filter_from_device(int fd)
 
 /* Subtests */
 
-static void unbind_rebind(void)
+static void unbind_rebind(struct hotunplug *priv)
 {
-	struct hotunplug priv;
 	char buf[PATH_MAX];
 
-	prepare(&priv, buf, sizeof(buf));
+	prepare(priv, buf, sizeof(buf));
 
 	igt_debug("closing the device\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "");
+	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "");
 
-	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
 
 	healthcheck();
 }
 
-static void unplug_rescan(void)
+static void unplug_rescan(struct hotunplug *priv)
 {
-	struct hotunplug priv;
-
-	prepare(&priv, NULL, 0);
+	prepare(priv, NULL, 0);
 
 	igt_debug("closing the device\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
-	device_unplug(priv.fd.sysfs_dev, "");
+	device_unplug(priv->fd.sysfs_dev, "");
 
-	bus_rescan(priv.fd.sysfs_bus);
+	bus_rescan(priv->fd.sysfs_bus);
 
 	healthcheck();
 }
 
-static void hotunbind_lateclose(void)
+static void hotunbind_lateclose(struct hotunplug *priv)
 {
-	struct hotunplug priv;
 	char buf[PATH_MAX];
 
-	prepare(&priv, buf, sizeof(buf));
+	prepare(priv, buf, sizeof(buf));
 
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "hot ");
+	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "hot ");
 
-	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
 
 	igt_debug("late closing the unbound device instance\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
 	healthcheck();
 }
 
-static void hotunplug_lateclose(void)
+static void hotunplug_lateclose(struct hotunplug *priv)
 {
-	struct hotunplug priv;
-
-	prepare(&priv, NULL, 0);
+	prepare(priv, NULL, 0);
 
-	device_unplug(priv.fd.sysfs_dev, "hot ");
+	device_unplug(priv->fd.sysfs_dev, "hot ");
 
-	bus_rescan(priv.fd.sysfs_bus);
+	bus_rescan(priv->fd.sysfs_bus);
 
 	igt_debug("late closing the removed device instance\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
 	healthcheck();
 }
@@ -268,6 +262,8 @@ static void hotunplug_lateclose(void)
 
 igt_main
 {
+	struct hotunplug priv;
+
 	igt_fixture {
 		int fd_drm;
 
@@ -287,28 +283,28 @@ igt_main
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
 	igt_subtest("unbind-rebind")
-		unbind_rebind();
+		unbind_rebind(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
-		unplug_rescan();
+		unplug_rescan(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose();
+		hotunbind_lateclose(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose();
+		hotunplug_lateclose(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t v6 07/24] tests/core_hotunplug: Pass errors via a data structure field
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

A pointer to a fatal error message can be passed around via the
hotunplug structure, so there is no need to declare it as a global.
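
The pattern can be sketched in isolation as follows: a message is stored
in the structure before a risky operation and cleared only if the
operation completes, so a later section can tell which step was pending.
This is a minimal standalone sketch with hypothetical names (struct ctx,
guarded, op_ok, op_fail).  The patch sets the pointer before a timed
sysfs write and clears it afterwards; the return-code mechanism used
here is an assumption made to keep the example free of IGT dependencies.

```c
#include <stddef.h>

/* Hypothetical context; only the failure field matters here. */
struct ctx {
	const char *failure;
};

/*
 * Run op() with a failure message armed.  On success the message is
 * cleared; on error it stays set so that a later section can report it.
 */
static int guarded(struct ctx *priv, const char *msg, int (*op)(void))
{
	priv->failure = msg;
	if (op() != 0)
		return -1;	/* leave priv->failure armed */
	priv->failure = NULL;
	return 0;
}

/* Hypothetical operations used to exercise both paths. */
static int op_ok(void)
{
	return 0;
}

static int op_fail(void)
{
	return -1;
}
```

Because the message lives in the structure, any code that receives the
pointer can inspect it, and no file-scope variable is needed.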

v2: Rebase only.
v3: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 96 +++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 49 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index cb5f10474..5c9c4d8bf 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -45,6 +45,7 @@ struct hotunplug {
 		int sysfs_drv;
 	} fd;
 	const char *dev_bus_addr;
+	const char *failure;
 };
 
 /* Helpers */
@@ -102,80 +103,77 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 	}
 }
 
-static const char *failure;
-
 /* Unbind the driver from the device */
-static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr,
-			  const char *prefix)
+static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
 
-	failure = "Driver unbind timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
+	priv->failure = "Driver unbind timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	/* don't close fd_sysfs_drv, it will be used for driver rebinding */
+	/* don't close fd.sysfs_drv, it will be used for driver rebinding */
 }
 
 /* Re-bind the driver to the device */
-static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
+static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
 
-	failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
+	priv->failure = "Driver re-bind timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_drv);
+	close(priv->fd.sysfs_drv);
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(int fd_sysfs_dev, const char *prefix)
+static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunplugging the device\n", prefix);
 
-	failure = "Device unplug timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
+	priv->failure = "Device unplug timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_dev, "device/remove", "1");
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_dev);
+	close(priv->fd.sysfs_dev);
 }
 
 /* Re-discover the device by rescanning its bus */
-static void bus_rescan(int fd_sysfs_bus)
+static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
 
-	failure = "Bus rescan timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
+	priv->failure = "Bus rescan timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_bus, "rescan", "1");
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_bus);
+	close(priv->fd.sysfs_bus);
 }
 
-static void healthcheck(void)
+static void healthcheck(struct hotunplug *priv)
 {
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	failure = "Device reopen failure!";
+	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
-	failure = NULL;
+	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
-		failure = "GEM failure";
+		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
-		failure = NULL;
+		priv->failure = NULL;
 	}
 
 	close(fd_drm);
@@ -207,11 +205,11 @@ static void unbind_rebind(struct hotunplug *priv)
 	igt_debug("closing the device\n");
 	close(priv->fd.drm);
 
-	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "");
+	driver_unbind(priv, "");
 
-	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
+	driver_bind(priv);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void unplug_rescan(struct hotunplug *priv)
@@ -221,11 +219,11 @@ static void unplug_rescan(struct hotunplug *priv)
 	igt_debug("closing the device\n");
 	close(priv->fd.drm);
 
-	device_unplug(priv->fd.sysfs_dev, "");
+	device_unplug(priv, "");
 
-	bus_rescan(priv->fd.sysfs_bus);
+	bus_rescan(priv);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void hotunbind_lateclose(struct hotunplug *priv)
@@ -234,35 +232,35 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 	prepare(priv, buf, sizeof(buf));
 
-	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "hot ");
+	driver_unbind(priv, "hot ");
 
-	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
+	driver_bind(priv);
 
 	igt_debug("late closing the unbound device instance\n");
 	close(priv->fd.drm);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
 	prepare(priv, NULL, 0);
 
-	device_unplug(priv->fd.sysfs_dev, "hot ");
+	device_unplug(priv, "hot ");
 
-	bus_rescan(priv->fd.sysfs_bus);
+	bus_rescan(priv);
 
 	igt_debug("late closing the removed device instance\n");
 	close(priv->fd.drm);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 /* Main */
 
 igt_main
 {
-	struct hotunplug priv;
+	struct hotunplug priv = { .failure = NULL, };
 
 	igt_fixture {
 		int fd_drm;
@@ -286,26 +284,26 @@ igt_main
 		unbind_rebind(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
 		unplug_rescan(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
 		hotunbind_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 }
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t v6 08/24] tests/core_hotunplug: Handle device close errors
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test currently ignores device close errors.  Those errors are
believed to have no influence on device health, so there is no need to
process them the way we handle most errors, i.e., by notifying CI about
a problem via igt_abort.  However, those errors may indicate issues with
the test itself.  Moreover, the impact of those errors on operations
performed by subtests, like driver unbind or device remove, should be
treated as undefined.  Therefore, we should fail as soon as a device or
device sysfs node close error occurs in a subtest, and also skip
subsequent subtests.  However, once a driver unbind or device unplug
operation has been attempted by a subtest, we would still like to check
the device health.

When in a subtest, store results of device close operations for future
reference.  Reuse file descriptor fields of the hotunplug structure for
that.  Unless between a driver unbind or device unplug operation and
successful completion of a device health check, fail the current test
section right after a device close error occurs; otherwise only warn.
If still running, examine device file descriptor fields in subsequent
igt_fixture sections and skip on errors.
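
The reuse of file descriptor fields as status values can be sketched
with plain POSIX calls as below.  The encoding follows the comment added
to the structure by this patch (>= 0: valid fd, -1: closed, < -1: close
failed); the function names are hypothetical, and fprintf() stands in
for the IGT warning helper.  Like the patch, the sketch relies on close()
not failing with an errno value of 1, which would be indistinguishable
from a clean close.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Close fd and return the new value for the field that held it:
 * -1 on success ("closed"), or -errno on failure ("close failed").
 */
static int close_and_encode(int fd, const char *warning)
{
	int err;

	errno = 0;
	if (close(fd) != 0) {
		err = errno;	/* save before fprintf() can change it */
		fprintf(stderr, "%s\n", warning);
		return -err;
	}

	return -1;
}

/* Helpers that decode the three states of such a field. */
static bool fd_is_open(int fd)
{
	return fd >= 0;
}

static bool fd_closed_cleanly(int fd)
{
	return fd == -1;
}

static bool fd_close_failed(int fd)
{
	return fd < -1;
}
```

Storing the result back into the same field, as in
"priv->fd.drm = close_and_encode(priv->fd.drm, ...)", lets a later
section check for a clean close with a single comparison against -1.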

v2: Fix a typo in post_healthcheck function name.
v3: Don't fail on close error after successful health check, warn only,
  - move duplicated device close error messages to helpers.
v4: Assert device file descriptors closed cleanly on start of each
    subtest.
v5: Update device status on open for health check if not yet dirty,
  - move device close debug messages to helper.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 76 ++++++++++++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 5c9c4d8bf..51de942ba 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -43,7 +43,7 @@ struct hotunplug {
 		int sysfs_dev;
 		int sysfs_bus;
 		int sysfs_drv;
-	} fd;
+	} fd;	/* >= 0: valid fd, == -1: closed, < -1: close failed */
 	const char *dev_bus_addr;
 	const char *failure;
 };
@@ -67,6 +67,26 @@ static int local_drm_open_driver(const char *when, const char *why)
 	return fd_drm;
 }
 
+static int local_close(int fd, const char *warning)
+{
+	errno = 0;
+	if (igt_warn_on_f(close(fd), "%s\n", warning))
+		return -errno;	/* (never -1) */
+
+	return -1;	/* success - return 'closed' */
+}
+
+static int close_device(int fd_drm, const char *when, const char *which)
+{
+	igt_debug("%sclosing %sdevice instance\n", when, which);
+	return local_close(fd_drm, "Device close failed");
+}
+
+static int close_sysfs(int fd_sysfs_dev)
+{
+	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
+}
+
 static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 {
 	int len;
@@ -83,11 +103,16 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 	igt_assert(priv->dev_bus_addr++);
 
 	/* sysfs_dev no longer needed */
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
+	/* assert device file descriptors closed cleanly on subtest start */
+	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+
 	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
@@ -142,7 +167,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 	priv->failure = NULL;
 
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -161,6 +186,8 @@ static void bus_rescan(struct hotunplug *priv)
 
 static void healthcheck(struct hotunplug *priv)
 {
+	/* preserve potentially dirty device status stored in priv->fd.drm */
+	bool closed = priv->fd.drm == -1;
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
@@ -168,6 +195,8 @@ static void healthcheck(struct hotunplug *priv)
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
+	if (closed)	/* store fd for post_healthcheck if not dirty */
+		priv->fd.drm = fd_drm;
 	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
@@ -176,7 +205,17 @@ static void healthcheck(struct hotunplug *priv)
 		priv->failure = NULL;
 	}
 
-	close(fd_drm);
+	fd_drm = close_device(fd_drm, "", "health checked ");
+	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
+		priv->fd.drm = fd_drm;
+}
+
+static void post_healthcheck(struct hotunplug *priv)
+{
+	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
+
+	igt_require(priv->fd.drm == -1);
+	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
@@ -202,8 +241,8 @@ static void unbind_rebind(struct hotunplug *priv)
 
 	prepare(priv, buf, sizeof(buf));
 
-	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
 
@@ -216,8 +255,8 @@ static void unplug_rescan(struct hotunplug *priv)
 {
 	prepare(priv, NULL, 0);
 
-	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
 
@@ -236,8 +275,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 	driver_bind(priv);
 
-	igt_debug("late closing the unbound device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 
 	healthcheck(priv);
 }
@@ -250,8 +288,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 	bus_rescan(priv);
 
-	igt_debug("late closing the removed device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 
 	healthcheck(priv);
 }
@@ -260,7 +297,10 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 igt_main
 {
-	struct hotunplug priv = { .failure = NULL, };
+	struct hotunplug priv = {
+		.fd		= { .drm = -1, .sysfs_dev = -1, },
+		.failure	= NULL,
+	};
 
 	igt_fixture {
 		int fd_drm;
@@ -276,7 +316,7 @@ igt_main
 		/* Make sure subtests always reopen the same device */
 		set_filter_from_device(fd_drm);
 
-		close(fd_drm);
+		igt_assert_eq(close_device(fd_drm, "", "selected "), -1);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -284,26 +324,26 @@ igt_main
 		unbind_rebind(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
 		unplug_rescan(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
 		hotunbind_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 }
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [igt-dev] [PATCH i-g-t v6 08/24] tests/core_hotunplug: Handle device close errors
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

The test currently ignores device close errors.  Those errors are
believed to have no influence on device health, so there is no need to
process them the same way as we mostly do on errors, i.e., notify CI
about a problem via igt_abort.  However, those errors may indicate
issues with the test itself.  Moreover, the impact of those errors on
operations performed by subtests, like driver unbind or device remove,
should be perceived as undefined.  Therefore, we should fail as soon as
a device or device sysfs node close error occurs in a subtest and also
skip subsequent subtests.  However, once a driver unbind or device
unplug operation has been attempted by a subtest, we would still like
to check the device health.

When in a subtest, store results of device close operations for future
reference.  Reuse file descriptor fields of the hotunplug structure for
that.  Fail the current test section right after a device close error
occurs, unless that happens between a driver unbind or device unplug
operation and successful completion of a device health check, in which
case only warn.  If still running, examine device file descriptor
fields in subsequent igt_fixture sections and skip on errors.
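
For illustration only, the way a close result can be stored in a file
descriptor field may be sketched outside of IGT as follows.  This is a
minimal sketch, not the patch itself: close_and_encode() and
close_failed() are hypothetical names, fprintf() stands in for
igt_warn_on_f(), and, like the patch, the sketch assumes close() never
fails with an errno value of 1, so a failure is never mistaken for a
clean close.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the local_close() helper of this patch.
 * The result follows the convention of the hotunplug fd fields:
 *   == -1: closed cleanly, < -1: close failed (negated errno).
 */
static int close_and_encode(int fd, const char *warning)
{
	assert(warning);

	if (close(fd)) {
		int err = errno;	/* save before fprintf() can clobber it */

		fprintf(stderr, "%s: %s\n", warning, strerror(err));
		return -err;	/* assumed never -1, as in the patch */
	}

	return -1;	/* success - report 'closed' */
}

/* Non-zero if a stored fd field value records a failed close */
static int close_failed(int fd_status)
{
	return fd_status < -1;
}
```

A caller would assign the result back to the field the descriptor came
from, e.g. status = close_and_encode(status, "Device close failed"),
and later test it with close_failed() or compare it with -1.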

v2: Fix a typo in post_healthcheck function name.
v3: Don't fail on close error after successful health check, warn only,
  - move duplicated device close error messages to helpers.
v4: Assert device file descriptors closed cleanly on start of each
    subtest.
v5: Update device status on open for health check if not yet dirty,
  - move device close debug messages to helper.
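
The status handling around the health check can be read as a small
update rule.  The sketch below is a simplified model of it, not code
from the patch: status_after_healthcheck() is a hypothetical pure
function which takes the value found in priv->fd.drm on health check
entry and the result of closing the health checked device instance,
both using the encoding documented in the struct hotunplug comment
(>= 0: valid fd, == -1: closed, < -1: close failed).  It models only
the final state; the intermediate step of storing the reopened fd is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the final priv->fd.drm value after a completed
 * health check.  A dirty status recorded before the health check is
 * preserved unless closing the health checked instance failed as well.
 */
static int status_after_healthcheck(int stored, int close_result)
{
	bool closed = stored == -1;

	/* a close result is either -1 or a negated errno, never a valid fd */
	assert(close_result < 0);

	if (closed || close_result < -1)
		return close_result;

	return stored;
}
```

With this rule a clean state stays clean only if the final close
succeeds, while an earlier late close failure is still visible to a
follow-up igt_fixture section.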

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 76 ++++++++++++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 5c9c4d8bf..51de942ba 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -43,7 +43,7 @@ struct hotunplug {
 		int sysfs_dev;
 		int sysfs_bus;
 		int sysfs_drv;
-	} fd;
+	} fd;	/* >= 0: valid fd, == -1: closed, < -1: close failed */
 	const char *dev_bus_addr;
 	const char *failure;
 };
@@ -67,6 +67,26 @@ static int local_drm_open_driver(const char *when, const char *why)
 	return fd_drm;
 }
 
+static int local_close(int fd, const char *warning)
+{
+	errno = 0;
+	if (igt_warn_on_f(close(fd), "%s\n", warning))
+		return -errno;	/* (never -1) */
+
+	return -1;	/* success - return 'closed' */
+}
+
+static int close_device(int fd_drm, const char *when, const char *which)
+{
+	igt_debug("%sclosing %sdevice instance\n", when, which);
+	return local_close(fd_drm, "Device close failed");
+}
+
+static int close_sysfs(int fd_sysfs_dev)
+{
+	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
+}
+
 static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 {
 	int len;
@@ -83,11 +103,16 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 	igt_assert(priv->dev_bus_addr++);
 
 	/* sysfs_dev no longer needed */
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
+	/* assert device file descriptors closed cleanly on subtest start */
+	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+
 	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
@@ -142,7 +167,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 	priv->failure = NULL;
 
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -161,6 +186,8 @@ static void bus_rescan(struct hotunplug *priv)
 
 static void healthcheck(struct hotunplug *priv)
 {
+	/* preserve potentially dirty device status stored in priv->fd.drm */
+	bool closed = priv->fd.drm == -1;
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
@@ -168,6 +195,8 @@ static void healthcheck(struct hotunplug *priv)
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
+	if (closed)	/* store fd for post_healthcheck if not dirty */
+		priv->fd.drm = fd_drm;
 	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
@@ -176,7 +205,17 @@ static void healthcheck(struct hotunplug *priv)
 		priv->failure = NULL;
 	}
 
-	close(fd_drm);
+	fd_drm = close_device(fd_drm, "", "health checked ");
+	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
+		priv->fd.drm = fd_drm;
+}
+
+static void post_healthcheck(struct hotunplug *priv)
+{
+	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
+
+	igt_require(priv->fd.drm == -1);
+	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
@@ -202,8 +241,8 @@ static void unbind_rebind(struct hotunplug *priv)
 
 	prepare(priv, buf, sizeof(buf));
 
-	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
 
@@ -216,8 +255,8 @@ static void unplug_rescan(struct hotunplug *priv)
 {
 	prepare(priv, NULL, 0);
 
-	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
 
@@ -236,8 +275,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 	driver_bind(priv);
 
-	igt_debug("late closing the unbound device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 
 	healthcheck(priv);
 }
@@ -250,8 +288,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 	bus_rescan(priv);
 
-	igt_debug("late closing the removed device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 
 	healthcheck(priv);
 }
@@ -260,7 +297,10 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 igt_main
 {
-	struct hotunplug priv = { .failure = NULL, };
+	struct hotunplug priv = {
+		.fd		= { .drm = -1, .sysfs_dev = -1, },
+		.failure	= NULL,
+	};
 
 	igt_fixture {
 		int fd_drm;
@@ -276,7 +316,7 @@ igt_main
 		/* Make sure subtests always reopen the same device */
 		set_filter_from_device(fd_drm);
 
-		close(fd_drm);
+		igt_assert_eq(close_device(fd_drm, "", "selected "), -1);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -284,26 +324,26 @@ igt_main
 		unbind_rebind(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
 		unplug_rescan(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
 		hotunbind_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 }
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

* [Intel-gfx] [PATCH i-g-t v6 09/24] tests/core_hotunplug: Prepare invariant data once per test run
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Each subtest currently calls a prepare() helper which opens a couple of
files required by that subtest.  Those files are then closed after use,
either directly from the subtest body, or indirectly from inside one of
the helper functions called during the subtest execution.  That approach
not only makes the life cycle of individual file descriptors difficult
to follow but also prevents us from re-running health checks on subtest
failures from follow-up igt_fixture sections, since we may need to
retry bus rescan or driver rebind operations.

Two of those files - device bus and driver sysfs nodes - are neither
affected by nor interfere with driver unbind / device unplug operations
performed by subtests.  Therefore, there is not much sense in closing
and reopening those nodes.  Open them once at the beginning of a test
run, then close them as late as on test completion.

The prepare() helper also populates a device bus address string used by
driver unbind / rebind operations.  Since the bus address of an
exercised device never changes, also prepare that string only once at
the beginning of a test run.  Note that it is the same as the last
component of a device filter string which is already resolved and
installed from an initial igt_fixture section of the test.  Therefore,
initialize the device bus address field of a hotunplug structure
instance with a pointer to the respective substring of that filter
rather than resolving it again from the device sysfs node pathname.
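
A minimal sketch of that substring approach, runnable outside of IGT,
could look as follows.  Both helper names are hypothetical, and the
filter is assumed to have a 'sys:<device sysfs path>' form; the patch
itself obtains the filter from igt_device_filter_get(0) and asserts
instead of returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the bus address is the last path component */
static const char *filter_to_bus_addr(const char *filter)
{
	const char *slash;

	assert(filter);
	slash = strrchr(filter, '/');

	return slash ? slash + 1 : NULL;
}

/* Hypothetical helper: the sysfs path follows the first ':' */
static const char *filter_to_sysfs_path(const char *filter)
{
	const char *colon;

	assert(filter);
	colon = strchr(filter, ':');

	return colon ? colon + 1 : NULL;
}
```

For a made-up filter "sys:/sys/devices/pci0000:00/0000:00:02.0" the
first helper yields "0000:00:02.0" and the second one yields
"/sys/devices/pci0000:00/0000:00:02.0".  Both results point into the
filter string, so they stay valid only as long as the filter does.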

There is one more sysfs node - a DRM device node - currently opened by
the prepare() helper for subtests which perform device remove
operations.  That node can't be opened only once at the beginning of a
test run because its open file descriptor is no longer usable as soon
as a driver unbind operation is performed.  On the other hand, it can't
be opened easily from inside the device_unplug() helper since some
subtests don't open the device at all, so its file descriptor needed by
igt_sysfs_open() may not be available.  However, note that only a PCI
sysfs node of the device, not necessarily the DRM one, is actually
required for a successful device remove operation, and that node can be
opened easily from a bus file descriptor using a device bus address
string, both already available.  Therefore, change the semantics of the
.fd.sysfs_dev field of the hotunplug structure from DRM to PCI device
sysfs file descriptor, then let the device_unplug() helper open the
device PCI node by itself and store its file descriptor in that field.
Also, for even easier access to the device PCI node, use the
'subsystem/devices' sub-node of the PCI device as its bus sysfs
location instead of just 'subsystem', then adjust the relative path to
the bus 'rescan' function accordingly.
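
The descriptor-relative access described above can be sketched with
plain POSIX calls.  This is an illustration under assumptions, not IGT
code: write_attr_at() is a hypothetical replacement for
igt_sysfs_set(), and reporting errors as a negated errno is a choice
made only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L	/* for openat() */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for igt_sysfs_set(): write value to an existing
 * attribute file resolved relative to the directory open as dirfd.
 * Returns 0 on success or a negated errno.
 */
static int write_attr_at(int dirfd, const char *attr, const char *value)
{
	size_t len;
	ssize_t ret;
	int fd, err = 0;

	assert(attr && value);
	len = strlen(value);

	fd = openat(dirfd, attr, O_WRONLY);
	if (fd < 0)
		return -errno;

	ret = write(fd, value, len);
	if (ret < 0)
		err = -errno;
	else if ((size_t)ret != len)
		err = -EIO;	/* short write, treated as an error here */

	if (close(fd) && !err)
		err = -errno;

	return err;
}
```

With bus_fd open on the 'subsystem/devices' directory, the device node
would be obtained with openat(bus_fd, bus_addr, O_DIRECTORY), the
device removed with write_attr_at(dev_fd, "remove", "1"), and the bus
rescanned with write_attr_at(bus_fd, "../rescan", "1").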

A side benefit of using the PCI device sysfs node, not the DRM one,
while removing the device is that a future subtest may now easily
perform both driver unbind and device remove operations in a row.

v2: Rebase only.
v3: Refresh.
v4: Refresh,
    still assert the device file descriptor closed cleanly on subtest
    start, and the device sysfs file descriptor just before open.

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 83 +++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 50 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 51de942ba..7f5e800c6 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -87,45 +87,31 @@ static int close_sysfs(int fd_sysfs_dev)
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }
 
-static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
+static void prepare(struct hotunplug *priv)
 {
-	int len;
+	const char *filter = igt_device_filter_get(0), *sysfs_path;
 
-	igt_assert(buflen);
+	igt_assert(filter);
 
-	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
-				    O_DIRECTORY);
-	igt_assert_fd(priv->fd.sysfs_drv);
-
-	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
-	buf[len] = '\0';
-	priv->dev_bus_addr = strrchr(buf, '/');
+	priv->dev_bus_addr = strrchr(filter, '/');
 	igt_assert(priv->dev_bus_addr++);
 
-	/* sysfs_dev no longer needed */
-	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
-}
+	sysfs_path = strchr(filter, ':');
+	igt_assert(sysfs_path++);
 
-static void prepare(struct hotunplug *priv, char *buf, int buflen)
-{
-	/* assert device file descriptors closed cleanly on subtest start */
-	igt_assert_eq(priv->fd.drm, -1);
 	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = open(sysfs_path, O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
 
-	priv->fd.drm = local_drm_open_driver("", " for subtest");
+	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "driver", O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_drv);
 
-	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
-	igt_assert_fd(priv->fd.sysfs_dev);
+	priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev, "subsystem/devices",
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_bus);
 
-	if (buf) {
-		prepare_for_unbind(priv, buf, buflen);
-	} else {
-		/* prepare for bus rescan */
-		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
-					    "device/subsystem", O_DIRECTORY);
-		igt_assert_fd(priv->fd.sysfs_bus);
-	}
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -138,8 +124,6 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix)
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	/* don't close fd.sysfs_drv, it will be used for driver rebinding */
 }
 
 /* Re-bind the driver to the device */
@@ -152,18 +136,21 @@ static void driver_bind(struct hotunplug *priv)
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_drv);
 }
 
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
+
 	igt_debug("%sunplugging the device\n", prefix);
 
 	priv->failure = "Device unplug timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_dev, "device/remove", "1");
+	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
 
@@ -177,11 +164,9 @@ static void bus_rescan(struct hotunplug *priv)
 
 	priv->failure = "Bus rescan timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_bus, "rescan", "1");
+	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_bus);
 }
 
 static void healthcheck(struct hotunplug *priv)
@@ -237,11 +222,6 @@ static void set_filter_from_device(int fd)
 
 static void unbind_rebind(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
-
-	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
 	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
@@ -253,9 +233,6 @@ static void unbind_rebind(struct hotunplug *priv)
 
 static void unplug_rescan(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
-
-	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
 	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
@@ -267,9 +244,8 @@ static void unplug_rescan(struct hotunplug *priv)
 
 static void hotunbind_lateclose(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hot unbind");
 
 	driver_unbind(priv, "hot ");
 
@@ -282,7 +258,8 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hot unplug");
 
 	device_unplug(priv, "hot ");
 
@@ -317,6 +294,8 @@ igt_main
 		set_filter_from_device(fd_drm);
 
 		igt_assert_eq(close_device(fd_drm, "", "selected "), -1);
+
+		prepare(&priv);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -344,6 +323,10 @@ igt_main
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
-	igt_fixture
+	igt_fixture {
 		post_healthcheck(&priv);
+
+		close(priv.fd.sysfs_bus);
+		close(priv.fd.sysfs_drv);
+	}
 }
-- 
2.21.1

* [igt-dev] [PATCH i-g-t v6 09/24] tests/core_hotunplug: Prepare invariant data once per test run
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Each subtest currently calls a prepare() helper which opens a couple of
files required by that subtest.  Those files are then closed after use,
either directly from the subtest body, or indirectly from inside one of
the helper functions called during the subtest execution.  That approach
not only makes the life cycle of individual file descriptors difficult
to follow but also prevents us from re-running health checks on subtest
failures from follow-up igt_fixture sections, since we may need to
retry bus rescan or driver rebind operations.

Two of those files - device bus and driver sysfs nodes - are neither
affected by nor interfere with driver unbind / device unplug operations
performed by subtests.  Therefore, there is not much sense in closing
and reopening those nodes.  Open them once at the beginning of a test
run, then close them as late as on test completion.

The prepare() helper also populates a device bus address string used by
driver unbind / rebind operations.  Since the bus address of an
exercised device never changes, also prepare that string only once at
the beginning of a test run.  Note that it is the same as the last
component of a device filter string which is already resolved and
installed from an initial igt_fixture section of the test.  Therefore,
initialize the device bus address field of a hotunplug structure
instance with a pointer to the respective substring of that filter
rather than resolving it again from the device sysfs node pathname.

There is one more sysfs node - a DRM device node - currently opened by
the prepare() helper for subtests which perform device remove
operations.  That node can't be opened only once at the beginning of a
test run because its open file descriptor is no longer usable as soon
as a driver unbind operation is performed.  On the other hand, it can't
be opened easily from inside the device_unplug() helper since some
subtests don't open the device at all, so its file descriptor needed by
igt_sysfs_open() may not be available.  However, note that only a PCI
sysfs node of the device, not necessarily the DRM one, is actually
required for a successful device remove operation, and that node can be
opened easily from a bus file descriptor using a device bus address
string, both already available.  Therefore, change the semantics of the
.fd.sysfs_dev field of the hotunplug structure from DRM to PCI device
sysfs file descriptor, then let the device_unplug() helper open the
device PCI node by itself and store its file descriptor in that field.
Also, for even easier access to the device PCI node, use the
'subsystem/devices' sub-node of the PCI device as its bus sysfs
location instead of just 'subsystem', then adjust the relative path to
the bus 'rescan' function accordingly.

A side benefit of using the PCI device sysfs node, not the DRM one,
while removing the device is that a future subtest may now easily
perform both driver unbind and device remove operations in a row.

v2: Rebase only.
v3: Refresh.
v4: Refresh,
    still assert the device file descriptor closed cleanly on subtest
    start, and the device sysfs file descriptor just before open.

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 83 +++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 50 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 51de942ba..7f5e800c6 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -87,45 +87,31 @@ static int close_sysfs(int fd_sysfs_dev)
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }
 
-static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
+static void prepare(struct hotunplug *priv)
 {
-	int len;
+	const char *filter = igt_device_filter_get(0), *sysfs_path;
 
-	igt_assert(buflen);
+	igt_assert(filter);
 
-	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
-				    O_DIRECTORY);
-	igt_assert_fd(priv->fd.sysfs_drv);
-
-	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
-	buf[len] = '\0';
-	priv->dev_bus_addr = strrchr(buf, '/');
+	priv->dev_bus_addr = strrchr(filter, '/');
 	igt_assert(priv->dev_bus_addr++);
 
-	/* sysfs_dev no longer needed */
-	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
-}
+	sysfs_path = strchr(filter, ':');
+	igt_assert(sysfs_path++);
 
-static void prepare(struct hotunplug *priv, char *buf, int buflen)
-{
-	/* assert device file descriptors closed cleanly on subtest start */
-	igt_assert_eq(priv->fd.drm, -1);
 	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = open(sysfs_path, O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
 
-	priv->fd.drm = local_drm_open_driver("", " for subtest");
+	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "driver", O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_drv);
 
-	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
-	igt_assert_fd(priv->fd.sysfs_dev);
+	priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev, "subsystem/devices",
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_bus);
 
-	if (buf) {
-		prepare_for_unbind(priv, buf, buflen);
-	} else {
-		/* prepare for bus rescan */
-		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
-					    "device/subsystem", O_DIRECTORY);
-		igt_assert_fd(priv->fd.sysfs_bus);
-	}
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -138,8 +124,6 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix)
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	/* don't close fd.sysfs_drv, it will be used for driver rebinding */
 }
 
 /* Re-bind the driver to the device */
@@ -152,18 +136,21 @@ static void driver_bind(struct hotunplug *priv)
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_drv);
 }
 
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
+
 	igt_debug("%sunplugging the device\n", prefix);
 
 	priv->failure = "Device unplug timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_dev, "device/remove", "1");
+	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
 
@@ -177,11 +164,9 @@ static void bus_rescan(struct hotunplug *priv)
 
 	priv->failure = "Bus rescan timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_bus, "rescan", "1");
+	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_bus);
 }
 
 static void healthcheck(struct hotunplug *priv)
@@ -237,11 +222,6 @@ static void set_filter_from_device(int fd)
 
 static void unbind_rebind(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
-
-	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
 	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
@@ -253,9 +233,6 @@ static void unbind_rebind(struct hotunplug *priv)
 
 static void unplug_rescan(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
-
-	priv->fd.drm = close_device(priv->fd.drm, "", "exercised ");
 	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
@@ -267,9 +244,8 @@ static void unplug_rescan(struct hotunplug *priv)
 
 static void hotunbind_lateclose(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hot unbind");
 
 	driver_unbind(priv, "hot ");
 
@@ -282,7 +258,8 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hot unplug");
 
 	device_unplug(priv, "hot ");
 
@@ -317,6 +294,8 @@ igt_main
 		set_filter_from_device(fd_drm);
 
 		igt_assert_eq(close_device(fd_drm, "", "selected "), -1);
+
+		prepare(&priv);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -344,6 +323,10 @@ igt_main
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
-	igt_fixture
+	igt_fixture {
 		post_healthcheck(&priv);
+
+		close(priv.fd.sysfs_bus);
+		close(priv.fd.sysfs_drv);
+	}
 }
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v6 10/24] tests/core_hotunplug: Skip selectively on sysfs close errors
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Since we no longer open a device DRM sysfs node, only a PCI one, driver
unbind operations are no longer affected by missed or unsuccessful
sysfs file close attempts.  Skip only affected subtests if that
happens.

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7f5e800c6..d51526029 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -111,7 +111,6 @@ static void prepare(struct hotunplug *priv)
 	igt_assert_fd(priv->fd.sysfs_bus);
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -141,7 +140,8 @@ static void driver_bind(struct hotunplug *priv)
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	igt_require(priv->fd.sysfs_dev == -1);
+
 	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -200,7 +200,6 @@ static void post_healthcheck(struct hotunplug *priv)
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
 	igt_require(priv->fd.drm == -1);
-	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 10/24] tests/core_hotunplug: Skip selectively on sysfs close errors
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Since we no longer open a device DRM sysfs node, only a PCI one, driver
unbind operations are no longer affected by missed or unsuccessful
sysfs file close attempts.  Skip only affected subtests if that
happens.

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7f5e800c6..d51526029 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -111,7 +111,6 @@ static void prepare(struct hotunplug *priv)
 	igt_assert_fd(priv->fd.sysfs_bus);
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -141,7 +140,8 @@ static void driver_bind(struct hotunplug *priv)
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	igt_require(priv->fd.sysfs_dev == -1);
+
 	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -200,7 +200,6 @@ static void post_healthcheck(struct hotunplug *priv)
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
 	igt_require(priv->fd.drm == -1);
-	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 11/24] tests/core_hotunplug: Recover from subtest failures
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Subtests now forcibly call or request igt_abort on failures in order to
avoid silently leaving an exercised device in an unusable state.
However, a failure inside a subtest doesn't always mean the device is
no longer working correctly and reboot is needed.  On the other hand,
if a subtest just fails without aborting, that doesn't mean in turn the
device is healthy.  We should still perform a device health check
in that case before deciding on next steps.

Reuse the 'failure' structure field as a mark which is set before each
critical operation that must be followed by a successful health check
in order to avoid aborting the test.  Then, follow each subtest with
its individual igt_fixture section, where device file descriptors
potentially left open are closed, a device rediscovery or driver rebind
operation is run as needed, and finally the health check is run again
if the preceding igt_subtest section has exited with the mark set.
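
The life cycle of the mark can be sketched roughly as follows.  This is
a simplified model, not the test code: struct model, the simulated
device_healthy flag, all function names and the "Health check failure!"
string are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the test's private data. */
struct model {
	const char *failure;	/* non-NULL: a health check is still owed */
	bool device_healthy;	/* simulated device state */
};

/* A critical operation sets the mark and never clears it itself. */
static void critical_op(struct model *m, const char *mark)
{
	m->failure = mark;
}

/* Only a successful health check clears the mark. */
static bool health_check(struct model *m)
{
	if (!m->device_healthy) {
		m->failure = "Health check failure!";
		return false;
	}
	m->failure = NULL;
	return true;
}

/* Follow-up fixture: re-check only if the subtest left the mark set. */
static void recover_model(struct model *m)
{
	if (m->failure)
		health_check(m);
}

/* Final decision: a mark that survived recovery requests an abort. */
static bool must_abort(const struct model *m)
{
	return m->failure != NULL;
}
```

A subtest that dies right after critical_op() leaves the mark set, so
the fixture runs the health check on its behalf; an abort is requested
only if that check does not clear the mark.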

v2: Start each recovery phase from unconditionally closing file
    descriptors potentially left open by a subtest before it entered
    its critical section,
  - replace igt_require() with 'if() return;' construct in recover() to
    reduce noise,
  - replace "subtest failure" message used as a request for healthcheck
    with a more appropriate "need healthcheck" for clarity,
  - rebase on current upstream master.
v3: Refresh,
  - move bus_rescan() and driver_bind() function calls back from
    healthcheck() to recover() so a pure health check can still be
    called from a subtest if essential,
  - move failure mark assignments back from subtests to helpers for
    more adequate abort reason reporting but clean the mark only on
    health check success,
  - call cleanup() also from post_healthcheck() in order to close a
    device file descriptor potentially left open by a failed health
    check,
  - reword commit message and update description.
v4: Close exercised device fd before failing a health check run,
  - don't drop health checks from subtest bodies, their results should
    always matter.
v5: Refresh.
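
For reference, recover() decides which operation to redo by probing
sysfs with faccessat(): a device entry missing on the bus calls for a
rescan, otherwise an entry missing from the driver directory calls for
a rebind.  A minimal sketch of such a presence probe is shown below;
the entry_missing() helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Return true if 'name' cannot be found relative to the directory 'dirfd'.
 * faccessat() returns 0 when the entry exists and -1 otherwise.
 */
static bool entry_missing(int dirfd, const char *name)
{
	return faccessat(dirfd, name, F_OK, 0) != 0;
}
```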

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 100 ++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 26 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d51526029..7fc6df688 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -78,12 +78,18 @@ static int local_close(int fd, const char *warning)
 
 static int close_device(int fd_drm, const char *when, const char *which)
 {
+	if (fd_drm < 0)	/* not open - return current status */
+		return fd_drm;
+
 	igt_debug("%sclosing %sdevice instance\n", when, which);
 	return local_close(fd_drm, "Device close failed");
 }
 
 static int close_sysfs(int fd_sysfs_dev)
 {
+	if (fd_sysfs_dev < 0)	/* not open - return current status */
+		return fd_sysfs_dev;
+
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }
 
@@ -117,24 +123,22 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
+	priv->failure = "Driver unbind failure!";
 
-	priv->failure = "Driver unbind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
+	priv->failure = "Driver re-bind failure!";
 
-	priv->failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Remove (virtually unplug) the device from its bus */
@@ -147,12 +151,11 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_assert_fd(priv->fd.sysfs_dev);
 
 	igt_debug("%sunplugging the device\n", prefix);
+	priv->failure = "Device unplug failure!";
 
-	priv->failure = "Device unplug timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
@@ -161,12 +164,17 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
+	priv->failure = "Bus rescan failure!";
 
-	priv->failure = "Bus rescan timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
+}
+
+static void cleanup(struct hotunplug *priv)
+{
+	priv->fd.drm = close_device(priv->fd.drm, "post ", "failed ");
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 static void healthcheck(struct hotunplug *priv)
@@ -180,25 +188,45 @@ static void healthcheck(struct hotunplug *priv)
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
-	if (closed)	/* store fd for post_healthcheck if not dirty */
+	if (closed)	/* store fd for cleanup if not dirty */
 		priv->fd.drm = fd_drm;
-	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
 		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
 		priv->failure = NULL;
+	} else {
+		/* no device specific healthcheck, rely on reopen result */
+		priv->failure = NULL;
 	}
 
 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
 		priv->fd.drm = fd_drm;
+
+	/* not only request igt_abort on failure, also fail the health check */
+	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
+}
+
+static void recover(struct hotunplug *priv)
+{
+	cleanup(priv);
+
+	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
+		bus_rescan(priv);
+
+	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
+		driver_bind(priv);
+
+	if (priv->failure)
+		healthcheck(priv);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
 {
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
+	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
 }
 
@@ -297,30 +325,50 @@ igt_main
 		prepare(&priv);
 	}
 
-	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
-	igt_subtest("unbind-rebind")
-		unbind_rebind(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_subtest("unbind-rebind")
+			unbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
-	igt_subtest("unplug-rescan")
-		unplug_rescan(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_subtest("unplug-rescan")
+			unplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
+		igt_subtest("hotunbind-lateclose")
+			hotunbind_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a still open device can be cleanly unplugged, then released");
-	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a still open device can be cleanly unplugged, then released");
+		igt_subtest("hotunplug-lateclose")
+			hotunplug_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture {
 		post_healthcheck(&priv);
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 11/24] tests/core_hotunplug: Recover from subtest failures
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Subtests now forcibly call or request igt_abort on failures in order to
avoid silently leaving an exercised device in an unusable state.
However, a failure inside a subtest doesn't always mean the device is
no longer working correctly and reboot is needed.  On the other hand,
if a subtest just fails without aborting, that doesn't mean in turn the
device is healthy.  We should still perform a device health check
in that case before deciding on next steps.

Reuse the 'failure' structure field as a mark which is set before each
critical operation that must be followed by a successful health check
in order to avoid aborting the test.  Then, follow each subtest with
its individual igt_fixture section, where device file descriptors
potentially left open are closed, a device rediscovery or driver rebind
operation is run as needed, and finally the health check is run again
if the preceding igt_subtest section has exited with the mark set.

v2: Start each recovery phase from unconditionally closing file
    descriptors potentially left open by a subtest before it entered
    its critical section,
  - replace igt_require() with 'if() return;' construct in recover() to
    reduce noise,
  - replace "subtest failure" message used as a request for healthcheck
    with a more appropriate "need healthcheck" for clarity,
  - rebase on current upstream master.
v3: Refresh,
  - move bus_rescan() and driver_bind() function calls back from
    healthcheck() to recover() so a pure health check can still be
    called from a subtest if essential,
  - move failure mark assignments back from subtests to helpers for
    more adequate abort reason reporting but clean the mark only on
    health check success,
  - call cleanup() also from post_healthcheck() in order to close a
    device file descriptor potentially left open by a failed health
    check,
  - reword commit message and update description.
v4: Close exercised device fd before failing a health check run,
  - don't drop health checks from subtest bodies, their results should
    always matter.
v5: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 100 ++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 26 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d51526029..7fc6df688 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -78,12 +78,18 @@ static int local_close(int fd, const char *warning)
 
 static int close_device(int fd_drm, const char *when, const char *which)
 {
+	if (fd_drm < 0)	/* not open - return current status */
+		return fd_drm;
+
 	igt_debug("%sclosing %sdevice instance\n", when, which);
 	return local_close(fd_drm, "Device close failed");
 }
 
 static int close_sysfs(int fd_sysfs_dev)
 {
+	if (fd_sysfs_dev < 0)	/* not open - return current status */
+		return fd_sysfs_dev;
+
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }
 
@@ -117,24 +123,22 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
+	priv->failure = "Driver unbind failure!";
 
-	priv->failure = "Driver unbind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
+	priv->failure = "Driver re-bind failure!";
 
-	priv->failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Remove (virtually unplug) the device from its bus */
@@ -147,12 +151,11 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_assert_fd(priv->fd.sysfs_dev);
 
 	igt_debug("%sunplugging the device\n", prefix);
+	priv->failure = "Device unplug failure!";
 
-	priv->failure = "Device unplug timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
@@ -161,12 +164,17 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
+	priv->failure = "Bus rescan failure!";
 
-	priv->failure = "Bus rescan timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
+}
+
+static void cleanup(struct hotunplug *priv)
+{
+	priv->fd.drm = close_device(priv->fd.drm, "post ", "failed ");
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 static void healthcheck(struct hotunplug *priv)
@@ -180,25 +188,45 @@ static void healthcheck(struct hotunplug *priv)
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
-	if (closed)	/* store fd for post_healthcheck if not dirty */
+	if (closed)	/* store fd for cleanup if not dirty */
 		priv->fd.drm = fd_drm;
-	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
 		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
 		priv->failure = NULL;
+	} else {
+		/* no device specific healthcheck, rely on reopen result */
+		priv->failure = NULL;
 	}
 
 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
 		priv->fd.drm = fd_drm;
+
+	/* not only request igt_abort on failure, also fail the health check */
+	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
+}
+
+static void recover(struct hotunplug *priv)
+{
+	cleanup(priv);
+
+	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
+		bus_rescan(priv);
+
+	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
+		driver_bind(priv);
+
+	if (priv->failure)
+		healthcheck(priv);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
 {
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
+	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
 }
 
@@ -297,30 +325,50 @@ igt_main
 		prepare(&priv);
 	}
 
-	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
-	igt_subtest("unbind-rebind")
-		unbind_rebind(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_subtest("unbind-rebind")
+			unbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
-	igt_subtest("unplug-rescan")
-		unplug_rescan(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_subtest("unplug-rescan")
+			unplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
+		igt_subtest("hotunbind-lateclose")
+			hotunbind_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a still open device can be cleanly unplugged, then released");
-	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a still open device can be cleanly unplugged, then released");
+		igt_subtest("hotunplug-lateclose")
+			hotunplug_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture {
 		post_healthcheck(&priv);
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 12/24] tests/core_hotunplug: Fail subtests on device close errors
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Since health checks are now run from follow-up fixture sections, it is
safe to fail subtests without the need to abort the test execution.  Do
that on device close errors instead of just emitting warnings.
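
The new assertions compare the stored descriptor against -1, which
assumes a close helper that returns -1 once the descriptor is closed
and some other negative value when close() fails.  The body of the
real helper is not shown in this patch, so the sketch below is only a
guess at that convention; close_or_status() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical close helper: returns -1 once the descriptor is closed (or was
 * never open), and a value below -1 if close() reported an error, so callers
 * can tell "closed" from "close failed" by comparing against -1.
 */
static int close_or_status(int fd)
{
	if (fd < 0)		/* not open: keep the current status */
		return fd;

	if (close(fd) == 0)
		return -1;	/* closed cleanly */

	return errno > 1 ? -errno : -2;	/* any value below -1 marks failure */
}
```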

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7fc6df688..d31faf215 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -158,6 +158,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -279,6 +280,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	driver_bind(priv);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
@@ -293,6 +295,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	bus_rescan(priv);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 12/24] tests/core_hotunplug: Fail subtests on device close errors
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Since health checks are now run from follow-up fixture sections, it is
safe to fail subtests without the need to abort the test execution.  Do
that on device close errors instead of just emitting warnings.

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7fc6df688..d31faf215 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -158,6 +158,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -279,6 +280,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	driver_bind(priv);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
@@ -293,6 +295,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	bus_rescan(priv);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 13/24] tests/core_hotunplug: Let the driver time out essential sysfs operations
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test now arms a timer before performing each driver unbind / rebind
or device unplug / bus rescan sysfs operation.  If an operation runs
into trouble, the timer may expire first and prevent the driver from
showing us if and how it can handle the issue.

Don't arm the timer before sysfs operations which are essential for a
subtest.
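
The approach relies on a timeout of 0 meaning "no timer at all".  The
sketch below shows the same idea with plain POSIX alarm(), where a zero
argument cancels any pending alarm and schedules none.  Both helper
names are hypothetical, and the assumption that igt_set_timeout()
treats 0 the same way should be checked against the IGT documentation.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: arm a SIGALRM-based timer for 'seconds', or leave the
 * operation untimed when 'seconds' is 0 (alarm(0) cancels any pending alarm
 * and schedules none).
 */
static void arm_timeout(unsigned int seconds)
{
	alarm(seconds);
}

/* Cancel the timer; returns the seconds that were still remaining, if any. */
static unsigned int disarm_timeout(void)
{
	return alarm(0);
}
```

Callers for which the operation is the subject of the subtest would
pass 0, while recovery paths would keep a finite value such as 60.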

v2: Refresh,
  - don't time out on hot driver rebind / hot device restore in
    *-lateclose variants, those operations haven't been covered by
    other subtests.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d31faf215..6112704f4 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -120,29 +120,31 @@ static void prepare(struct hotunplug *priv)
 }
 
 /* Unbind the driver from the device */
-static void driver_unbind(struct hotunplug *priv, const char *prefix)
+static void driver_unbind(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
-	igt_set_timeout(60, "Driver unbind timeout!");
+	igt_set_timeout(timeout, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Re-bind the driver to the device */
-static void driver_bind(struct hotunplug *priv)
+static void driver_bind(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rebinding the driver to the device\n");
 	priv->failure = "Driver re-bind failure!";
 
-	igt_set_timeout(60, "Driver re-bind timeout!");
+	igt_set_timeout(timeout, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(struct hotunplug *priv, const char *prefix)
+static void device_unplug(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_require(priv->fd.sysfs_dev == -1);
 
@@ -153,7 +155,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
-	igt_set_timeout(60, "Device unplug timeout!");
+	igt_set_timeout(timeout, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 
@@ -162,12 +164,12 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 }
 
 /* Re-discover the device by rescanning its bus */
-static void bus_rescan(struct hotunplug *priv)
+static void bus_rescan(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rediscovering the device\n");
 	priv->failure = "Bus rescan failure!";
 
-	igt_set_timeout(60, "Bus rescan timeout!");
+	igt_set_timeout(timeout, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 }
@@ -214,10 +216,10 @@ static void recover(struct hotunplug *priv)
 	cleanup(priv);
 
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
-		bus_rescan(priv);
+		bus_rescan(priv, 60);
 
 	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
-		driver_bind(priv);
+		driver_bind(priv, 60);
 
 	if (priv->failure)
 		healthcheck(priv);
@@ -252,9 +254,9 @@ static void unbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	driver_unbind(priv, "");
+	driver_unbind(priv, "", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	healthcheck(priv);
 }
@@ -263,9 +265,9 @@ static void unplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	device_unplug(priv, "");
+	device_unplug(priv, "", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	healthcheck(priv);
 }
@@ -275,9 +277,9 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hot unbind");
 
-	driver_unbind(priv, "hot ");
+	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
@@ -290,9 +292,9 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hot unplug");
 
-	device_unplug(priv, "hot ");
+	device_unplug(priv, "hot ", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
-- 
2.21.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 13/24] tests/core_hotunplug: Let the driver time out essential sysfs operations
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

The test now arms a timer before performing each driver unbind / rebind
or device unplug / bus rescan sysfs operation.  If an operation runs
into trouble, the timer may expire first and prevent the driver from
showing us if and how it can handle the issue.

Don't arm the timer before sysfs operations which are essential for a
subtest.

v2: Refresh,
  - don't time out on hot driver rebind / hot device restore in
    *-lateclose variants, those operations haven't been covered by
    other subtests.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d31faf215..6112704f4 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -120,29 +120,31 @@ static void prepare(struct hotunplug *priv)
 }
 
 /* Unbind the driver from the device */
-static void driver_unbind(struct hotunplug *priv, const char *prefix)
+static void driver_unbind(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
-	igt_set_timeout(60, "Driver unbind timeout!");
+	igt_set_timeout(timeout, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Re-bind the driver to the device */
-static void driver_bind(struct hotunplug *priv)
+static void driver_bind(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rebinding the driver to the device\n");
 	priv->failure = "Driver re-bind failure!";
 
-	igt_set_timeout(60, "Driver re-bind timeout!");
+	igt_set_timeout(timeout, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(struct hotunplug *priv, const char *prefix)
+static void device_unplug(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_require(priv->fd.sysfs_dev == -1);
 
@@ -153,7 +155,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
-	igt_set_timeout(60, "Device unplug timeout!");
+	igt_set_timeout(timeout, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 
@@ -162,12 +164,12 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 }
 
 /* Re-discover the device by rescanning its bus */
-static void bus_rescan(struct hotunplug *priv)
+static void bus_rescan(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rediscovering the device\n");
 	priv->failure = "Bus rescan failure!";
 
-	igt_set_timeout(60, "Bus rescan timeout!");
+	igt_set_timeout(timeout, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 }
@@ -214,10 +216,10 @@ static void recover(struct hotunplug *priv)
 	cleanup(priv);
 
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
-		bus_rescan(priv);
+		bus_rescan(priv, 60);
 
 	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
-		driver_bind(priv);
+		driver_bind(priv, 60);
 
 	if (priv->failure)
 		healthcheck(priv);
@@ -252,9 +254,9 @@ static void unbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	driver_unbind(priv, "");
+	driver_unbind(priv, "", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	healthcheck(priv);
 }
@@ -263,9 +265,9 @@ static void unplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	device_unplug(priv, "");
+	device_unplug(priv, "", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	healthcheck(priv);
 }
@@ -275,9 +277,9 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hot unbind");
 
-	driver_unbind(priv, "hot ");
+	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
@@ -290,9 +292,9 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hot unplug");
 
-	device_unplug(priv, "hot ");
+	device_unplug(priv, "hot ", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
-- 
2.21.1
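
A note on the timeout argument used above: as far as I can tell,
igt_set_timeout() is based on alarm(2), where 0 seconds means "no timeout",
so the subtests passing 0 leave it to the driver to time out.  A minimal
standalone sketch of that convention follows.  set_op_timeout() is a
hypothetical helper made up for illustration, not an IGT function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a test-side timeout helper (not the IGT
 * implementation).  A non-zero value arms a SIGALRM based timeout,
 * 0 disarms any timeout armed before.  Returns the number of seconds
 * that were left on a previously armed timeout, as reported by alarm(2).
 */
static unsigned int set_op_timeout(unsigned int seconds)
{
	return alarm(seconds);
}
```

With this convention, set_op_timeout(60) followed by set_op_timeout(0)
leaves no alarm pending, so a slow sysfs write is never interrupted by the
test itself.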


* [Intel-gfx] [PATCH i-g-t v6 14/24] tests/core_hotunplug: Process return values of sysfs operations
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Return values of driver bind/unbind and device remove/recover sysfs
operations are currently ignored.  Assert that they indicate success.

v2: Add trailing newlines missing from igt_assert messages.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 6112704f4..d5b8c5ed3 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -127,7 +127,9 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "unbind",
+				   priv->dev_bus_addr),
+		     "Driver unbind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -138,7 +140,9 @@ static void driver_bind(struct hotunplug *priv, int timeout)
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "bind",
+				   priv->dev_bus_addr),
+		     "Driver re-bind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -156,7 +160,8 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
-	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1"),
+		     "Device unplug failure!\n");
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
@@ -170,7 +175,8 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
-	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1"),
+		       "Bus rescan failure!\n");
 	igt_reset_timeout();
 }
 
-- 
2.21.1
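
For readers without the IGT sources at hand, the idea of the patch can be
sketched standalone: a write to a control file only counts as successful if
the file could be opened and the whole string was written.  attr_set() below
is a hypothetical helper modelled on how the patch uses the return value of
igt_sysfs_set(); it is not the library implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a string to an existing attribute below
 * the directory referred to by "dir" and report success only if the
 * whole string has been written.
 */
static bool attr_set(int dir, const char *attr, const char *value)
{
	size_t len = strlen(value);
	ssize_t ret;
	int fd;

	/* no O_CREAT: like a sysfs attribute, the file must already exist */
	fd = openat(dir, attr, O_WRONLY);
	if (fd < 0)
		return false;

	ret = write(fd, value, len);
	close(fd);

	return ret == (ssize_t)len;
}
```

A caller can then wrap the call in an assertion instead of dropping the
result, which is what the igt_assert_f() calls in the diff do.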


* [igt-dev] [PATCH i-g-t v6 14/24] tests/core_hotunplug: Process return values of sysfs operations
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Return values of driver bind/unbind and device remove/recover sysfs
operations are currently ignored.  Assert that they indicate success.

v2: Add trailing newlines missing from igt_assert messages.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 6112704f4..d5b8c5ed3 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -127,7 +127,9 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "unbind",
+				   priv->dev_bus_addr),
+		     "Driver unbind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -138,7 +140,9 @@ static void driver_bind(struct hotunplug *priv, int timeout)
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "bind",
+				   priv->dev_bus_addr),
+		     "Driver re-bind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -156,7 +160,8 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
-	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1"),
+		     "Device unplug failure!\n");
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
@@ -170,7 +175,8 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
-	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1"),
+		       "Bus rescan failure!\n");
 	igt_reset_timeout();
 }
 
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v6 15/24] tests/core_hotunplug: Assert expected device presence/absence
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
                   ` (14 preceding siblings ...)
  (?)
@ 2020-09-11 10:30 ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Don't rely solely on successful writes to sysfs control files; also
assert the existence or non-existence of the respective device sysfs
node.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d5b8c5ed3..a7dc4cf3b 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -131,6 +131,9 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 				   priv->dev_bus_addr),
 		     "Driver unbind failure!\n");
 	igt_reset_timeout();
+
+	igt_assert_f(faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0),
+		     "Unbound device still present\n");
 }
 
 /* Re-bind the driver to the device */
@@ -144,6 +147,10 @@ static void driver_bind(struct hotunplug *priv, int timeout)
 				   priv->dev_bus_addr),
 		     "Driver re-bind failure!\n");
 	igt_reset_timeout();
+
+	igt_fail_on_f(faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr,
+				F_OK, 0),
+		      "Rebound device not present!\n");
 }
 
 /* Remove (virtually unplug) the device from its bus */
@@ -166,6 +173,9 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 	igt_assert_eq(priv->fd.sysfs_dev, -1);
+
+	igt_assert_f(faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0),
+		     "Unplugged device still present\n");
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -178,6 +188,10 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1"),
 		       "Bus rescan failure!\n");
 	igt_reset_timeout();
+
+	igt_fail_on_f(faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr,
+				F_OK, 0),
+		      "Fakely unplugged device not rediscovered!\n");
 }
 
 static void cleanup(struct hotunplug *priv)
-- 
2.21.1
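
One detail of the diff above that is easy to misread: faccessat() returns 0
when the node exists and -1 otherwise, so igt_assert_f(faccessat(...))
asserts absence while igt_fail_on_f(faccessat(...)) asserts presence.  A
minimal standalone sketch of the same check, with node_present() being a
hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if "name" exists below directory fd "dir". */
static bool node_present(int dir, const char *name)
{
	/* faccessat() returns 0 if the node is accessible, -1 otherwise */
	return faccessat(dir, name, F_OK, 0) == 0;
}
```

In terms of this helper, the unbind and unplug paths expect
!node_present(), while the rebind and rescan paths expect node_present().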


* [Intel-gfx] [PATCH i-g-t v6 16/24] tests/core_hotunplug: Explicitly ignore unused return values
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Some return values are not useful and can be ignored.  Wrap those cases
inside igt_ignore_warn(), not only to make sure compilers are happy but
also to clearly document our decisions.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index a7dc4cf3b..6cf56d047 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -261,7 +261,7 @@ static void set_filter_from_device(int fd)
 	char path[PATH_MAX + 1];
 
 	igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
-	strncat(path, "/device", PATH_MAX - strlen(path));
+	igt_ignore_warn(strncat(path, "/device", PATH_MAX - strlen(path)));
 	igt_assert(realpath(path, dst));
 
 	igt_device_filter_free_all();
@@ -398,7 +398,7 @@ igt_main
 	igt_fixture {
 		post_healthcheck(&priv);
 
-		close(priv.fd.sysfs_bus);
-		close(priv.fd.sysfs_drv);
+		igt_ignore_warn(close(priv.fd.sysfs_bus));
+		igt_ignore_warn(close(priv.fd.sysfs_drv));
 	}
 }
-- 
2.21.1
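
To illustrate the pattern outside of IGT: a helper that takes the value as
an argument makes the "ignored on purpose" decision visible in the code and
also counts as a use of the result.  ignore_result() and flush_cache() below
are hypothetical names made up for this sketch; the attribute is the
GCC/Clang warn_unused_result one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a helper like igt_ignore_warn(). */
static inline void ignore_result(bool value)
{
	(void)value;
}

/* A function whose result the compiler insists on being used. */
__attribute__((warn_unused_result))
static int flush_cache(int *counter)
{
	return ++*counter;
}

/* Call it twice and deliberately drop both results. */
static int flush_twice(void)
{
	int n = 0;

	ignore_result(flush_cache(&n));
	ignore_result(flush_cache(&n));

	return n;
}
```

Passing the value to a function documents the decision at the call site,
which a bare call does not.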


* [igt-dev] [PATCH i-g-t v6 16/24] tests/core_hotunplug: Explicitly ignore unused return values
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Some return values are not useful and can be ignored.  Wrap those cases
inside igt_ignore_warn(), not only to make sure compilers are happy but
also to clearly document our decisions.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index a7dc4cf3b..6cf56d047 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -261,7 +261,7 @@ static void set_filter_from_device(int fd)
 	char path[PATH_MAX + 1];
 
 	igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
-	strncat(path, "/device", PATH_MAX - strlen(path));
+	igt_ignore_warn(strncat(path, "/device", PATH_MAX - strlen(path)));
 	igt_assert(realpath(path, dst));
 
 	igt_device_filter_free_all();
@@ -398,7 +398,7 @@ igt_main
 	igt_fixture {
 		post_healthcheck(&priv);
 
-		close(priv.fd.sysfs_bus);
-		close(priv.fd.sysfs_drv);
+		igt_ignore_warn(close(priv.fd.sysfs_bus));
+		igt_ignore_warn(close(priv.fd.sysfs_drv));
 	}
 }
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v6 17/24] tests/core_hotunplug: Also check health of render device node
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
                   ` (16 preceding siblings ...)
  (?)
@ 2020-09-11 10:30 ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Failures of subsequent tests accessing the render device node have been
observed on CI after late close of a hot rebound device.  Extend our
health check to cover the render node in order to detect that condition,
and start our recovery phase by unbinding the driver from a device found
faulty.  Also, check the health of both device nodes before running any
subtests.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 6cf56d047..5e9eba8e7 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -55,13 +55,15 @@ struct hotunplug {
  * use drm_open_driver() since in case of an i915 device it opens it
  * twice and keeps a second file descriptor open for exit handler use.
  */
-static int local_drm_open_driver(const char *when, const char *why)
+static int local_drm_open_driver(bool render, const char *when, const char *why)
 {
 	int fd_drm;
 
-	igt_debug("%sopening device%s\n", when, why);
+	igt_debug("%sopening %s device%s\n", when, render ? "render" : "DRM",
+		  why);
 
-	fd_drm = __drm_open_driver(DRIVER_ANY);
+	fd_drm = render ? __drm_open_driver_render(DRIVER_ANY) :
+			  __drm_open_driver(DRIVER_ANY);
 	igt_assert_fd(fd_drm);
 
 	return fd_drm;
@@ -200,17 +202,15 @@ static void cleanup(struct hotunplug *priv)
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
-static void healthcheck(struct hotunplug *priv)
+static void node_healthcheck(struct hotunplug *priv, bool render)
 {
 	/* preserve potentially dirty device status stored in priv->fd.drm */
 	bool closed = priv->fd.drm == -1;
 	int fd_drm;
 
-	/* device name may have changed, rebuild IGT device list */
-	igt_devices_scan(true);
-
-	priv->failure = "Device reopen failure!";
-	fd_drm = local_drm_open_driver("re", " for health check");
+	priv->failure = render ? "Render device reopen failure!" :
+				 "DRM device reopen failure!";
+	fd_drm = local_drm_open_driver(render, "re", " for health check");
 	if (closed)	/* store fd for cleanup if not dirty */
 		priv->fd.drm = fd_drm;
 
@@ -226,6 +226,16 @@ static void healthcheck(struct hotunplug *priv)
 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
 		priv->fd.drm = fd_drm;
+}
+
+static void healthcheck(struct hotunplug *priv)
+{
+	/* device name may have changed, rebuild IGT device list */
+	igt_devices_scan(true);
+
+	node_healthcheck(priv, false);
+	if (!priv->failure)
+		node_healthcheck(priv, true);
 
 	/* not only request igt_abort on failure, also fail the health check */
 	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
@@ -235,6 +245,11 @@ static void recover(struct hotunplug *priv)
 {
 	cleanup(priv);
 
+	/* unbind the driver from a possibly hot rebound unhealthy device */
+	if (!faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0) &&
+	    priv->fd.drm == -1 && priv->failure)
+		driver_unbind(priv, "post ", 60);
+
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
 		bus_rescan(priv, 60);
 
@@ -295,7 +310,7 @@ static void unplug_rescan(struct hotunplug *priv)
 static void hotunbind_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
-	priv->fd.drm = local_drm_open_driver("", " for hot unbind");
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
 
 	driver_unbind(priv, "hot ", 0);
 
@@ -310,7 +325,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
-	priv->fd.drm = local_drm_open_driver("", " for hot unplug");
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
 
 	device_unplug(priv, "hot ", 0);
 
@@ -348,6 +363,11 @@ igt_main
 		igt_assert_eq(close_device(fd_drm, "", "selected "), -1);
 
 		prepare(&priv);
+
+		node_healthcheck(&priv, false);
+		if (!priv.failure)
+			node_healthcheck(&priv, true);
+		igt_skip_on_f(priv.failure, "%s\n", priv.failure);
 	}
 
 	igt_subtest_group {
-- 
2.21.1
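
The control flow added to healthcheck() can be sketched in isolation: probe
the primary node first and only probe the render node if the primary one
passed, so the first failure message is the one reported.  All names below
are hypothetical; the probes are stubs, not real device accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: returns NULL if the node is healthy, else a message. */
typedef const char *(*node_probe_fn)(bool render);

/* Check the primary node, then the render node if the primary one passed. */
static const char *check_both_nodes(node_probe_fn probe)
{
	const char *failure = probe(false);

	if (!failure)
		failure = probe(true);

	return failure;
}

/* Stub probes standing in for real device checks. */
static const char *probe_all_ok(bool render)
{
	(void)render;
	return NULL;
}

static const char *probe_render_bad(bool render)
{
	return render ? "Render device reopen failure!" : NULL;
}

static const char *probe_all_bad(bool render)
{
	return render ? "Render device reopen failure!" :
			"DRM device reopen failure!";
}
```

With probe_all_bad the render probe is never reached, which mirrors the
"if (!priv->failure)" guard in the patch.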


* [Intel-gfx] [PATCH i-g-t v6 18/24] tests/core_hotunplug: More thorough i915 healthcheck and recovery
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test currently assumes the i915 driver is able to identify potential
hardware or driver issues while rebinding to a device and to indicate
them by marking the GPU wedged.  Should that assumption prove wrong, the
health check phase of the test would happily succeed while potentially
leaving the device in an unusable state.  That would not only give us
false positive test results but could also affect subsequently run
applications.  Therefore, we should examine the health of the exercised
device more thoroughly and try harder to recover it from any detected
stalls.

We could use the gem_test_engine() library function, which submits a
NOP batch to each physical engine and asserts its successful execution.
Unfortunately, on failure this function jumps out of the IGT test
section it is called from, whereas we would like to continue with
recovery steps, preferably without adding another level of test section
group nesting.  Moreover, the function opens the device again and
doesn't close the extra file descriptor before the jump, whereas we need
to be able to close the exercised device completely before running
certain subtest operations.  Therefore, reimplement the function locally
with those issues fixed and use it as an i915 health check.  Also call
it on test startup so that operations performed by the test are never
blamed for driver or hardware issues which may already exist and be
detectable on test start.

Should the i915 GPU be found unresponsive by the health check called
from a recovery section, try harder to recover it to a usable state
with a global GPU reset.

For still more effective detection of GPU hangs, use a hang detector
provided by IGT library.  However, replace the library signal handler
with our own implementation that doesn't jump out of the current IGT
test section on GPU hang so we are still able to perform the reset and
retry.

v2: Skip i915 health check if a GPU hang has been already detected by a
    previous health check run and not yet recovered with a GPU reset,
  - take care of stopping a hang detector instance possibly left
    running by a failed health check attempt.
v3: Re-run i915 health check as a first step of i915 recovery (use full
    GPU reset as a last resort),
  - prefix i915 health check debug messages with step indicators,
  - fix spelling error in a comment.
v4: Unbind the driver from an unhealthy device before recovery,
  - drop caches on i915 health check completion.
v5: Refresh on top of a new patch added to the series which already
    unbinds the driver from a device found unhealthy and runs health
    checks on test startup,
  - no need to drop caches from the i915 health check, it seems to do
    its job correctly without that.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 115 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 101 insertions(+), 14 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 5e9eba8e7..bc82ae3fb 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -23,8 +23,10 @@
 
 #include <fcntl.h>
 #include <limits.h>
+#include <signal.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <unistd.h>
@@ -202,8 +204,87 @@ static void cleanup(struct hotunplug *priv)
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
-static void node_healthcheck(struct hotunplug *priv, bool render)
+static bool local_i915_is_wedged(int i915)
 {
+	int err = 0;
+
+	if (ioctl(i915, DRM_IOCTL_I915_GEM_THROTTLE))
+		err = -errno;
+	return err == -EIO;
+}
+
+static bool hang_detected = false;
+
+static void local_sig_abort(int sig)
+{
+	errno = 0; /* inside a signal, last errno reporting is confusing */
+	hang_detected = true;
+}
+
+static int local_i915_healthcheck(int i915, const char *prefix)
+{
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 execbuf = {
+		.buffers_ptr = to_user_pointer(&obj),
+		.buffer_count = 1,
+	};
+	const struct intel_execution_engine2 *engine;
+
+	/* stop our hang detector possibly still running if we failed before */
+	igt_stop_hang_detector();
+
+	/* don't run again before GPU reset if hang has been already detected */
+	if (hang_detected)
+		return -EIO;
+
+	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	obj.handle = gem_create(i915, 4096);
+	gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe));
+
+	igt_fork_hang_detector(i915);
+	signal(SIGIO, local_sig_abort);
+
+	__for_each_physical_engine(i915, engine) {
+		execbuf.flags = engine->flags;
+		gem_execbuf(i915, &execbuf);
+	}
+
+	gem_sync(i915, obj.handle);
+	gem_close(i915, obj.handle);
+
+	igt_stop_hang_detector();
+	if (hang_detected)
+		return -EIO;
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	return 0;
+}
+
+static int local_i915_recover(int i915)
+{
+	hang_detected = false;
+	if (!local_i915_healthcheck(i915, "re-"))
+		return 0;
+
+	igt_debug("forcing i915 GPU reset\n");
+	igt_force_gpu_reset(i915);
+
+	hang_detected = false;
+	return local_i915_healthcheck(i915, "post-");
+}
+
+#define FLAG_RENDER	(1 << 0)
+#define FLAG_RECOVER	(1 << 1)
+static void node_healthcheck(struct hotunplug *priv, unsigned flags)
+{
+	bool render = flags & FLAG_RENDER;
 	/* preserve potentially dirty device status stored in priv->fd.drm */
 	bool closed = priv->fd.drm == -1;
 	int fd_drm;
@@ -215,9 +296,14 @@ static void node_healthcheck(struct hotunplug *priv, bool render)
 		priv->fd.drm = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
-		priv->failure = "GEM failure";
-		igt_require_gem(fd_drm);
-		priv->failure = NULL;
+		/* don't report library failed asserts as healthcheck failure */
+		priv->failure = "Unrecoverable test failure";
+		if (local_i915_healthcheck(fd_drm, "") &&
+		    (!(flags & FLAG_RECOVER) || local_i915_recover(fd_drm)))
+			priv->failure = "Healthcheck failure!";
+		else
+			priv->failure = NULL;
+
 	} else {
 		/* no device specific healthcheck, rely on reopen result */
 		priv->failure = NULL;
@@ -228,14 +314,15 @@ static void node_healthcheck(struct hotunplug *priv, bool render)
 		priv->fd.drm = fd_drm;
 }
 
-static void healthcheck(struct hotunplug *priv)
+static void healthcheck(struct hotunplug *priv, bool recover)
 {
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	node_healthcheck(priv, false);
+	node_healthcheck(priv, recover ? FLAG_RECOVER : 0);
 	if (!priv->failure)
-		node_healthcheck(priv, true);
+		node_healthcheck(priv,
+				 FLAG_RENDER | (recover ? FLAG_RECOVER : 0));
 
 	/* not only request igt_abort on failure, also fail the health check */
 	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
@@ -257,7 +344,7 @@ static void recover(struct hotunplug *priv)
 		driver_bind(priv, 60);
 
 	if (priv->failure)
-		healthcheck(priv);
+		healthcheck(priv, true);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
@@ -293,7 +380,7 @@ static void unbind_rebind(struct hotunplug *priv)
 
 	driver_bind(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void unplug_rescan(struct hotunplug *priv)
@@ -304,7 +391,7 @@ static void unplug_rescan(struct hotunplug *priv)
 
 	bus_rescan(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunbind_lateclose(struct hotunplug *priv)
@@ -319,7 +406,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunplug_lateclose(struct hotunplug *priv)
@@ -334,7 +421,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 /* Main */
@@ -364,9 +451,9 @@ igt_main
 
 		prepare(&priv);
 
-		node_healthcheck(&priv, false);
+		node_healthcheck(&priv, 0);
 		if (!priv.failure)
-			node_healthcheck(&priv, true);
+			node_healthcheck(&priv, FLAG_RENDER);
 		igt_skip_on_f(priv.failure, "%s\n", priv.failure);
 	}
 
-- 
2.21.1
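
The two ideas of this patch, a hang notification that only sets a flag
instead of jumping out of the test section, and a recovery that first
retries and only then resets, can be modelled without any GPU.  Everything
below is a hypothetical sketch: struct fake_gpu, check() and recover() are
made-up names, raise() stands in for the hang detector's SIGIO, and clearing
"hung" stands in for a forced GPU reset.  Unlike the patch, the flag is
declared volatile sig_atomic_t, which is the portable type for a variable
written from a signal handler.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>

/* written from the signal handler, hence volatile sig_atomic_t */
static volatile sig_atomic_t hang_flag;

/* record the hang instead of jumping out of the current section */
static void on_hang(int sig)
{
	(void)sig;
	hang_flag = 1;
}

/* hypothetical GPU model: stays "hung" until it is reset */
struct fake_gpu {
	bool hung;
	int resets;
};

static int check(struct fake_gpu *gpu)
{
	/* don't run again before a reset if a hang was already detected */
	if (hang_flag)
		return -EIO;

	signal(SIGIO, on_hang);
	if (gpu->hung)
		raise(SIGIO);	/* stands in for the hang detector */
	signal(SIGIO, SIG_DFL);

	return hang_flag ? -EIO : 0;
}

static int recover(struct fake_gpu *gpu)
{
	/* first step: a plain retry of the health check */
	hang_flag = 0;
	if (!check(gpu))
		return 0;

	/* last resort: stands in for a forced GPU reset */
	gpu->hung = false;
	gpu->resets++;

	hang_flag = 0;
	return check(gpu);
}
```

Because the handler returns normally, the caller stays in control after a
hang and can decide between a retry and a reset, which is the property the
commit message asks for.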


* [igt-dev] [PATCH i-g-t v6 18/24] tests/core_hotunplug: More thorough i915 healthcheck and recovery
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

The test currently assumes the i915 driver is able to identify potential
hardware or driver issues while rebinding to a device and to indicate
them by marking the GPU wedged.  Should that assumption prove wrong, the
health check phase of the test would happily succeed while potentially
leaving the device in an unusable state.  That would not only give us
false positive test results but could also affect subsequently run
applications.  Therefore, we should examine the health of the exercised
device more thoroughly and try harder to recover it from any detected
stalls.

We could use the gem_test_engine() library function, which submits a
NOP batch to each physical engine and asserts its successful execution.
Unfortunately, on failure this function jumps out of the IGT test
section it is called from, whereas we would like to continue with
recovery steps, preferably without adding another level of test section
group nesting.  Moreover, the function opens the device again and
doesn't close the extra file descriptor before the jump, whereas we need
to be able to close the exercised device completely before running
certain subtest operations.  Therefore, reimplement the function locally
with those issues fixed and use it as an i915 health check.  Also call
it on test startup so that operations performed by the test are never
blamed for driver or hardware issues which may already exist and be
detectable on test start.

Should the i915 GPU be found unresponsive by the health check called
from a recovery section, try harder to recover it to a usable state
with a global GPU reset.

For still more effective detection of GPU hangs, use a hang detector
provided by IGT library.  However, replace the library signal handler
with our own implementation that doesn't jump out of the current IGT
test section on GPU hang so we are still able to perform the reset and
retry.

v2: Skip i915 health check if a GPU hang has been already detected by a
    previous health check run and not yet recovered with a GPU reset,
  - take care of stopping a hang detector instance possibly left
    running by a failed health check attempt.
v3: Re-run i915 health check as a first step of i915 recovery (use full
    GPU reset as a last resort),
  - prefix i915 health check debug messages with step indicators,
  - fix spelling error in a comment.
v4: Unbind the driver from an unhealthy device before recovery,
  - drop caches on i915 health check completion.
v5: Refresh on top of a new patch added to the series which already
    unbinds the driver from a device found unhealthy and runs health
    checks on test startup,
  - no need to drop caches from the i915 health check, it seems to do
    its job correctly without that.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 115 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 101 insertions(+), 14 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 5e9eba8e7..bc82ae3fb 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -23,8 +23,10 @@
 
 #include <fcntl.h>
 #include <limits.h>
+#include <signal.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <unistd.h>
@@ -202,8 +204,87 @@ static void cleanup(struct hotunplug *priv)
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
-static void node_healthcheck(struct hotunplug *priv, bool render)
+static bool local_i915_is_wedged(int i915)
 {
+	int err = 0;
+
+	if (ioctl(i915, DRM_IOCTL_I915_GEM_THROTTLE))
+		err = -errno;
+	return err == -EIO;
+}
+
+static bool hang_detected = false;
+
+static void local_sig_abort(int sig)
+{
+	errno = 0; /* inside a signal, last errno reporting is confusing */
+	hang_detected = true;
+}
+
+static int local_i915_healthcheck(int i915, const char *prefix)
+{
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 execbuf = {
+		.buffers_ptr = to_user_pointer(&obj),
+		.buffer_count = 1,
+	};
+	const struct intel_execution_engine2 *engine;
+
+	/* stop our hang detector possibly still running if we failed before */
+	igt_stop_hang_detector();
+
+	/* don't run again before GPU reset if hang has been already detected */
+	if (hang_detected)
+		return -EIO;
+
+	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	obj.handle = gem_create(i915, 4096);
+	gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe));
+
+	igt_fork_hang_detector(i915);
+	signal(SIGIO, local_sig_abort);
+
+	__for_each_physical_engine(i915, engine) {
+		execbuf.flags = engine->flags;
+		gem_execbuf(i915, &execbuf);
+	}
+
+	gem_sync(i915, obj.handle);
+	gem_close(i915, obj.handle);
+
+	igt_stop_hang_detector();
+	if (hang_detected)
+		return -EIO;
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	return 0;
+}
+
+static int local_i915_recover(int i915)
+{
+	hang_detected = false;
+	if (!local_i915_healthcheck(i915, "re-"))
+		return 0;
+
+	igt_debug("forcing i915 GPU reset\n");
+	igt_force_gpu_reset(i915);
+
+	hang_detected = false;
+	return local_i915_healthcheck(i915, "post-");
+}
+
+#define FLAG_RENDER	(1 << 0)
+#define FLAG_RECOVER	(1 << 1)
+static void node_healthcheck(struct hotunplug *priv, unsigned flags)
+{
+	bool render = flags & FLAG_RENDER;
 	/* preserve potentially dirty device status stored in priv->fd.drm */
 	bool closed = priv->fd.drm == -1;
 	int fd_drm;
@@ -215,9 +296,14 @@ static void node_healthcheck(struct hotunplug *priv, bool render)
 		priv->fd.drm = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
-		priv->failure = "GEM failure";
-		igt_require_gem(fd_drm);
-		priv->failure = NULL;
+		/* don't report library failed asserts as healthcheck failure */
+		priv->failure = "Unrecoverable test failure";
+		if (local_i915_healthcheck(fd_drm, "") &&
+		    (!(flags & FLAG_RECOVER) || local_i915_recover(fd_drm)))
+			priv->failure = "Healthcheck failure!";
+		else
+			priv->failure = NULL;
+
 	} else {
 		/* no device specific healthcheck, rely on reopen result */
 		priv->failure = NULL;
@@ -228,14 +314,15 @@ static void node_healthcheck(struct hotunplug *priv, bool render)
 		priv->fd.drm = fd_drm;
 }
 
-static void healthcheck(struct hotunplug *priv)
+static void healthcheck(struct hotunplug *priv, bool recover)
 {
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	node_healthcheck(priv, false);
+	node_healthcheck(priv, recover ? FLAG_RECOVER : 0);
 	if (!priv->failure)
-		node_healthcheck(priv, true);
+		node_healthcheck(priv,
+				 FLAG_RENDER | (recover ? FLAG_RECOVER : 0));
 
 	/* not only request igt_abort on failure, also fail the health check */
 	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
@@ -257,7 +344,7 @@ static void recover(struct hotunplug *priv)
 		driver_bind(priv, 60);
 
 	if (priv->failure)
-		healthcheck(priv);
+		healthcheck(priv, true);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
@@ -293,7 +380,7 @@ static void unbind_rebind(struct hotunplug *priv)
 
 	driver_bind(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void unplug_rescan(struct hotunplug *priv)
@@ -304,7 +391,7 @@ static void unplug_rescan(struct hotunplug *priv)
 
 	bus_rescan(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunbind_lateclose(struct hotunplug *priv)
@@ -319,7 +406,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunplug_lateclose(struct hotunplug *priv)
@@ -334,7 +421,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 /* Main */
@@ -364,9 +451,9 @@ igt_main
 
 		prepare(&priv);
 
-		node_healthcheck(&priv, false);
+		node_healthcheck(&priv, 0);
 		if (!priv.failure)
-			node_healthcheck(&priv, true);
+			node_healthcheck(&priv, FLAG_RENDER);
 		igt_skip_on_f(priv.failure, "%s\n", priv.failure);
 	}
 
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 19/24] tests/core_hotunplug: Add 'lateclose before restore' variants
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

If a GPU gets wedged during driver rebind or device re-plug for some
reason, the current hotunbind/hotunplug test variants may time out
before the lateclose phase, resulting in incomplete CI reports.

Add new test variants which close the device before restoring it.  Also
rename the old variants to the more adequate
hotrebind/hotreplug-lateclose and perform health checks both before and
after late close.
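
The difference between the two families of variants is only the order
of the close and restore steps.  A small sketch of that ordering for the
unbind case, using a hypothetical enum step and helper that are not part
of the test:

```c
#include <stddef.h>

/* Hypothetical labels for the phases of a subtest. */
enum step { OPEN, UNBIND, CLOSE, REBIND, HEALTHCHECK };

/* hotunbind-rebind: release the old instance before restoring the driver */
static const enum step close_before_restore[] = {
	OPEN, UNBIND, CLOSE, REBIND, HEALTHCHECK,
};

/* hotrebind-lateclose: restore the driver while the old instance is open */
static const enum step restore_before_close[] = {
	OPEN, UNBIND, REBIND, CLOSE, HEALTHCHECK,
};

/* Return the position of a step in a sequence, or -1 if absent. */
static int step_index(const enum step *seq, size_t n, enum step s)
{
	for (size_t i = 0; i < n; i++)
		if (seq[i] == s)
			return (int)i;
	return -1;
}
```

If the rebind step hangs, the first sequence has already completed its
close phase, while the second never reaches it.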

v2: Rebase on upstream.
v3: Refresh,
  - further rename hotunbind/hotunplug-lateclose to hotunbind-rebind
    and hotunplug-rescan respectively, then add two more variants under
    the old names which only exercise late close, leaving rebind /
    rescan to be taken care of in the post-subtest recovery phase,
  - also update descriptions of unmodified subtests for consistency.
v4: Refresh,
  - drop subtests with no health checks, adjust timeouts in successors,
  - perform health checks of hot restored devices also before late
    close,
  - in order to be able to safely run a health check while still
    keeping an unbound / unplugged device instance open, also preserve
    the open device fd, not only a close error,
  - adjust subtest descriptions.
v5: Refresh,
  - split out pre-lateclose health checks and related changes,
    introduced in v4, to a separate patch.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v2
---
 tests/core_hotunplug.c | 78 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 12 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index bc82ae3fb..436517ce5 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -394,28 +394,58 @@ static void unplug_rescan(struct hotunplug *priv)
 	healthcheck(priv, false);
 }
 
-static void hotunbind_lateclose(struct hotunplug *priv)
+static void hotunbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
 
 	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv, 0);
-
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
+	driver_bind(priv, 0);
+
 	healthcheck(priv, false);
 }
 
-static void hotunplug_lateclose(struct hotunplug *priv)
+static void hotunplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
 
 	device_unplug(priv, "hot ", 0);
 
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
+	igt_assert_eq(priv->fd.drm, -1);
+
+	bus_rescan(priv, 0);
+
+	healthcheck(priv, false);
+}
+
+static void hotrebind_lateclose(struct hotunplug *priv)
+{
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
+
+	driver_unbind(priv, "hot ", 60);
+
+	driver_bind(priv, 0);
+
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
+	igt_assert_eq(priv->fd.drm, -1);
+
+	healthcheck(priv, false);
+}
+
+static void hotreplug_lateclose(struct hotunplug *priv)
+{
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
+
+	device_unplug(priv, "hot ", 60);
+
 	bus_rescan(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
@@ -458,7 +488,7 @@ igt_main
 	}
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed, then rebound");
 		igt_subtest("unbind-rebind")
 			unbind_rebind(&priv);
 
@@ -470,7 +500,7 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged, then restored");
 		igt_subtest("unplug-rescan")
 			unplug_rescan(&priv);
 
@@ -482,9 +512,33 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-		igt_subtest("hotunbind-lateclose")
-			hotunbind_lateclose(&priv);
+		igt_describe("Check if the driver can be cleanly unbound from an open device, then released and rebound");
+		igt_subtest("hotunbind-rebind")
+			hotunbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if an open device can be cleanly unplugged, then released and restored");
+		igt_subtest("hotunplug-rescan")
+			hotunplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if the driver hot unbound from a still open device can be cleanly rebound, then the old instance released");
+		igt_subtest("hotrebind-lateclose")
+			hotrebind_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
@@ -494,9 +548,9 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a still open device can be cleanly unplugged, then released");
-		igt_subtest("hotunplug-lateclose")
-			hotunplug_lateclose(&priv);
+		igt_describe("Check if a still open while hot unplugged device can be cleanly restored, then the old instance released");
+		igt_subtest("hotreplug-lateclose")
+			hotreplug_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 19/24] tests/core_hotunplug: Add 'lateclose before restore' variants
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

If a GPU gets wedged during driver rebind or device re-plug for some
reason, the current hotunbind/hotunplug test variants may time out
before the lateclose phase, resulting in incomplete CI reports.

Add new test variants which close the device before restoring it.  Also
rename the old variants to the more adequate
hotrebind/hotreplug-lateclose and perform health checks both before and
after late close.

v2: Rebase on upstream.
v3: Refresh,
  - further rename hotunbind/hotunplug-lateclose to hotunbind-rebind
    and hotunplug-rescan respectively, then add two more variants under
    the old names which only exercise late close, leaving rebind /
    rescan to be taken care of in the post-subtest recovery phase,
  - also update descriptions of unmodified subtests for consistency.
v4: Refresh,
  - drop subtests with no health checks, adjust timeouts in successors,
  - perform health checks of hot restored devices also before late
    close,
  - in order to be able to safely run a health check while still
    keeping an unbound / unplugged device instance open, also preserve
    the open device fd, not only a close error,
  - adjust subtest descriptions.
v5: Refresh,
  - split out pre-lateclose health checks and related changes,
    introduced in v4, to a separate patch.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v2
---
 tests/core_hotunplug.c | 78 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 12 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index bc82ae3fb..436517ce5 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -394,28 +394,58 @@ static void unplug_rescan(struct hotunplug *priv)
 	healthcheck(priv, false);
 }
 
-static void hotunbind_lateclose(struct hotunplug *priv)
+static void hotunbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
 
 	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv, 0);
-
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
+	driver_bind(priv, 0);
+
 	healthcheck(priv, false);
 }
 
-static void hotunplug_lateclose(struct hotunplug *priv)
+static void hotunplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
 
 	device_unplug(priv, "hot ", 0);
 
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
+	igt_assert_eq(priv->fd.drm, -1);
+
+	bus_rescan(priv, 0);
+
+	healthcheck(priv, false);
+}
+
+static void hotrebind_lateclose(struct hotunplug *priv)
+{
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
+
+	driver_unbind(priv, "hot ", 60);
+
+	driver_bind(priv, 0);
+
+	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
+	igt_assert_eq(priv->fd.drm, -1);
+
+	healthcheck(priv, false);
+}
+
+static void hotreplug_lateclose(struct hotunplug *priv)
+{
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
+
+	device_unplug(priv, "hot ", 60);
+
 	bus_rescan(priv, 0);
 
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
@@ -458,7 +488,7 @@ igt_main
 	}
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed, then rebound");
 		igt_subtest("unbind-rebind")
 			unbind_rebind(&priv);
 
@@ -470,7 +500,7 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged, then restored");
 		igt_subtest("unplug-rescan")
 			unplug_rescan(&priv);
 
@@ -482,9 +512,33 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-		igt_subtest("hotunbind-lateclose")
-			hotunbind_lateclose(&priv);
+		igt_describe("Check if the driver can be cleanly unbound from an open device, then released and rebound");
+		igt_subtest("hotunbind-rebind")
+			hotunbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if an open device can be cleanly unplugged, then released and restored");
+		igt_subtest("hotunplug-rescan")
+			hotunplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if the driver hot unbound from a still open device can be cleanly rebound, then the old instance released");
+		igt_subtest("hotrebind-lateclose")
+			hotrebind_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
@@ -494,9 +548,9 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a still open device can be cleanly unplugged, then released");
-		igt_subtest("hotunplug-lateclose")
-			hotunplug_lateclose(&priv);
+		igt_describe("Check if a still open while hot unplugged device can be cleanly restored, then the old instance released");
+		igt_subtest("hotreplug-lateclose")
+			hotreplug_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 20/24] tests/core_hotunplug: Check health both before and after late close
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

In hot rebind / hot replug subtests, device health is currently checked
only at the end of the subtest, after late close.  If something fails,
we may not be able to easily identify the failing phase of the subtest.

Also run health checks before late closing the device, right after hot
rebind / replug.  To still be able to perform late close while also
handling cleanup of device instances that health checks may fail to
close, we need to maintain two separate device file descriptors in our
private data structure.
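
The need for two descriptors can be illustrated with a reduced sketch.
It assumes a hypothetical struct fds and helpers close_fd() and
health_check(), and any openable path (for example /dev/null) in place
of a DRM node; the real test uses its own open and close wrappers.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical reduced copy of the test's private data. */
struct fds {
	int drm;	/* device instance exercised by the subtest */
	int drm_hc;	/* device instance opened by the health check */
};

/* Close fd if open; return -1 on success, or the still open fd on error. */
static int close_fd(int fd)
{
	if (fd < 0)
		return -1;
	return close(fd) ? fd : -1;
}

/* Reopen the node for a check without touching the exercised instance. */
static int health_check(struct fds *fd, const char *path)
{
	fd->drm_hc = open(path, O_RDONLY);
	if (fd->drm_hc < 0)
		return -1;

	/* a device specific check would run here */

	fd->drm_hc = close_fd(fd->drm_hc);
	return fd->drm_hc == -1 ? 0 : -1;
}
```

Because the check only ever writes to drm_hc, the instance kept in drm
stays open and can still be late closed afterwards.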

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 436517ce5..ac106d964 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -42,6 +42,7 @@ IGT_TEST_DESCRIPTION("Examine behavior of a driver on device hot unplug");
 struct hotunplug {
 	struct {
 		int drm;
+		int drm_hc;	/* for health check */
 		int sysfs_dev;
 		int sysfs_bus;
 		int sysfs_drv;
@@ -200,7 +201,9 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 
 static void cleanup(struct hotunplug *priv)
 {
-	priv->fd.drm = close_device(priv->fd.drm, "post ", "failed ");
+	priv->fd.drm = close_device(priv->fd.drm, "post ", "exercised ");
+	priv->fd.drm_hc = close_device(priv->fd.drm_hc, "post ",
+							"health checked ");
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
@@ -286,14 +289,14 @@ static void node_healthcheck(struct hotunplug *priv, unsigned flags)
 {
 	bool render = flags & FLAG_RENDER;
 	/* preserve potentially dirty device status stored in priv->fd.drm */
-	bool closed = priv->fd.drm == -1;
+	bool closed = priv->fd.drm_hc == -1;
 	int fd_drm;
 
 	priv->failure = render ? "Render device reopen failure!" :
 				 "DRM device reopen failure!";
 	fd_drm = local_drm_open_driver(render, "re", " for health check");
 	if (closed)	/* store fd for cleanup if not dirty */
-		priv->fd.drm = fd_drm;
+		priv->fd.drm_hc = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
 		/* don't report library failed asserts as healthcheck failure */
@@ -311,7 +314,7 @@ static void node_healthcheck(struct hotunplug *priv, unsigned flags)
 
 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
-		priv->fd.drm = fd_drm;
+		priv->fd.drm_hc = fd_drm;
 }
 
 static void healthcheck(struct hotunplug *priv, bool recover)
@@ -334,7 +337,7 @@ static void recover(struct hotunplug *priv)
 
 	/* unbind the driver from a possibly hot rebound unhealthy device */
 	if (!faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0) &&
-	    priv->fd.drm == -1 && priv->failure)
+	    priv->fd.drm == -1 && priv->fd.drm_hc == -1 && priv->failure)
 		driver_unbind(priv, "post ", 60);
 
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
@@ -353,6 +356,7 @@ static void post_healthcheck(struct hotunplug *priv)
 
 	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
+	igt_require(priv->fd.drm_hc == -1);
 }
 
 static void set_filter_from_device(int fd)
@@ -375,6 +379,7 @@ static void set_filter_from_device(int fd)
 static void unbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 
 	driver_unbind(priv, "", 0);
 
@@ -386,6 +391,7 @@ static void unbind_rebind(struct hotunplug *priv)
 static void unplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 
 	device_unplug(priv, "", 0);
 
@@ -397,6 +403,7 @@ static void unplug_rescan(struct hotunplug *priv)
 static void hotunbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
 
 	driver_unbind(priv, "hot ", 0);
@@ -412,6 +419,7 @@ static void hotunbind_rebind(struct hotunplug *priv)
 static void hotunplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
 
 	device_unplug(priv, "hot ", 0);
@@ -427,12 +435,15 @@ static void hotunplug_rescan(struct hotunplug *priv)
 static void hotrebind_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
 
 	driver_unbind(priv, "hot ", 60);
 
 	driver_bind(priv, 0);
 
+	healthcheck(priv, false);
+
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -442,12 +453,15 @@ static void hotrebind_lateclose(struct hotunplug *priv)
 static void hotreplug_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
 
 	device_unplug(priv, "hot ", 60);
 
 	bus_rescan(priv, 0);
 
+	healthcheck(priv, false);
+
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -459,7 +473,7 @@ static void hotreplug_lateclose(struct hotunplug *priv)
 igt_main
 {
 	struct hotunplug priv = {
-		.fd		= { .drm = -1, .sysfs_dev = -1, },
+		.fd		= { .drm = -1, .drm_hc = -1, .sysfs_dev = -1, },
 		.failure	= NULL,
 	};
 
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 20/24] tests/core_hotunplug: Check health both before and after late close
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

In hot rebind / hot replug subtests, device health is currently checked
only at the end of the subtest, after late close.  If something fails,
we may not be able to easily identify the failing phase of the subtest.

Also run health checks before late closing the device, right after hot
rebind / replug.  To still be able to perform late close while also
handling cleanup of device instances that health checks may fail to
close, we need to maintain two separate device file descriptors in our
private data structure.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 436517ce5..ac106d964 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -42,6 +42,7 @@ IGT_TEST_DESCRIPTION("Examine behavior of a driver on device hot unplug");
 struct hotunplug {
 	struct {
 		int drm;
+		int drm_hc;	/* for health check */
 		int sysfs_dev;
 		int sysfs_bus;
 		int sysfs_drv;
@@ -200,7 +201,9 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 
 static void cleanup(struct hotunplug *priv)
 {
-	priv->fd.drm = close_device(priv->fd.drm, "post ", "failed ");
+	priv->fd.drm = close_device(priv->fd.drm, "post ", "exercised ");
+	priv->fd.drm_hc = close_device(priv->fd.drm_hc, "post ",
+							"health checked ");
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
@@ -286,14 +289,14 @@ static void node_healthcheck(struct hotunplug *priv, unsigned flags)
 {
 	bool render = flags & FLAG_RENDER;
 	/* preserve potentially dirty device status stored in priv->fd.drm */
-	bool closed = priv->fd.drm == -1;
+	bool closed = priv->fd.drm_hc == -1;
 	int fd_drm;
 
 	priv->failure = render ? "Render device reopen failure!" :
 				 "DRM device reopen failure!";
 	fd_drm = local_drm_open_driver(render, "re", " for health check");
 	if (closed)	/* store fd for cleanup if not dirty */
-		priv->fd.drm = fd_drm;
+		priv->fd.drm_hc = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
 		/* don't report library failed asserts as healthcheck failure */
@@ -311,7 +314,7 @@ static void node_healthcheck(struct hotunplug *priv, unsigned flags)
 
 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
-		priv->fd.drm = fd_drm;
+		priv->fd.drm_hc = fd_drm;
 }
 
 static void healthcheck(struct hotunplug *priv, bool recover)
@@ -334,7 +337,7 @@ static void recover(struct hotunplug *priv)
 
 	/* unbind the driver from a possibly hot rebound unhealthy device */
 	if (!faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0) &&
-	    priv->fd.drm == -1 && priv->failure)
+	    priv->fd.drm == -1 && priv->fd.drm_hc == -1 && priv->failure)
 		driver_unbind(priv, "post ", 60);
 
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
@@ -353,6 +356,7 @@ static void post_healthcheck(struct hotunplug *priv)
 
 	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
+	igt_require(priv->fd.drm_hc == -1);
 }
 
 static void set_filter_from_device(int fd)
@@ -375,6 +379,7 @@ static void set_filter_from_device(int fd)
 static void unbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 
 	driver_unbind(priv, "", 0);
 
@@ -386,6 +391,7 @@ static void unbind_rebind(struct hotunplug *priv)
 static void unplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 
 	device_unplug(priv, "", 0);
 
@@ -397,6 +403,7 @@ static void unplug_rescan(struct hotunplug *priv)
 static void hotunbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
 
 	driver_unbind(priv, "hot ", 0);
@@ -412,6 +419,7 @@ static void hotunbind_rebind(struct hotunplug *priv)
 static void hotunplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
 
 	device_unplug(priv, "hot ", 0);
@@ -427,12 +435,15 @@ static void hotunplug_rescan(struct hotunplug *priv)
 static void hotrebind_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
 
 	driver_unbind(priv, "hot ", 60);
 
 	driver_bind(priv, 0);
 
+	healthcheck(priv, false);
+
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -442,12 +453,15 @@ static void hotrebind_lateclose(struct hotunplug *priv)
 static void hotreplug_lateclose(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.drm_hc, -1);
 	priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
 
 	device_unplug(priv, "hot ", 60);
 
 	bus_rescan(priv, 0);
 
+	healthcheck(priv, false);
+
 	priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -459,7 +473,7 @@ static void hotreplug_lateclose(struct hotunplug *priv)
 igt_main
 {
 	struct hotunplug priv = {
-		.fd		= { .drm = -1, .sysfs_dev = -1, },
+		.fd		= { .drm = -1, .drm_hc = -1, .sysfs_dev = -1, },
 		.failure	= NULL,
 	};
 
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Unbinding the i915 driver on some Haswell and Broadwell platforms with
Azalia audio results in a kernel WARNING on "i915 raw-wakerefs=1
wakelocks=1 on cleanup".  The issue can be worked around by manually
enabling runtime power management for the conflicting audio adapter.
Use that method but also display a warning to preserve visibility of
the issue.  Also tag the workaround with a FIXME comment.
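
For reference, runtime power management of a single device is normally
enabled from user space by writing "auto" to its power/control sysfs
attribute.  A minimal sketch of such a write, with hypothetical helpers
write_attr() and enable_runtime_pm() and the attribute path left to the
caller; the IGT library helper used by the patch may do more than this.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to an attribute file; return 0 on success, -1 on error. */
static int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;

	ret = write(fd, value, len);
	close(fd);

	return ret == (ssize_t)len ? 0 : -1;
}

/* "auto" lets the kernel runtime suspend the device, "on" keeps it awake. */
static int enable_runtime_pm(const char *power_control_path)
{
	return write_attr(power_control_path, "auto");
}
```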

v2: Extend the scope of the workaround to Broadwell.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index ac106d964..3e2a76ddb 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -484,8 +484,23 @@ igt_main
 		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
 
 		if (is_i915_device(fd_drm)) {
+			uint32_t devid = intel_get_drm_devid(fd_drm);
+
 			gem_quiescent_gpu(fd_drm);
 			igt_require_gem(fd_drm);
+
+			/**
+			 * FIXME: Unbinding the i915 driver on some Haswell
+			 * platforms with Azalia audio results in a kernel WARN
+			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
+			 * below CI friendly user level workaround prevents the
+			 * warning from appearing.  Drop this hack as soon as
+			 * this is fixed in the kernel.
+			 */
+			if (igt_warn_on_f(IS_HASWELL(devid) ||
+					  IS_BROADWELL(devid),
+			    "Manually enabling audio PM to work around a kernel WARN\n"))
+				igt_pm_enable_audio_runtime_pm();
 		}
 
 		/* Make sure subtests always reopen the same device */
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [igt-dev] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

Unbinding the i915 driver on some Haswell and Broadwell platforms with
Azalia audio results in a kernel WARNING on "i915 raw-wakerefs=1
wakelocks=1 on cleanup".  The issue can be worked around by manually
enabling runtime power management for the conflicting audio adapter.
Use that method but also display a warning to preserve visibility of
the issue.  Also tag the workaround with a FIXME comment.

v2: Extend the scope of the workaround to Broadwell.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index ac106d964..3e2a76ddb 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -484,8 +484,23 @@ igt_main
 		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
 
 		if (is_i915_device(fd_drm)) {
+			uint32_t devid = intel_get_drm_devid(fd_drm);
+
 			gem_quiescent_gpu(fd_drm);
 			igt_require_gem(fd_drm);
+
+			/**
+			 * FIXME: Unbinding the i915 driver on some Haswell
+			 * platforms with Azalia audio results in a kernel WARN
+			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
+			 * below CI friendly user level workaround prevents the
+			 * warning from appearing.  Drop this hack as soon as
+			 * this is fixed in the kernel.
+			 */
+			if (igt_warn_on_f(IS_HASWELL(devid) ||
+					  IS_BROADWELL(devid),
+			    "Manually enabling audio PM to work around a kernel WARN\n"))
+				igt_pm_enable_audio_runtime_pm();
 		}
 
 		/* Make sure subtests always reopen the same device */
-- 
2.21.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH i-g-t v6 22/24] tests/core_hotunplug: Duplicate debug messages in dmesg
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The purpose of debug messages displayed by the test is to make it
easier to identify the subtest phase that fails.  Since issues
exhibited by the test are mostly reported to dmesg, print those debug
messages to /dev/kmsg as well.

v2: Rebase on upstream.
v3: Refresh.
v4: Refresh.
v5: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 3e2a76ddb..67e67627f 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -53,6 +53,12 @@ struct hotunplug {
 
 /* Helpers */
 
+#define local_debug(fmt, msg...)			       \
+({							       \
+	igt_debug(fmt, msg);				       \
+	igt_kmsg(KMSG_DEBUG "%s: " fmt, igt_test_name(), msg); \
+})
+
 /**
  * Subtests must be able to close examined devices completely.  Don't
  * use drm_open_driver() since in case of an i915 device it opens it
@@ -62,8 +68,8 @@ static int local_drm_open_driver(bool render, const char *when, const char *why)
 {
 	int fd_drm;
 
-	igt_debug("%sopening %s device%s\n", when, render ? "render" : "DRM",
-		  why);
+	local_debug("%sopening %s device%s\n", when, render ? "render" : "DRM",
+		    why);
 
 	fd_drm = render ? __drm_open_driver_render(DRIVER_ANY) :
 			  __drm_open_driver(DRIVER_ANY);
@@ -86,7 +92,7 @@ static int close_device(int fd_drm, const char *when, const char *which)
 	if (fd_drm < 0)	/* not open - return current status */
 		return fd_drm;
 
-	igt_debug("%sclosing %sdevice instance\n", when, which);
+	local_debug("%sclosing %sdevice instance\n", when, which);
 	return local_close(fd_drm, "Device close failed");
 }
 
@@ -128,7 +134,7 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix,
 			  int timeout)
 {
-	igt_debug("%sunbinding the driver from the device\n", prefix);
+	local_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
@@ -144,7 +150,7 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rebinding the driver to the device\n");
+	local_debug("%s\n", "rebinding the driver to the device");
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
@@ -168,7 +174,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
 
-	igt_debug("%sunplugging the device\n", prefix);
+	local_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
@@ -186,7 +192,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rediscovering the device\n");
+	local_debug("%s\n", "rediscovering the device");
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
@@ -241,7 +247,7 @@ static int local_i915_healthcheck(int i915, const char *prefix)
 	if (hang_detected)
 		return -EIO;
 
-	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+	local_debug("%s%s\n", prefix, "running i915 GPU healthcheck");
 
 	if (local_i915_is_wedged(i915))
 		return -EIO;
@@ -276,7 +282,7 @@ static int local_i915_recover(int i915)
 	if (!local_i915_healthcheck(i915, "re-"))
 		return 0;
 
-	igt_debug("forcing i915 GPU reset\n");
+	local_debug("%s\n", "forcing i915 GPU reset");
 	igt_force_gpu_reset(i915);
 
 	hang_detected = false;
-- 
2.21.1
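
The idea of the patch above, sending each debug message to both the
test log and the kernel log, can be sketched outside of IGT as follows.
This is only an illustration under stated assumptions: log_to_file()
and dual_debug() are hypothetical names, not IGT API, and a plain file
stands in for /dev/kmsg, which normally needs root to write.  The
posted local_debug() macro passes its named variadic argument straight
through, so it apparently needs at least one argument after the format
string, which would explain why fixed messages are rewritten as
"%s\n", "text".  The sketch uses the GNU ##__VA_ARGS__ extension to
avoid that restriction.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for igt_kmsg(): format a message and append it
 * to the file at `path`.  On a real system `path` would be /dev/kmsg.
 * Returns the number of characters written, or -1 on error.
 */
static int log_to_file(const char *path, const char *fmt, ...)
{
	va_list ap;
	FILE *f = fopen(path, "a");
	int ret;

	if (!f)
		return -1;

	va_start(ap, fmt);
	ret = vfprintf(f, fmt, ap);
	va_end(ap);

	if (fclose(f) != 0)
		return -1;

	return ret;
}

/*
 * Emit the same message to stderr and to the log file, prefixing the
 * file copy with a test name.  `fmt` must be a string literal so that
 * it can be concatenated with the "%s: " prefix.  Both the statement
 * expression and ##__VA_ARGS__ are GNU C extensions; the latter lets
 * the macro accept a format string with no further arguments.
 */
#define dual_debug(path, name, fmt, ...)			\
({								\
	fprintf(stderr, fmt, ##__VA_ARGS__);			\
	log_to_file(path, "%s: " fmt, name, ##__VA_ARGS__);	\
})
```

With such a macro, a call like dual_debug(path, "core_hotunplug",
"rebinding the driver\n") needs no dummy "%s" argument.  Whether
relying on a second GNU extension is acceptable in IGT is a separate
question that this sketch does not answer.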


* [igt-dev] [PATCH i-g-t v6 22/24] tests/core_hotunplug: Duplicate debug messages in dmesg
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

The purpose of debug messages displayed by the test is to make it
easier to identify the subtest phase that fails.  Since issues
exhibited by the test are mostly reported to dmesg, print those debug
messages to /dev/kmsg as well.

v2: Rebase on upstream.
v3: Refresh.
v4: Refresh.
v5: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 3e2a76ddb..67e67627f 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -53,6 +53,12 @@ struct hotunplug {
 
 /* Helpers */
 
+#define local_debug(fmt, msg...)			       \
+({							       \
+	igt_debug(fmt, msg);				       \
+	igt_kmsg(KMSG_DEBUG "%s: " fmt, igt_test_name(), msg); \
+})
+
 /**
  * Subtests must be able to close examined devices completely.  Don't
  * use drm_open_driver() since in case of an i915 device it opens it
@@ -62,8 +68,8 @@ static int local_drm_open_driver(bool render, const char *when, const char *why)
 {
 	int fd_drm;
 
-	igt_debug("%sopening %s device%s\n", when, render ? "render" : "DRM",
-		  why);
+	local_debug("%sopening %s device%s\n", when, render ? "render" : "DRM",
+		    why);
 
 	fd_drm = render ? __drm_open_driver_render(DRIVER_ANY) :
 			  __drm_open_driver(DRIVER_ANY);
@@ -86,7 +92,7 @@ static int close_device(int fd_drm, const char *when, const char *which)
 	if (fd_drm < 0)	/* not open - return current status */
 		return fd_drm;
 
-	igt_debug("%sclosing %sdevice instance\n", when, which);
+	local_debug("%sclosing %sdevice instance\n", when, which);
 	return local_close(fd_drm, "Device close failed");
 }
 
@@ -128,7 +134,7 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix,
 			  int timeout)
 {
-	igt_debug("%sunbinding the driver from the device\n", prefix);
+	local_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
@@ -144,7 +150,7 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rebinding the driver to the device\n");
+	local_debug("%s\n", "rebinding the driver to the device");
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
@@ -168,7 +174,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
 
-	igt_debug("%sunplugging the device\n", prefix);
+	local_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
@@ -186,7 +192,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rediscovering the device\n");
+	local_debug("%s\n", "rediscovering the device");
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
@@ -241,7 +247,7 @@ static int local_i915_healthcheck(int i915, const char *prefix)
 	if (hang_detected)
 		return -EIO;
 
-	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+	local_debug("%s%s\n", prefix, "running i915 GPU healthcheck");
 
 	if (local_i915_is_wedged(i915))
 		return -EIO;
@@ -276,7 +282,7 @@ static int local_i915_recover(int i915)
 	if (!local_i915_healthcheck(i915, "re-"))
 		return 0;
 
-	igt_debug("forcing i915 GPU reset\n");
+	local_debug("%s\n", "forcing i915 GPU reset");
 	igt_force_gpu_reset(i915);
 
 	hang_detected = false;
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
                   ` (22 preceding siblings ...)
  (?)
@ 2020-09-11 10:30 ` Janusz Krzysztofik
  2020-09-11 11:51     ` [igt-dev] " Petri Latvala
  -1 siblings, 1 reply; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Subtests which don't remove the device, only unbind the driver from it,
seem relatively safe and harmless for CI.  Remove them from the CI
blocklist.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/intel-ci/blacklist.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/intel-ci/blacklist.txt b/tests/intel-ci/blacklist.txt
index f9a57cb54..25b567038 100644
--- a/tests/intel-ci/blacklist.txt
+++ b/tests/intel-ci/blacklist.txt
@@ -120,7 +120,7 @@ igt@perf_pmu@cpu-hotplug
 
 # Currently fails and leaves the machine in a very bad state, and
 # causes coverage loss for other tests.
-igt@core_hotunplug@.*
+igt@core_hotunplug@.*plug.*
 
 # hangs several gens of hosts, and has no immediate fix
 igt@device_reset@reset-bound
\ No newline at end of file
-- 
2.21.1
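
The effect of narrowing the blocklist pattern can be checked with a
small matcher.  The sketch below assumes that blocklist entries are
applied as unanchored POSIX extended regular expressions; the regex
engine actually used by the CI runner may differ, and is_blocklisted()
is a hypothetical helper, not part of IGT.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the test `name` matches the blocklist `pattern`.
 * Assumption: the pattern is an unanchored POSIX extended regular
 * expression, so a match anywhere in the name counts.
 */
static bool is_blocklisted(const char *pattern, const char *name)
{
	regex_t re;
	bool match;

	/* Treat a pattern that fails to compile as "no match". */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	match = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);

	return match;
}
```

Under that assumption, the new pattern igt@core_hotunplug@.*plug.*
still matches igt@core_hotunplug@hotunplug-rescan and
igt@core_hotunplug@hotreplug-lateclose, while
igt@core_hotunplug@unbind-rebind and
igt@core_hotunplug@hotrebind-lateclose no longer match, because "plug"
does not occur after the second "@" in those names.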


* [Intel-gfx] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The scenario of unbinding the driver from a device and rebinding it is
a subset of unloading and reloading the module and should give equally
correct results.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/intel-ci/fast-feedback.testlist | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/intel-ci/fast-feedback.testlist b/tests/intel-ci/fast-feedback.testlist
index b98cdb245..aa2eb3295 100644
--- a/tests/intel-ci/fast-feedback.testlist
+++ b/tests/intel-ci/fast-feedback.testlist
@@ -158,6 +158,7 @@ igt@vgem_basic@sysfs
 # They will sometimes reveal issues of earlier tests leaving the
 # driver in a broken state that is not otherwise noticed in that test.
 
+igt@core_hotunplug@unbind-rebind
 igt@vgem_basic@unload
 igt@i915_module_load@reload
 igt@i915_pm_rpm@module-reload
-- 
2.21.1
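
For readers unfamiliar with the scenario, unbinding and rebinding a
driver is done through the driver's "unbind" and "bind" sysfs
attributes, without unloading the module.  The following is a minimal
sketch of that sequence, not the code used by the test: write_attr()
and unbind_rebind() are hypothetical helpers, and the example driver
directory and device address in the comments are assumptions for
illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `value` to the existing attribute file `attr` inside directory
 * `dir`.  Returns 0 on success, -1 on failure.
 */
static int write_attr(const char *dir, const char *attr, const char *value)
{
	char path[4096];
	size_t len = strlen(value);
	ssize_t written;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, value, len);
	close(fd);

	return written == (ssize_t)len ? 0 : -1;
}

/*
 * Unbind the driver from a device, then bind it again.
 * `driver_dir` would be something like /sys/bus/pci/drivers/i915 and
 * `dev` the device's bus address, e.g. "0000:00:02.0" (both assumed
 * here for illustration).  Writing to the real attributes needs root.
 */
static int unbind_rebind(const char *driver_dir, const char *dev)
{
	if (write_attr(driver_dir, "unbind", dev))
		return -1;

	return write_attr(driver_dir, "bind", dev);
}
```

Unlike a module reload, this sequence leaves the module loaded and
only exercises the driver's remove and probe paths for one device.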


* [igt-dev] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
@ 2020-09-11 10:30   ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 10:30 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, Petri Latvala, intel-gfx, Tvrtko Ursulin

The scenario of unbinding the driver from a device and rebinding it is
a subset of unloading and reloading the module and should give equally
correct results.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/intel-ci/fast-feedback.testlist | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/intel-ci/fast-feedback.testlist b/tests/intel-ci/fast-feedback.testlist
index b98cdb245..aa2eb3295 100644
--- a/tests/intel-ci/fast-feedback.testlist
+++ b/tests/intel-ci/fast-feedback.testlist
@@ -158,6 +158,7 @@ igt@vgem_basic@sysfs
 # They will sometimes reveal issues of earlier tests leaving the
 # driver in a broken state that is not otherwise noticed in that test.
 
+igt@core_hotunplug@unbind-rebind
 igt@vgem_basic@unload
 igt@i915_module_load@reload
 igt@i915_pm_rpm@module-reload
-- 
2.21.1


* [igt-dev] ✓ Fi.CI.BAT: success for tests/core_hotunplug: Fixes and enhancements (rev6)
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
                   ` (24 preceding siblings ...)
  (?)
@ 2020-09-11 11:24 ` Patchwork
  2020-09-11 14:18   ` Petri Latvala
  -1 siblings, 1 reply; 77+ messages in thread
From: Patchwork @ 2020-09-11 11:24 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev


[-- Attachment #1.1: Type: text/plain, Size: 7354 bytes --]

== Series Details ==

Series: tests/core_hotunplug: Fixes and enhancements (rev6)
URL   : https://patchwork.freedesktop.org/series/79671/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8998 -> IGTPW_4974
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_4974:

### IGT changes ###

#### Possible regressions ####

  * {igt@core_hotunplug@unbind-rebind} (NEW):
    - fi-hsw-4770:        NOTRUN -> [WARN][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-hsw-4770/igt@core_hotunplug@unbind-rebind.html
    - fi-bdw-5557u:       NOTRUN -> [WARN][2]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-bdw-5557u/igt@core_hotunplug@unbind-rebind.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8998 and IGTPW_4974:

### New IGT tests (1) ###

  * igt@core_hotunplug@unbind-rebind:
    - Statuses : 4 dmesg-warn(s) 29 pass(s) 2 warn(s)
    - Exec time: [0.38, 4.94] s

  

Known issues
------------

  Here are the changes found in IGTPW_4974 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-bsw-n3050:       [PASS][3] -> [DMESG-WARN][4] ([i915#1982]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
    - fi-byt-j1900:       [PASS][5] -> [DMESG-WARN][6] ([i915#1982])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@prime_self_import@basic-with_two_bos:
    - fi-tgl-y:           [PASS][7] -> [DMESG-WARN][8] ([i915#402])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-tgl-y/igt@prime_self_import@basic-with_two_bos.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-tgl-y/igt@prime_self_import@basic-with_two_bos.html

  * igt@vgem_basic@unload:
    - fi-kbl-guc:         [PASS][9] -> [DMESG-WARN][10] ([i915#2203])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-kbl-guc/igt@vgem_basic@unload.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-kbl-guc/igt@vgem_basic@unload.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_pm:
    - fi-icl-y:           [DMESG-FAIL][11] -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-icl-y/igt@i915_selftest@live@gt_pm.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-icl-y/igt@i915_selftest@live@gt_pm.html

  * igt@kms_busy@basic@flip:
    - fi-kbl-x1275:       [DMESG-WARN][13] ([i915#62] / [i915#92] / [i915#95]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-kbl-x1275/igt@kms_busy@basic@flip.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-kbl-x1275/igt@kms_busy@basic@flip.html

  * igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1:
    - fi-icl-u2:          [DMESG-WARN][15] ([i915#1982]) -> [PASS][16] +2 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1.html

  * igt@vgem_basic@dmabuf-fence:
    - fi-tgl-y:           [DMESG-WARN][17] ([i915#402]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-tgl-y/igt@vgem_basic@dmabuf-fence.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-tgl-y/igt@vgem_basic@dmabuf-fence.html

  
#### Warnings ####

  * igt@debugfs_test@read_all_entries:
    - fi-kbl-x1275:       [DMESG-WARN][19] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][20] ([i915#62] / [i915#92]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-kbl-x1275/igt@debugfs_test@read_all_entries.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-kbl-x1275/igt@debugfs_test@read_all_entries.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-kbl-x1275:       [DMESG-WARN][21] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][22] ([i915#1982] / [i915#62] / [i915#92] / [i915#95])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-kbl-x1275/igt@gem_exec_suspend@basic-s3.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-kbl-x1275/igt@gem_exec_suspend@basic-s3.html

  * igt@i915_module_load@reload:
    - fi-tgl-y:           [DMESG-WARN][23] ([i915#1982]) -> [DMESG-WARN][24] ([i915#1982] / [k.org#205379])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-tgl-y/igt@i915_module_load@reload.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-tgl-y/igt@i915_module_load@reload.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - fi-kbl-x1275:       [DMESG-WARN][25] ([i915#62] / [i915#92]) -> [DMESG-WARN][26] ([i915#62] / [i915#92] / [i915#95]) +3 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/fi-kbl-x1275/igt@kms_force_connector_basic@prune-stale-modes.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-kbl-x1275/igt@kms_force_connector_basic@prune-stale-modes.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2203]: https://gitlab.freedesktop.org/drm/intel/issues/2203
  [i915#2417]: https://gitlab.freedesktop.org/drm/intel/issues/2417
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95
  [k.org#205379]: https://bugzilla.kernel.org/show_bug.cgi?id=205379


Participating hosts (46 -> 39)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-tgl-u2 fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_5781 -> IGTPW_4974

  CI-20190529: 20190529
  CI_DRM_8998: fc63f52c694bfa9b097bdecd9183170adc57467b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_4974: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html
  IGT_5781: 66766dd7cd99465d977ac07db8a2413dbbfe8d84 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools



== Testlist changes ==

+igt@core_hotunplug@hotrebind-lateclose
+igt@core_hotunplug@hotreplug-lateclose
+igt@core_hotunplug@hotunbind-rebind
+igt@core_hotunplug@hotunplug-rescan
-igt@core_hotunplug@hotunbind-lateclose
-igt@core_hotunplug@hotunplug-lateclose

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html


* Re: [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
  2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests Janusz Krzysztofik
@ 2020-09-11 11:51     ` Petri Latvala
  0 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 11:51 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, Sep 11, 2020 at 12:30:38PM +0200, Janusz Krzysztofik wrote:
> Subject: [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests

Change to

intel-ci: Un-blocklist *bind* subtests of core_hotunplug


-- 
Petri Latvala

* Re: [igt-dev] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
@ 2020-09-11 11:51     ` Petri Latvala
  0 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 11:51 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: igt-dev, intel-gfx, Michał Winiarski, Tvrtko Ursulin

On Fri, Sep 11, 2020 at 12:30:38PM +0200, Janusz Krzysztofik wrote:
> Subject: [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests

Change to

intel-ci: Un-blocklist *bind* subtests of core_hotunplug


-- 
Petri Latvala

* Re: [Intel-gfx] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
  2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 11:52     ` Petri Latvala
  -1 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 11:52 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, Sep 11, 2020 at 12:30:39PM +0200, Janusz Krzysztofik wrote:
> Subject: [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope

Same here, prefix with intel-ci


-- 
Petri Latvala

* Re: [igt-dev] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
@ 2020-09-11 11:52     ` Petri Latvala
  0 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 11:52 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: igt-dev, intel-gfx, Michał Winiarski, Tvrtko Ursulin

On Fri, Sep 11, 2020 at 12:30:39PM +0200, Janusz Krzysztofik wrote:
> Subject: [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope

Same here, prefix with intel-ci


-- 
Petri Latvala

* Re: [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
  2020-09-11 11:51     ` [igt-dev] " Petri Latvala
@ 2020-09-11 12:00       ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 12:00 UTC (permalink / raw)
  To: Petri Latvala; +Cc: igt-dev, intel-gfx, Michał Winiarski

Hi Petri,

On Fri, 2020-09-11 at 14:51 +0300, Petri Latvala wrote:
> On Fri, Sep 11, 2020 at 12:30:38PM +0200, Janusz Krzysztofik wrote:
> > Subject: [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
> 
> Change to
> 
> intel-ci: Un-blocklist *bind* subtests of core_hotunplug
> 

OK, and I guess the same applies to "tests/core_hotunplug: Add unbind-
rebind subtest to BAT scope" (if accepted).

Thanks,
Janusz


* Re: [igt-dev] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
@ 2020-09-11 12:00       ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 12:00 UTC (permalink / raw)
  To: Petri Latvala; +Cc: igt-dev, intel-gfx, Michał Winiarski, Tvrtko Ursulin

Hi Petri,

On Fri, 2020-09-11 at 14:51 +0300, Petri Latvala wrote:
> On Fri, Sep 11, 2020 at 12:30:38PM +0200, Janusz Krzysztofik wrote:
> > Subject: [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
> 
> Change to
> 
> intel-ci: Un-blocklist *bind* subtests of core_hotunplug
> 

OK, and I guess the same applies to "tests/core_hotunplug: Add unbind-
rebind subtest to BAT scope" (if accepted).

Thanks,
Janusz


* Re: [Intel-gfx] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
  2020-09-11 11:52     ` [igt-dev] " Petri Latvala
@ 2020-09-11 12:01       ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 12:01 UTC (permalink / raw)
  To: Petri Latvala; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, 2020-09-11 at 14:52 +0300, Petri Latvala wrote:
> On Fri, Sep 11, 2020 at 12:30:39PM +0200, Janusz Krzysztofik wrote:
> > Subject: [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> 
> Same here, prefix with intel-ci
> 

Sure.

Thanks,
Janusz


* Re: [igt-dev] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
@ 2020-09-11 12:01       ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 12:01 UTC (permalink / raw)
  To: Petri Latvala; +Cc: igt-dev, intel-gfx, Michał Winiarski, Tvrtko Ursulin

On Fri, 2020-09-11 at 14:52 +0300, Petri Latvala wrote:
> On Fri, Sep 11, 2020 at 12:30:39PM +0200, Janusz Krzysztofik wrote:
> > Subject: [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> 
> Same here, prefix with intel-ci
> 

Sure.

Thanks,
Janusz


* Re: [Intel-gfx] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround
  2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 12:22     ` Petri Latvala
  -1 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 12:22 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, Sep 11, 2020 at 12:30:36PM +0200, Janusz Krzysztofik wrote:
> Unbinding the i915 driver on some Haswell and Broadwell platforms with
> Azalia audio results in a kernel WARNING on "i915 raw-wakerefs=1
> wakelocks=1 on cleanup".  The issue can be worked around by manually
> enabling runtime power management for the conflicting audio adapter.
> Use that method but also display a warning to preserve visibility of
> the issue.  Also tag the workaround with a FIXME comment.
> 
> v2: Extend the scope of the workaround over Broadwell
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  tests/core_hotunplug.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> index ac106d964..3e2a76ddb 100644
> --- a/tests/core_hotunplug.c
> +++ b/tests/core_hotunplug.c
> @@ -484,8 +484,23 @@ igt_main
>  		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
>  
>  		if (is_i915_device(fd_drm)) {
> +			uint32_t devid = intel_get_drm_devid(fd_drm);
> +
>  			gem_quiescent_gpu(fd_drm);
>  			igt_require_gem(fd_drm);
> +
> +			/**
> +			 * FIXME: Unbinding the i915 driver on some Haswell
> +			 * platforms with Azalia audio results in a kernel WARN
> +			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
> +			 * below CI friendly user level workaround prevents the
> +			 * warning from appearing.  Drop this hack as soon as
> +			 * this is fixed in the kernel.
> +			 */
> +			if (igt_warn_on_f(IS_HASWELL(devid) ||
> +					  IS_BROADWELL(devid),
> +			    "Manually enabling audio PM to work around a kernel WARN\n"))
> +				igt_pm_enable_audio_runtime_pm();

What happens without this? Is it just a kernel warning, or does the
operation also fail?

If the former, what does this gain? All it does is cost us the
ability to track whether the kernel still has that issue, and we still
have to filter this warning in cibuglog.


-- 
Petri Latvala

* Re: [Intel-gfx] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround
  2020-09-11 12:22     ` [igt-dev] " Petri Latvala
@ 2020-09-11 13:15       ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-11 13:15 UTC (permalink / raw)
  To: Petri Latvala; +Cc: igt-dev, intel-gfx, Michał Winiarski

Hi Petri,

On Fri, 2020-09-11 at 15:22 +0300, Petri Latvala wrote:
> On Fri, Sep 11, 2020 at 12:30:36PM +0200, Janusz Krzysztofik wrote:
> > Unbinding the i915 driver on some Haswell and Broadwell platforms with
> > Azalia audio results in a kernel WARNING on "i915 raw-wakerefs=1
> > wakelocks=1 on cleanup".  The issue can be worked around by manually
> > enabling runtime power management for the conflicting audio adapter.
> > Use that method but also display a warning to preserve visibility of
> > the issue.  Also tag the workaround with a FIXME comment.
> > 
> > v2: Extend the scope of the workaround over Broadwell
> > 
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  tests/core_hotunplug.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> > index ac106d964..3e2a76ddb 100644
> > --- a/tests/core_hotunplug.c
> > +++ b/tests/core_hotunplug.c
> > @@ -484,8 +484,23 @@ igt_main
> >  		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
> >  
> >  		if (is_i915_device(fd_drm)) {
> > +			uint32_t devid = intel_get_drm_devid(fd_drm);
> > +
> >  			gem_quiescent_gpu(fd_drm);
> >  			igt_require_gem(fd_drm);
> > +
> > +			/**
> > +			 * FIXME: Unbinding the i915 driver on some Haswell
> > +			 * platforms with Azalia audio results in a kernel WARN
> > +			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
> > +			 * below CI friendly user level workaround prevents the
> > +			 * warning from appearing.  Drop this hack as soon as
> > +			 * this is fixed in the kernel.
> > +			 */
> > +			if (igt_warn_on_f(IS_HASWELL(devid) ||
> > +					  IS_BROADWELL(devid),
> > +			    "Manually enabling audio PM to work around a kernel WARN\n"))
> > +				igt_pm_enable_audio_runtime_pm();
> 
> What happens without this? Is it just a kernel warning, or does the
> operation also fail?

runner: This test was killed due to a kernel taint (0x200).
(https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4901/shard-hsw4/igt@core_hotunplug@unbind-rebind.html)

That happens before the test completes so no results of the operation
are reported. 
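
For reference, taint value 0x200 is bit 9 of the kernel taint mask, the
bit that is set once a kernel WARN has fired.  A minimal sketch of
checking it by hand, assuming the standard /proc/sys/kernel/tainted
interface; the helper names are illustrative only and are not part of
IGT or of the runner:

```c
#include <stdio.h>

/* Bit 9 (0x200) of the taint mask is set when the kernel has issued a WARN. */
#define TAINT_WARN_MASK (1UL << 9)

/* Hypothetical helper: non-zero if the WARN taint bit is set in mask. */
static int taint_has_warn(unsigned long mask)
{
	return (mask & TAINT_WARN_MASK) != 0;
}

/*
 * Hypothetical helper: read the decimal taint mask from path
 * (normally /proc/sys/kernel/tainted).  Returns 0 on success, -1 on error.
 */
static int read_taint(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_taint("/proc/sys/kernel/tainted", &mask) before and after
the unbind and comparing taint_has_warn(mask) shows whether the WARN
was triggered during the test.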

> 
> If the former, what does this gain? 

CI unfriendly incompletes are avoided.

> All it does is we lose the
> capability to track whether the kernel still has that issue, we still
> have to filter this warning in cibuglog.

I know, but for now I can see no good alternative - we can either keep
the test still blocklisted or suppress the warning so CI coverage is not
affected.  i915_module_unload just unloads snd-hda-intel module
silently which prevents this issue from popping up.  If you think that
approach would be better, or we should recognize that issue as an
expected behaviour, I can drop the IGT warning.
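
For illustration, enabling runtime PM for a device from user space
generally comes down to writing "auto" to its power/control sysfs
attribute.  A minimal sketch, assuming only that standard kernel
interface; the helper name is hypothetical and this is not the
implementation of igt_pm_enable_audio_runtime_pm():

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "auto" to a device's power/control attribute
 * so that the kernel is allowed to runtime-suspend the device.
 * Returns 0 on success, -1 on error.
 */
static int allow_runtime_pm(const char *control_path)
{
	static const char val[] = "auto";
	ssize_t len = (ssize_t)strlen(val);
	ssize_t ret;
	int fd = open(control_path, O_WRONLY);

	if (fd < 0)
		return -1;
	ret = write(fd, val, (size_t)len);
	close(fd);
	return ret == len ? 0 : -1;
}
```

A caller would pass the audio adapter's attribute path, for example
/sys/bus/pci/devices/<address>/power/control, where <address> is the
PCI address of the adapter on the machine in question.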

Thanks,
Janusz

> 
> 

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [igt-dev] ✓ Fi.CI.IGT: success for tests/core_hotunplug: Fixes and enhancements (rev6)
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
                   ` (25 preceding siblings ...)
  (?)
@ 2020-09-11 14:15 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2020-09-11 14:15 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev


== Series Details ==

Series: tests/core_hotunplug: Fixes and enhancements (rev6)
URL   : https://patchwork.freedesktop.org/series/79671/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8998_full -> IGTPW_4974_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_4974_full:

### IGT changes ###

#### Possible regressions ####

  * {igt@core_hotunplug@hotrebind-lateclose} (NEW):
    - shard-snb:          NOTRUN -> [FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-snb1/igt@core_hotunplug@hotrebind-lateclose.html
    - shard-iclb:         NOTRUN -> [FAIL][2]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb3/igt@core_hotunplug@hotrebind-lateclose.html
    - shard-tglb:         NOTRUN -> [FAIL][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-tglb6/igt@core_hotunplug@hotrebind-lateclose.html
    - shard-glk:          NOTRUN -> [FAIL][4]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk5/igt@core_hotunplug@hotrebind-lateclose.html
    - shard-hsw:          NOTRUN -> [FAIL][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-hsw8/igt@core_hotunplug@hotrebind-lateclose.html
    - shard-kbl:          NOTRUN -> [FAIL][6]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl2/igt@core_hotunplug@hotrebind-lateclose.html

  * {igt@core_hotunplug@unbind-rebind} (NEW):
    - shard-hsw:          NOTRUN -> [WARN][7] +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-hsw6/igt@core_hotunplug@unbind-rebind.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8998_full and IGTPW_4974_full:

### New IGT tests (3) ###

  * igt@core_hotunplug@hotrebind-lateclose:
    - Statuses : 7 fail(s)
    - Exec time: [0.40, 2.03] s

  * igt@core_hotunplug@hotunbind-rebind:
    - Statuses : 6 pass(s) 1 warn(s)
    - Exec time: [0.43, 1.98] s

  * igt@core_hotunplug@unbind-rebind:
    - Statuses : 6 pass(s) 1 warn(s)
    - Exec time: [0.40, 1.98] s

  

Known issues
------------

  Here are the changes found in IGTPW_4974_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_reloc@basic-many-active@bcs0:
    - shard-glk:          [PASS][8] -> [FAIL][9] ([i915#2389])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk1/igt@gem_exec_reloc@basic-many-active@bcs0.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk4/igt@gem_exec_reloc@basic-many-active@bcs0.html

  * igt@gem_exec_whisper@basic-fds:
    - shard-glk:          [PASS][10] -> [DMESG-WARN][11] ([i915#118] / [i915#95]) +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk8/igt@gem_exec_whisper@basic-fds.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk7/igt@gem_exec_whisper@basic-fds.html

  * igt@gem_partial_pwrite_pread@reads-uncached:
    - shard-apl:          [PASS][12] -> [FAIL][13] ([i915#1635])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl6/igt@gem_partial_pwrite_pread@reads-uncached.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl6/igt@gem_partial_pwrite_pread@reads-uncached.html

  * igt@kms_big_fb@linear-8bpp-rotate-180:
    - shard-apl:          [PASS][14] -> [DMESG-WARN][15] ([i915#1635] / [i915#1982])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl7/igt@kms_big_fb@linear-8bpp-rotate-180.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl4/igt@kms_big_fb@linear-8bpp-rotate-180.html
    - shard-kbl:          [PASS][16] -> [DMESG-WARN][17] ([i915#1982])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl3/igt@kms_big_fb@linear-8bpp-rotate-180.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl6/igt@kms_big_fb@linear-8bpp-rotate-180.html

  * igt@kms_color@pipe-c-legacy-gamma:
    - shard-kbl:          [PASS][18] -> [FAIL][19] ([i915#71])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl1/igt@kms_color@pipe-c-legacy-gamma.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl4/igt@kms_color@pipe-c-legacy-gamma.html
    - shard-apl:          [PASS][20] -> [FAIL][21] ([i915#1635] / [i915#71])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl6/igt@kms_color@pipe-c-legacy-gamma.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl3/igt@kms_color@pipe-c-legacy-gamma.html

  * igt@kms_cursor_edge_walk@pipe-c-64x64-right-edge:
    - shard-glk:          [PASS][22] -> [DMESG-WARN][23] ([i915#1982])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk1/igt@kms_cursor_edge_walk@pipe-c-64x64-right-edge.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk5/igt@kms_cursor_edge_walk@pipe-c-64x64-right-edge.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [PASS][24] -> [DMESG-WARN][25] ([i915#180]) +7 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl7/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb565-draw-mmap-wc:
    - shard-tglb:         [PASS][26] -> [DMESG-WARN][27] ([i915#1982]) +1 similar issue
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-tglb6/igt@kms_frontbuffer_tracking@fbcpsr-rgb565-draw-mmap-wc.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-tglb7/igt@kms_frontbuffer_tracking@fbcpsr-rgb565-draw-mmap-wc.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [PASS][28] -> [SKIP][29] ([fdo#109441]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb6/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_setmode@basic:
    - shard-apl:          [PASS][30] -> [FAIL][31] ([i915#1635] / [i915#31])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl1/igt@kms_setmode@basic.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl6/igt@kms_setmode@basic.html

  
#### Possible fixes ####

  * igt@gem_exec_reloc@basic-many-active@rcs0:
    - shard-apl:          [FAIL][32] ([i915#1635] / [i915#2389]) -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl7/igt@gem_exec_reloc@basic-many-active@rcs0.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl8/igt@gem_exec_reloc@basic-many-active@rcs0.html

  * igt@gem_exec_whisper@basic-fds-priority:
    - shard-glk:          [DMESG-WARN][34] ([i915#118] / [i915#95]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk7/igt@gem_exec_whisper@basic-fds-priority.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk6/igt@gem_exec_whisper@basic-fds-priority.html

  * igt@gem_workarounds@suspend-resume:
    - shard-iclb:         [INCOMPLETE][36] ([i915#1185]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-iclb3/igt@gem_workarounds@suspend-resume.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb6/igt@gem_workarounds@suspend-resume.html

  * igt@i915_pm_dc@dc5-psr:
    - shard-iclb:         [FAIL][38] ([i915#1899]) -> [PASS][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-iclb4/igt@i915_pm_dc@dc5-psr.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb7/igt@i915_pm_dc@dc5-psr.html

  * igt@i915_suspend@forcewake:
    - shard-kbl:          [INCOMPLETE][40] ([i915#155] / [i915#636]) -> [PASS][41]
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl2/igt@i915_suspend@forcewake.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl2/igt@i915_suspend@forcewake.html

  * igt@kms_cursor_edge_walk@pipe-c-128x128-top-edge:
    - shard-glk:          [DMESG-WARN][42] ([i915#1982]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk3/igt@kms_cursor_edge_walk@pipe-c-128x128-top-edge.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk7/igt@kms_cursor_edge_walk@pipe-c-128x128-top-edge.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][44] ([i915#79]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-glk3/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-glk9/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-hdmi-a1:
    - shard-hsw:          [INCOMPLETE][46] ([i915#2055]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-hsw7/igt@kms_flip@flip-vs-suspend-interruptible@c-hdmi-a1.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-hsw4/igt@kms_flip@flip-vs-suspend-interruptible@c-hdmi-a1.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [DMESG-WARN][48] ([i915#180]) -> [PASS][49] +4 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-render:
    - shard-tglb:         [DMESG-WARN][50] ([i915#1982]) -> [PASS][51] +5 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-tglb7/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-render.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-tglb7/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-render.html

  * igt@kms_psr@psr2_primary_mmap_gtt:
    - shard-iclb:         [SKIP][52] ([fdo#109441]) -> [PASS][53] +1 similar issue
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-iclb8/igt@kms_psr@psr2_primary_mmap_gtt.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb2/igt@kms_psr@psr2_primary_mmap_gtt.html

  * igt@kms_universal_plane@universal-plane-gen9-features-pipe-a:
    - shard-kbl:          [DMESG-WARN][54] ([i915#1982]) -> [PASS][55] +1 similar issue
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-kbl6/igt@kms_universal_plane@universal-plane-gen9-features-pipe-a.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-kbl4/igt@kms_universal_plane@universal-plane-gen9-features-pipe-a.html

  * igt@perf_pmu@module-unload:
    - shard-apl:          [DMESG-WARN][56] ([i915#1635] / [i915#1982]) -> [PASS][57] +1 similar issue
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl2/igt@perf_pmu@module-unload.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl8/igt@perf_pmu@module-unload.html

  * igt@prime_busy@hang@bcs0:
    - shard-hsw:          [INCOMPLETE][58] -> [PASS][59]
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-hsw6/igt@prime_busy@hang@bcs0.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-hsw7/igt@prime_busy@hang@bcs0.html

  
#### Warnings ####

  * igt@kms_content_protection@legacy:
    - shard-apl:          [TIMEOUT][60] ([i915#1319] / [i915#1635] / [i915#1958]) -> [FAIL][61] ([fdo#110321] / [fdo#110336] / [i915#1635])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl1/igt@kms_content_protection@legacy.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl2/igt@kms_content_protection@legacy.html

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
    - shard-iclb:         [SKIP][62] ([fdo#109349]) -> [DMESG-WARN][63] ([i915#1226])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-iclb5/igt@kms_dp_dsc@basic-dsc-enable-edp.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-iclb2/igt@kms_dp_dsc@basic-dsc-enable-edp.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-apl:          [FAIL][64] ([fdo#108145] / [i915#1635] / [i915#265]) -> [DMESG-FAIL][65] ([fdo#108145] / [i915#1635] / [i915#1982]) +1 similar issue
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8998/shard-apl2/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/shard-apl7/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109349]: https://bugs.freedesktop.org/show_bug.cgi?id=109349
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110321]: https://bugs.freedesktop.org/show_bug.cgi?id=110321
  [fdo#110336]: https://bugs.freedesktop.org/show_bug.cgi?id=110336
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1185]: https://gitlab.freedesktop.org/drm/intel/issues/1185
  [i915#1226]: https://gitlab.freedesktop.org/drm/intel/issues/1226
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1899]: https://gitlab.freedesktop.org/drm/intel/issues/1899
  [i915#1958]: https://gitlab.freedesktop.org/drm/intel/issues/1958
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2055]: https://gitlab.freedesktop.org/drm/intel/issues/2055
  [i915#2389]: https://gitlab.freedesktop.org/drm/intel/issues/2389
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#636]: https://gitlab.freedesktop.org/drm/intel/issues/636
  [i915#71]: https://gitlab.freedesktop.org/drm/intel/issues/71
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (11 -> 8)
------------------------------

  Missing    (3): pig-skl-6260u pig-glk-j5005 pig-icl-1065g7 


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_5781 -> IGTPW_4974
  * Piglit: piglit_4509 -> None

  CI-20190529: 20190529
  CI_DRM_8998: fc63f52c694bfa9b097bdecd9183170adc57467b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_4974: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html
  IGT_5781: 66766dd7cd99465d977ac07db8a2413dbbfe8d84 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround
  2020-09-11 13:15       ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 14:17         ` Petri Latvala
  -1 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 14:17 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, Sep 11, 2020 at 03:15:43PM +0200, Janusz Krzysztofik wrote:
> Hi Petri,
> 
> On Fri, 2020-09-11 at 15:22 +0300, Petri Latvala wrote:
> > On Fri, Sep 11, 2020 at 12:30:36PM +0200, Janusz Krzysztofik wrote:
> > > Unbinding the i915 driver on some Haswell and Broadwell platforms with
> > > Azalia audio results in a kernel WARNING on "i915 raw-wakerefs=1
> > > wakelocks=1 on cleanup".  The issue can be worked around by manually
> > > enabling runtime power management for the conflicting audio adapter.
> > > Use that method but also display a warning to preserve visibility of
> > > the issue.  Also tag the workaround with a FIXME comment.
> > > 
> > > v2: Extend the scope of the workaround over Broadwell
> > > 
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > > ---
> > >  tests/core_hotunplug.c | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> > > index ac106d964..3e2a76ddb 100644
> > > --- a/tests/core_hotunplug.c
> > > +++ b/tests/core_hotunplug.c
> > > @@ -484,8 +484,23 @@ igt_main
> > >  		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
> > >  
> > >  		if (is_i915_device(fd_drm)) {
> > > +			uint32_t devid = intel_get_drm_devid(fd_drm);
> > > +
> > >  			gem_quiescent_gpu(fd_drm);
> > >  			igt_require_gem(fd_drm);
> > > +
> > > +			/**
> > > +			 * FIXME: Unbinding the i915 driver on some Haswell
> > > +			 * platforms with Azalia audio results in a kernel WARN
> > > +			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
> > > +			 * below CI friendly user level workaround prevents the
> > > +			 * warning from appearing.  Drop this hack as soon as
> > > +			 * this is fixed in the kernel.
> > > +			 */
> > > +			if (igt_warn_on_f(IS_HASWELL(devid) ||
> > > +					  IS_BROADWELL(devid),
> > > +			    "Manually enabling audio PM to work around a kernel WARN\n"))
> > > +				igt_pm_enable_audio_runtime_pm();
> > 
> > What happens without this? Is it just a kernel warning, or does the
> > operation also fail?
> 
> runner: This test was killed due to a kernel taint (0x200).
> (https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4901/shard-hsw4/igt@core_hotunplug@unbind-rebind.html)
> 
> That happens before the test completes so no results of the operation
> are reported. 

Ah, right. I had a brainfart. Indeed this igt_warn is better.


Reviewed-by: Petri Latvala <petri.latvala@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [igt-dev] ✓ Fi.CI.BAT: success for tests/core_hotunplug: Fixes and enhancements (rev6)
  2020-09-11 11:24 ` [igt-dev] ✓ Fi.CI.BAT: success for tests/core_hotunplug: Fixes and enhancements (rev6) Patchwork
@ 2020-09-11 14:18   ` Petri Latvala
  0 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 14:18 UTC (permalink / raw)
  To: igt-dev

On Fri, Sep 11, 2020 at 11:24:18AM +0000, Patchwork wrote:
> == Series Details ==
> 
> Series: tests/core_hotunplug: Fixes and enhancements (rev6)
> URL   : https://patchwork.freedesktop.org/series/79671/
> State : success
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_8998 -> IGTPW_4974
> ====================================================
> 
> Summary
> -------
> 
>   **SUCCESS**
> 
>   No regressions found.
> 
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/index.html
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in IGTPW_4974:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * {igt@core_hotunplug@unbind-rebind} (NEW):
>     - fi-hsw-4770:        NOTRUN -> [WARN][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-hsw-4770/igt@core_hotunplug@unbind-rebind.html
>     - fi-bdw-5557u:       NOTRUN -> [WARN][2]
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4974/fi-bdw-5557u/igt@core_hotunplug@unbind-rebind.html
> 
>   
> New tests
> ---------
> 
>   New tests have been introduced between CI_DRM_8998 and IGTPW_4974:
> 
> ### New IGT tests (1) ###
> 
>   * igt@core_hotunplug@unbind-rebind:
>     - Statuses : 4 dmesg-warn(s) 29 pass(s) 2 warn(s)
>     - Exec time: [0.38, 4.94] s


The 2 warns come from the workaround; the dmesg-warns are existing issues.
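
The warns follow from how the workaround is written: the warning macro
reports the condition and also returns it, so a single expression both
flags the affected platform and gates the workaround. A stand-alone
sketch of that pattern, assuming hypothetical stand-ins (enum platform,
warn_on(), enable_audio_runtime_pm(), prepare_for_unbind()) for the IGT
helpers used in the patch; it is not the IGT implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical platform IDs standing in for IS_HASWELL()/IS_BROADWELL(). */
enum platform { PLATFORM_HASWELL, PLATFORM_BROADWELL, PLATFORM_OTHER };

int warnings_emitted;   /* number of warnings logged so far */
bool audio_pm_enabled;  /* set once the workaround has been applied */

/* Log a warning when cond is true and hand cond back to the caller. */
bool warn_on(bool cond, const char *msg)
{
	assert(msg != NULL);

	if (cond) {
		fprintf(stderr, "WARNING: %s\n", msg);
		warnings_emitted++;
	}

	return cond;
}

/* Stand-in for the real audio runtime PM helper. */
void enable_audio_runtime_pm(void)
{
	audio_pm_enabled = true;
}

/* Apply the workaround only where needed, and leave a visible trace. */
void prepare_for_unbind(enum platform p)
{
	if (warn_on(p == PLATFORM_HASWELL || p == PLATFORM_BROADWELL,
		    "Manually enabling audio PM to work around a kernel WARN"))
		enable_audio_runtime_pm();
}
```

On a Haswell or Broadwell stand-in, prepare_for_unbind() logs one
warning and enables audio PM; on any other platform it does neither, so
results stay clean there while the underlying issue remains visible on
the affected machines.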


-- 
Petri Latvala

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
  2020-09-11 12:00       ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-11 14:20         ` Petri Latvala
  -1 siblings, 0 replies; 77+ messages in thread
From: Petri Latvala @ 2020-09-11 14:20 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, intel-gfx, Michał Winiarski

On Fri, Sep 11, 2020 at 02:00:11PM +0200, Janusz Krzysztofik wrote:
> Hi Petri,
> 
> On Fri, 2020-09-11 at 14:51 +0300, Petri Latvala wrote:
> > On Fri, Sep 11, 2020 at 12:30:38PM +0200, Janusz Krzysztofik wrote:
> > > Subject: [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests
> > 
> > Change to
> > 
> > intel-ci: Un-blocklist *bind* subtests of core_hotunplug
> > 
> 
> OK, and I guess the same applies to "tests/core_hotunplug: Add unbind-
> rebind subtest to BAT scope" (if accepted).


Speaking of accepted, now that the results are in, for the two intel-ci patches:

Acked-by: Petri Latvala <petri.latvala@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-14 18:18   ` Michał Winiarski
  -1 siblings, 0 replies; 77+ messages in thread
From: Michał Winiarski @ 2020-09-14 18:18 UTC (permalink / raw)
  To: Janusz Krzysztofik, igt-dev; +Cc: intel-gfx, Michał Winiarski

Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> Clean up the test code, add some new basic subtests, then unblock
> unbind test variants.
> 
> No incompletes / aborts nor subsequently run test issues have been
> reported by Trybot.  The hotrebind-lateclose subtest fails on a so far
> unidentified driver sysfs issue but the device is fully recovered and
> left in a usable state.  Perceived Haswell/Broadwell issue with audio
> power management has been worked around and its potential occurrence
> is reported as an IGT warning.
> 
> Series changelog:
> v2: New patch "Un-blocklist *bind* subtests" added.
> v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
>     from subtest failures".
>   - a new patch "Clean up device open error handling" added, an old
>     patch "Fix missing newline" obsoleted by the new one dropped,
>   - other new patches added:
>     - "Let the driver time out essential sysfs operations",
>     - "More thorough i915 healthcheck and recovery",
>   - a patch "Add 'lateclose before restore' variants" from another
>     series included.
> v4: Optional patch "Duplicate debug messages in dmesg" from another
>     series included.
> v5: New patch added with Haswell audio related kernel warning worked
>     around and replaced with an IGT warning to preserve visibility of
>     the issue.
> v6: New patch added for also checking health of render device nodes,
>   - new patch added with proper handling of health check before late
>     close,
>   - inclusion of unbind-rebind scenario to BAT scope proposed.
> 
> @Michał: Since some patch updates are trivial, I've preserved your
> v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
> marked your R-b as v1/v2 applicable.  Please have a look and confirm if
> you are still OK with them.

Feel free to add:
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

For the whole series (with the exception of intel-ci part).

-Michał

> 
> @Tvrtko: As I already asked before, please support my attempt to remove
> the unbind test variants from the blocklist.
> 
> @Petri, @Martin: Assuming CI results will be as good as those obtained
> on Trybot, please give me your green light for merging this series if
> you have no objections.
> 
> Thanks,
> Janusz
> 
> Janusz Krzysztofik (24):
>   tests/core_hotunplug: Use igt_assert_fd()
>   tests/core_hotunplug: Constify dev_bus_addr string
>   tests/core_hotunplug: Clean up device open error handling
>   tests/core_hotunplug: Consolidate duplicated debug messages
>   tests/core_hotunplug: Assert successful device filter application
>   tests/core_hotunplug: Maintain a single data structure instance
>   tests/core_hotunplug: Pass errors via a data structure field
>   tests/core_hotunplug: Handle device close errors
>   tests/core_hotunplug: Prepare invariant data once per test run
>   tests/core_hotunplug: Skip selectively on sysfs close errors
>   tests/core_hotunplug: Recover from subtest failures
>   tests/core_hotunplug: Fail subtests on device close errors
>   tests/core_hotunplug: Let the driver time out essential sysfs
>     operations
>   tests/core_hotunplug: Process return values of sysfs operations
>   tests/core_hotunplug: Assert expected device presence/absence
>   tests/core_hotunplug: Explicitly ignore unused return values
>   tests/core_hotunplug: Also check health of render device node
>   tests/core_hotunplug: More thorough i915 healthcheck and recovery
>   tests/core_hotunplug: Add 'lateclose before restore' variants
>   tests/core_hotunplug: Check health both before and after late close
>   tests/core_hotunplug: HSW/BDW audio issue workaround
>   tests/core_hotunplug: Duplicate debug messages in dmesg
>   tests/core_hotunplug: Un-blocklist *bind* subtests
>   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> 
>  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
>  tests/intel-ci/blacklist.txt          |   2 +-
>  tests/intel-ci/fast-feedback.testlist |   1 +
>  3 files changed, 431 insertions(+), 132 deletions(-)
> 
> -- 
> 2.21.1
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-14 18:18   ` [igt-dev] " Michał Winiarski
@ 2020-09-14 19:30     ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-14 19:30 UTC (permalink / raw)
  To: Michał Winiarski, igt-dev; +Cc: intel-gfx, Lakshminarayana Vudum

On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > Clean up the test code, add some new basic subtests, then unblock
> > unbind test variants.
> > 
> > No incompletes / aborts nor subsequently run test issues have been
> > reported by Trybot.  The hotrebind-lateclose subtest fails on a so far
> > unidentified driver sysfs issue but the device is fully recovered and
> > left in a usable state.  Perceived Haswell/Broadwell issue with audio
> > power management has been worked around and its potential occurrence
> > is reported as an IGT warning.
> > 
> > Series changelog:
> > v2: New patch "Un-blocklist *bind* subtests" added.
> > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> >     from subtest failures".
> >   - a new patch "Clean up device open error handling" added, an old
> >     patch "Fix missing newline" obsoleted by the new one dropped,
> >   - other new patches added:
> >     - "Let the driver time out essential sysfs operations",
> >     - "More thorough i915 healthcheck and recovery",
> >   - a patch "Add 'lateclose before restore' variants" from another
> >     series included.
> > v4: Optional patch "Duplicate debug messages in dmesg" from another
> >     series included.
> > v5: New patch added with Haswell audio related kernel warning worked
> >     around and replaced with an IGT warning to preserve visibility of
> >     the issue.
> > v6: New patch added for also checking health of render device nodes,
> >   - new patch added with proper handling of health check before late
> >     close,
> >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > 
> > @Michał: Since some patch updates are trivial, I've preserved your
> > v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
> > marked your R-b as v1/v2 applicable.  Please have a look and confirm if
> > you are still OK with them.
> 
> Feel free to add:
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> For the whole series (with the exception of intel-ci part).

Pushed.

@Petri, @Michał - thank you for the review.

@Lakshmi:
- please open a new bug for the issue reported by the
  igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
- the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and
  Broadwell platforms is caused by the same issue as the one currently
  reported in a similar way on Haswell by
  igt@device_reset@unbind-reset-rebind; please update the associated
  filter so that it covers all those tests.

Thanks,
Janusz


> 
> -Michał
> 
> > @Tvrtko: As I already asked before, please support my attempt to remove
> > the unbind test variants from the blocklist.
> > 
> > @Petri, @Martin: Assuming CI results will be as good as those obtained
> > on Trybot, please give me your green light for merging this series if
> > you have no objections.
> > 
> > Thanks,
> > Janusz
> > 
> > Janusz Krzysztofik (24):
> >   tests/core_hotunplug: Use igt_assert_fd()
> >   tests/core_hotunplug: Constify dev_bus_addr string
> >   tests/core_hotunplug: Clean up device open error handling
> >   tests/core_hotunplug: Consolidate duplicated debug messages
> >   tests/core_hotunplug: Assert successful device filter application
> >   tests/core_hotunplug: Maintain a single data structure instance
> >   tests/core_hotunplug: Pass errors via a data structure field
> >   tests/core_hotunplug: Handle device close errors
> >   tests/core_hotunplug: Prepare invariant data once per test run
> >   tests/core_hotunplug: Skip selectively on sysfs close errors
> >   tests/core_hotunplug: Recover from subtest failures
> >   tests/core_hotunplug: Fail subtests on device close errors
> >   tests/core_hotunplug: Let the driver time out essential sysfs
> >     operations
> >   tests/core_hotunplug: Process return values of sysfs operations
> >   tests/core_hotunplug: Assert expected device presence/absence
> >   tests/core_hotunplug: Explicitly ignore unused return values
> >   tests/core_hotunplug: Also check health of render device node
> >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> >   tests/core_hotunplug: Add 'lateclose before restore' variants
> >   tests/core_hotunplug: Check health both before and after late close
> >   tests/core_hotunplug: HSW/BDW audio issue workaround
> >   tests/core_hotunplug: Duplicate debug messages in dmesg
> >   tests/core_hotunplug: Un-blocklist *bind* subtests
> >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > 
> >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> >  tests/intel-ci/blacklist.txt          |   2 +-
> >  tests/intel-ci/fast-feedback.testlist |   1 +
> >  3 files changed, 431 insertions(+), 132 deletions(-)
> > 
> > -- 
> > 2.21.1
> > 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-14 19:30     ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-14 20:43       ` Vudum, Lakshminarayana
  -1 siblings, 0 replies; 77+ messages in thread
From: Vudum, Lakshminarayana @ 2020-09-14 20:43 UTC (permalink / raw)
  To: Janusz Krzysztofik, Winiarski, Michal, igt-dev; +Cc: intel-gfx

The igt@core_hotunplug@hotrebind-lateclose test is not yet in the CI bug log. Apart from that, I have filed the issue https://gitlab.freedesktop.org/drm/intel/-/issues/2464

Thanks,
Lakshmi.

-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
Sent: Monday, September 14, 2020 12:31 PM
To: Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>; Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>
Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements

On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > Clean up the test code, add some new basic subtests, then unblock 
> > unbind test variants.
> > 
> > No incompletes / aborts nor subsequently run test issues have been 
> > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > far unidentified driver sysfs issue but the device is fully 
> > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > issue with audio power management has been worked around and its 
> > potential occurrence is reported as an IGT warning.
> > 
> > Series changelog:
> > v2: New patch "Un-blocklist *bind* subtests" added.
> > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> >     from subtest failures".
> >   - a new patch "Clean up device open error handling" added, an old
> >     patch "Fix missing newline" obsoleted by the new one dropped,
> >   - other new patches added:
> >     - "Let the driver time out essential sysfs operations",
> >     - "More thorough i915 healthcheck and recovery",
> >   - a patch "Add 'lateclose before restore' variants" from another
> >     series included.
> > v4: Optional patch "Duplicate debug messages in dmesg" from another
> >     series included.
> > v5: New patch added with Haswell audio related kernel warning worked
> >     around and replaced with an IGT warning to preserve visibility of
> >     the issue.
> > v6: New patch added for also checking health of render device nodes,
> >   - new patch added with proper handling of health check before late
> >     close,
> >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > 
> > @Michał: Since some patch updates are trivial, I've preserved your
> > v1/v2 Reviewed-by: except for patches with non-trivial changes, where 
> > I marked your R-b as v1/v2 applicable.  Please have a look and 
> > confirm if you are still OK with them.
> 
> Feel free to add:
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> For the whole series (with the exception of intel-ci part).

Pushed.

@Petri, @Michał - thank you for review.

@Lakshmi:
- please open a new bug for the issue reported by the igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
- the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one currently reported in a similar way on Haswell by igt@device_reset@unbind-reset-rebind; please update the associated filter so that it covers all those tests.

Thanks,
Janusz


> 
> -Michał
> 
> > @Tvrtko: As I already asked before, please support my attempt to 
> > remove the unbind test variants from the blocklist.
> > 
> > @Petri, @Martin: Assuming CI results will be as good as those 
> > obtained on Trybot, please give me your green light for merging this 
> > series if you have no objections.
> > 
> > Thanks,
> > Janusz
> > 
> > Janusz Krzysztofik (24):
> >   tests/core_hotunplug: Use igt_assert_fd()
> >   tests/core_hotunplug: Constify dev_bus_addr string
> >   tests/core_hotunplug: Clean up device open error handling
> >   tests/core_hotunplug: Consolidate duplicated debug messages
> >   tests/core_hotunplug: Assert successful device filter application
> >   tests/core_hotunplug: Maintain a single data structure instance
> >   tests/core_hotunplug: Pass errors via a data structure field
> >   tests/core_hotunplug: Handle device close errors
> >   tests/core_hotunplug: Prepare invariant data once per test run
> >   tests/core_hotunplug: Skip selectively on sysfs close errors
> >   tests/core_hotunplug: Recover from subtest failures
> >   tests/core_hotunplug: Fail subtests on device close errors
> >   tests/core_hotunplug: Let the driver time out essential sysfs
> >     operations
> >   tests/core_hotunplug: Process return values of sysfs operations
> >   tests/core_hotunplug: Assert expected device presence/absence
> >   tests/core_hotunplug: Explicitly ignore unused return values
> >   tests/core_hotunplug: Also check health of render device node
> >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> >   tests/core_hotunplug: Add 'lateclose before restore' variants
> >   tests/core_hotunplug: Check health both before and after late close
> >   tests/core_hotunplug: HSW/BDW audio issue workaround
> >   tests/core_hotunplug: Duplicate debug messages in dmesg
> >   tests/core_hotunplug: Un-blocklist *bind* subtests
> >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > 
> >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> >  tests/intel-ci/blacklist.txt          |   2 +-
> >  tests/intel-ci/fast-feedback.testlist |   1 +
> >  3 files changed, 431 insertions(+), 132 deletions(-)
> > 
> > --
> > 2.21.1
> > 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
@ 2020-09-14 20:43       ` Vudum, Lakshminarayana
  0 siblings, 0 replies; 77+ messages in thread
From: Vudum, Lakshminarayana @ 2020-09-14 20:43 UTC (permalink / raw)
  To: Janusz Krzysztofik, Winiarski, Michal, igt-dev; +Cc: intel-gfx, Latvala, Petri

The igt@core_hotunplug@hotrebind-lateclose test is not yet in the CI bug log. Apart from that, I have filed the issue https://gitlab.freedesktop.org/drm/intel/-/issues/2464

Thanks,
Lakshmi.

-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
Sent: Monday, September 14, 2020 12:31 PM
To: Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>; Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>
Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements

On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > Clean up the test code, add some new basic subtests, then unblock 
> > unbind test variants.
> > 
> > No incompletes / aborts nor subsequently run test issues have been 
> > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > far unidentified driver sysfs issue but the device is fully 
> > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > issue with audio power management has been worked around and its 
> > potential occurrence is reported as an IGT warning.
> > 
> > Series changelog:
> > v2: New patch "Un-blocklist *bind* subtests" added.
> > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> >     from subtest failures".
> >   - a new patch "Clean up device open error handling" added, an old
> >     patch "Fix missing newline" obsoleted by the new one dropped,
> >   - other new patches added:
> >     - "Let the driver time out essential sysfs operations",
> >     - "More thorough i915 healthcheck and recovery",
> >   - a patch "Add 'lateclose before restore' variants" from another
> >     series included.
> > v4: Optional patch "Duplicate debug messages in dmesg" from another
> >     series included.
> > v5: New patch added with Haswell audio related kernel warning worked
> >     around and replaced with an IGT warning to preserve visibility of
> >     the issue.
> > v6: New patch added for also checking health of render device nodes,
> >   - new patch added with proper handling of health check before late
> >     close,
> >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > 
> > @Michał: Since some patch updates are trivial, I've preserved your
> > v1/v2 Reviewed-by: except for patches with non-trivial changes, where
> > I marked your R-b as v1/v2 applicable.  Please have a look and 
> > confirm if you are still OK with them.
> 
> Feel free to add:
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> For the whole series (with the exception of intel-ci part).

Pushed.

@Petri, @Michał - thank you for the review.

@Lakshmi:
- please open a new bug for the issue reported by the igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
- the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one now reported in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.

Thanks,
Janusz


> 
> -Michał
> 
> > @Tvrtko: As I already asked before, please support my attempt to 
> > remove the unbind test variants from the blocklist.
> > 
> > @Petri, @Martin: Assuming CI results will be as good as those 
> > obtained on Trybot, please give me your green light for merging this 
> > series if you have no objections.
> > 
> > Thanks,
> > Janusz
> > 
> > Janusz Krzysztofik (24):
> >   tests/core_hotunplug: Use igt_assert_fd()
> >   tests/core_hotunplug: Constify dev_bus_addr string
> >   tests/core_hotunplug: Clean up device open error handling
> >   tests/core_hotunplug: Consolidate duplicated debug messages
> >   tests/core_hotunplug: Assert successful device filter application
> >   tests/core_hotunplug: Maintain a single data structure instance
> >   tests/core_hotunplug: Pass errors via a data structure field
> >   tests/core_hotunplug: Handle device close errors
> >   tests/core_hotunplug: Prepare invariant data once per test run
> >   tests/core_hotunplug: Skip selectively on sysfs close errors
> >   tests/core_hotunplug: Recover from subtest failures
> >   tests/core_hotunplug: Fail subtests on device close errors
> >   tests/core_hotunplug: Let the driver time out essential sysfs
> >     operations
> >   tests/core_hotunplug: Process return values of sysfs operations
> >   tests/core_hotunplug: Assert expected device presence/absence
> >   tests/core_hotunplug: Explicitly ignore unused return values
> >   tests/core_hotunplug: Also check health of render device node
> >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> >   tests/core_hotunplug: Add 'lateclose before restore' variants
> >   tests/core_hotunplug: Check health both before and after late close
> >   tests/core_hotunplug: HSW/BDW audio issue workaround
> >   tests/core_hotunplug: Duplicate debug messages in dmesg
> >   tests/core_hotunplug: Un-blocklist *bind* subtests
> >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > 
> >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> >  tests/intel-ci/blacklist.txt          |   2 +-
> >  tests/intel-ci/fast-feedback.testlist |   1 +
> >  3 files changed, 431 insertions(+), 132 deletions(-)
> > 
> > --
> > 2.21.1
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-14 20:43       ` [igt-dev] " Vudum, Lakshminarayana
@ 2020-09-15  7:47         ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-15  7:47 UTC (permalink / raw)
  To: Vudum, Lakshminarayana, Winiarski, Michal, igt-dev; +Cc: intel-gfx

Hi Lakshmi,

On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.

Here is fresh evidence:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html

Thanks,
Janusz

>  Otherwise I filed the issue https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> 
> Thanks,
> Lakshmi.
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
> Sent: Monday, September 14, 2020 12:31 PM
> To: Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>; Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
> 
> On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > Clean up the test code, add some new basic subtests, then unblock 
> > > unbind test variants.
> > > 
> > > No incompletes / aborts nor subsequently run test issues have been 
> > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > far unidentified driver sysfs issue but the device is fully 
> > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > issue with audio power management has been worked around and its 
> > > potential occurrence is reported as an IGT warning.
> > > 
> > > Series changelog:
> > > v2: New patch "Un-blocklist *bind* subtests" added.
> > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > >     from subtest failures".
> > >   - a new patch "Clean up device open error handling" added, an old
> > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > >   - other new patches added:
> > >     - "Let the driver time out essential sysfs operations",
> > >     - "More thorough i915 healthcheck and recovery",
> > >   - a patch "Add 'lateclose before restore' variants" from another
> > >     series included.
> > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > >     series included.
> > > v5: New patch added with Haswell audio related kernel warning worked
> > >     around and replaced with an IGT warning to preserve visibility of
> > >     the issue.
> > > v6: New patch added for also checking health of render device nodes,
> > >   - new patch added with proper handling of health check before late
> > >     close,
> > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > 
> > > @Michał: Since some patch updates are trivial, I've preserved your
> > > v1/v2 Reviewed-by: except for patches with non-trivial changes, where
> > > I marked your R-b as v1/v2 applicable.  Please have a look and 
> > > confirm if you are still OK with them.
> > 
> > Feel free to add:
> > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > 
> > For the whole series (with the exception of intel-ci part).
> 
> Pushed.
> 
> @Petri, @Michał - thank you for the review.
> 
> @Lakshmi:
> - please open a new bug for the issue reported by the igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
> - the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one now reported in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.
> 
> Thanks,
> Janusz
> 
> 
> > -Michał
> > 
> > > @Tvrtko: As I already asked before, please support my attempt to 
> > > remove the unbind test variants from the blocklist.
> > > 
> > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > obtained on Trybot, please give me your green light for merging this 
> > > series if you have no objections.
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > > Janusz Krzysztofik (24):
> > >   tests/core_hotunplug: Use igt_assert_fd()
> > >   tests/core_hotunplug: Constify dev_bus_addr string
> > >   tests/core_hotunplug: Clean up device open error handling
> > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > >   tests/core_hotunplug: Assert successful device filter application
> > >   tests/core_hotunplug: Maintain a single data structure instance
> > >   tests/core_hotunplug: Pass errors via a data structure field
> > >   tests/core_hotunplug: Handle device close errors
> > >   tests/core_hotunplug: Prepare invariant data once per test run
> > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > >   tests/core_hotunplug: Recover from subtest failures
> > >   tests/core_hotunplug: Fail subtests on device close errors
> > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > >     operations
> > >   tests/core_hotunplug: Process return values of sysfs operations
> > >   tests/core_hotunplug: Assert expected device presence/absence
> > >   tests/core_hotunplug: Explicitly ignore unused return values
> > >   tests/core_hotunplug: Also check health of render device node
> > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > >   tests/core_hotunplug: Check health both before and after late close
> > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > 
> > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > >  tests/intel-ci/blacklist.txt          |   2 +-
> > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > 
> > > --
> > > 2.21.1
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-15  7:47         ` [igt-dev] " Janusz Krzysztofik
@ 2020-09-15 15:39           ` Vudum, Lakshminarayana
  -1 siblings, 0 replies; 77+ messages in thread
From: Vudum, Lakshminarayana @ 2020-09-15 15:39 UTC (permalink / raw)
  To: Janusz Krzysztofik, Winiarski, Michal, igt-dev; +Cc: intel-gfx

Hi Janusz,

I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for the igt@core_hotunplug@hotrebind-lateclose failure.
Is it a GuC issue?

Thanks,
Lakshmi


-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
Sent: Tuesday, September 15, 2020 12:47 AM
To: Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>; Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>
Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements

Hi Lakshmi,

On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.

Here is fresh evidence:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html

Thanks,
Janusz

>  Otherwise I filed the issue 
> https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> 
> Thanks,
> Lakshmi.
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Sent: Monday, September 14, 2020 12:31 PM
> To: Winiarski, Michal <michal.winiarski@intel.com>; 
> igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; 
> intel-gfx@lists.freedesktop.org; Latvala, Petri 
> <petri.latvala@intel.com>; Vudum, Lakshminarayana 
> <lakshminarayana.vudum@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: 
> Fixes and enhancements
> 
> On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > Clean up the test code, add some new basic subtests, then unblock 
> > > unbind test variants.
> > > 
> > > No incompletes / aborts nor subsequently run test issues have been 
> > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > far unidentified driver sysfs issue but the device is fully 
> > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > issue with audio power management has been worked around and its 
> > > potential occurrence is reported as an IGT warning.
> > > 
> > > Series changelog:
> > > v2: New patch "Un-blocklist *bind* subtests" added.
> > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > >     from subtest failures".
> > >   - a new patch "Clean up device open error handling" added, an old
> > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > >   - other new patches added:
> > >     - "Let the driver time out essential sysfs operations",
> > >     - "More thorough i915 healthcheck and recovery",
> > >   - a patch "Add 'lateclose before restore' variants" from another
> > >     series included.
> > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > >     series included.
> > > v5: New patch added with Haswell audio related kernel warning worked
> > >     around and replaced with an IGT warning to preserve visibility of
> > >     the issue.
> > > v6: New patch added for also checking health of render device nodes,
> > >   - new patch added with proper handling of health check before late
> > >     close,
> > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > 
> > > @Michał: Since some patch updates are trivial, I've preserved your
> > > v1/v2 Reviewed-by: except for patches with non-trivial changes,
> > > where I marked your R-b as v1/v2 applicable.  Please have a look 
> > > and confirm if you are still OK with them.
> > 
> > Feel free to add:
> > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > 
> > For the whole series (with the exception of intel-ci part).
> 
> Pushed.
> 
> @Petri, @Michał - thank you for the review.
> 
> @Lakshmi:
> - please open a new bug for the issue reported by the
> igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
> - the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one now reported in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.
> 
> Thanks,
> Janusz
> 
> 
> > -Michał
> > 
> > > @Tvrtko: As I already asked before, please support my attempt to 
> > > remove the unbind test variants from the blocklist.
> > > 
> > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > obtained on Trybot, please give me your green light for merging 
> > > this series if you have no objections.
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > > Janusz Krzysztofik (24):
> > >   tests/core_hotunplug: Use igt_assert_fd()
> > >   tests/core_hotunplug: Constify dev_bus_addr string
> > >   tests/core_hotunplug: Clean up device open error handling
> > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > >   tests/core_hotunplug: Assert successful device filter application
> > >   tests/core_hotunplug: Maintain a single data structure instance
> > >   tests/core_hotunplug: Pass errors via a data structure field
> > >   tests/core_hotunplug: Handle device close errors
> > >   tests/core_hotunplug: Prepare invariant data once per test run
> > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > >   tests/core_hotunplug: Recover from subtest failures
> > >   tests/core_hotunplug: Fail subtests on device close errors
> > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > >     operations
> > >   tests/core_hotunplug: Process return values of sysfs operations
> > >   tests/core_hotunplug: Assert expected device presence/absence
> > >   tests/core_hotunplug: Explicitly ignore unused return values
> > >   tests/core_hotunplug: Also check health of render device node
> > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > >   tests/core_hotunplug: Check health both before and after late close
> > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > 
> > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > >  tests/intel-ci/blacklist.txt          |   2 +-
> > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > 
> > > --
> > > 2.21.1
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
@ 2020-09-15 15:39           ` Vudum, Lakshminarayana
  0 siblings, 0 replies; 77+ messages in thread
From: Vudum, Lakshminarayana @ 2020-09-15 15:39 UTC (permalink / raw)
  To: Janusz Krzysztofik, Winiarski, Michal, igt-dev; +Cc: intel-gfx, Latvala, Petri

Hi Janusz,

I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for igt@core_hotunplug@hotrebind-lateclose failure. 
Is it GUC issue?

Thanks,
Lakshmi


-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
Sent: Tuesday, September 15, 2020 12:47 AM
To: Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>; Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>
Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements

Hi Lakshmi,

On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.

Here is a fresh evidence:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html

Thanks,
Janusz

>  Otherwise I filed the issue 
> https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> 
> Thanks,
> Lakshmi.
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Sent: Monday, September 14, 2020 12:31 PM
> To: Winiarski, Michal <michal.winiarski@intel.com>; 
> igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; 
> intel-gfx@lists.freedesktop.org; Latvala, Petri 
> <petri.latvala@intel.com>; Vudum, Lakshminarayana 
> <lakshminarayana.vudum@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: 
> Fixes and enhancements
> 
> On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > Clean up the test code, add some new basic subtests, then unblock 
> > > unbind test variants.
> > > 
> > > No incompletes / aborts nor subsequently run test issues have been 
> > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > far unidentified driver sysfs issue but the device is fully 
> > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > issue with audio power management has been worked around and its 
> > > potential occurrence is reported as an IGT warning.
> > > 
> > > Series changelog:
> > > v2: New patch "Un-blocklist *bind* subtests added.
> > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > >     from subtest failures".
> > >   - a new patche "Clean up device open error handling" added, an old
> > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > >   - other new patches added:
> > >     - "Let the driver time out essential sysfs operations",
> > >     - "More thorough i915 healthcheck and recovery",
> > >   - a patch "Add 'lateclose before restore' variants" from another
> > >     series included.
> > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > >     series included.
> > > v5: New patch added with Haswell audio related kernel warning worked
> > >     around and replaced with an IGT warning to preserve visibility of
> > >     the issue.
> > > v6: New patch added for also checking health of render device nodes,
> > >   - new patch added with proper handling of health check before late
> > >     close,
> > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > 
> > > @Michał: Since some patch updates are trivial, I've preserved your
> > > v1/v2 Reviewd-by: except for patches with non-trivial changes, 
> > > where I marked your R-b as v1/v2 applicable.  Please have a look 
> > > and confirm if you are still OK with them.
> > 
> > Feel free to add:
> > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > 
> > For the whole series (with the exception of intel-ci part).
> 
> Pushed.
> 
> @Petri, @Michał - thank you for review.
> 
> @Lakshmi:
> - please open a new bug for the issue reported by the igt@core 
> _hotunplug@hotrebind-lateclose subtest failing on all platforms,
> - IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platofrms is caused by the same issue as the one reported now in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.
> 
> Thanks,
> Janusz
> 
> 
> > -Michał
> > 
> > > @Tvrtko: As I already asked before, please support my attempt to 
> > > remove the unbind test variants from the blocklist.
> > > 
> > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > obtained on Trybot, please give me your green light for merging 
> > > this series if you have no objections.
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > > Janusz Krzysztofik (24):
> > >   tests/core_hotunplug: Use igt_assert_fd()
> > >   tests/core_hotunplug: Constify dev_bus_addr string
> > >   tests/core_hotunplug: Clean up device open error handling
> > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > >   tests/core_hotunplug: Assert successful device filter application
> > >   tests/core_hotunplug: Maintain a single data structure instance
> > >   tests/core_hotunplug: Pass errors via a data structure field
> > >   tests/core_hotunplug: Handle device close errors
> > >   tests/core_hotunplug: Prepare invariant data once per test run
> > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > >   tests/core_hotunplug: Recover from subtest failures
> > >   tests/core_hotunplug: Fail subtests on device close errors
> > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > >     operations
> > >   tests/core_hotunplug: Process return values of sysfs operations
> > >   tests/core_hotunplug: Assert expected device presence/absence
> > >   tests/core_hotunplug: Explicitly ignore unused return values
> > >   tests/core_hotunplug: Also check health of render device node
> > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > >   tests/core_hotunplug: Check health both before and after late close
> > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > 
> > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > >  tests/intel-ci/blacklist.txt          |   2 +-
> > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > 
> > > --
> > > 2.21.1
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev


* Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
  2020-09-15 15:39           ` [igt-dev] " Vudum, Lakshminarayana
@ 2020-09-16  7:59             ` Janusz Krzysztofik
  -1 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-16  7:59 UTC (permalink / raw)
  To: Vudum, Lakshminarayana, Winiarski, Michal, igt-dev; +Cc: intel-gfx

Hi Lakshmi,

On Tue, 2020-09-15 at 15:39 +0000, Vudum, Lakshminarayana wrote:
> Hi Janusz,
> 
> I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for the igt@core_hotunplug@hotrebind-lateclose failure.
> Is it a GuC issue?

Wow, I thought that issue got hidden behind another one and I forgot
about it.  That's great you've identified it.  And yes, it is
GuC specific.  However, as far as I can tell, the test recovers from
that condition so it is not the root cause of the subtest failures -
those happen on non-GuC platforms as well.

Then, we need to open another bug with a filter that captures the
following from the test standard error:

(core_hotunplug:2056) igt_aux-CRITICAL: Test assertion failure function igt_fork_hang_detector, file ../lib/igt_aux.c:517:
(core_hotunplug:2056) igt_aux-CRITICAL: Failed assertion: igt_params_set(fd, "reset", "%d", 1 )
(core_hotunplug:2056) igt_aux-CRITICAL: Last errno: 13, Permission denied

I have no idea if CI filters are able to trigger more than one bug from
a single subtest run.  If not, then I think the GuC issue should have
higher priority set so both are visible.

Thanks,
Janusz

> 
> Thanks,
> Lakshmi
> 
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
> Sent: Tuesday, September 15, 2020 12:47 AM
> To: Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>; Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
> 
> Hi Lakshmi,
> 
> On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> > igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.
> 
> Here is fresh evidence:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html
> 
> Thanks,
> Janusz
> 
> >  Otherwise I filed the issue 
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> > 
> > Thanks,
> > Lakshmi.
> > 
> > -----Original Message-----
> > From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Sent: Monday, September 14, 2020 12:31 PM
> > To: Winiarski, Michal <michal.winiarski@intel.com>; 
> > igt-dev@lists.freedesktop.org
> > Cc: Michał Winiarski <michal@hardline.pl>; 
> > intel-gfx@lists.freedesktop.org; Latvala, Petri 
> > <petri.latvala@intel.com>; Vudum, Lakshminarayana 
> > <lakshminarayana.vudum@intel.com>
> > Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: 
> > Fixes and enhancements
> > 
> > On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > > Clean up the test code, add some new basic subtests, then unblock 
> > > > unbind test variants.
> > > > 
> > > > No incompletes / aborts nor subsequently run test issues have been 
> > > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > > far unidentified driver sysfs issue but the device is fully 
> > > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > > issue with audio power management has been worked around and its 
> > > > potential occurrence is reported as an IGT warning.
> > > > 
> > > > Series changelog:
> > > > v2: New patch "Un-blocklist *bind* subtests" added.
> > > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > > >     from subtest failures".
> > > >   - a new patch "Clean up device open error handling" added, an old
> > > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > > >   - other new patches added:
> > > >     - "Let the driver time out essential sysfs operations",
> > > >     - "More thorough i915 healthcheck and recovery",
> > > >   - a patch "Add 'lateclose before restore' variants" from another
> > > >     series included.
> > > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > > >     series included.
> > > > v5: New patch added with Haswell audio related kernel warning worked
> > > >     around and replaced with an IGT warning to preserve visibility of
> > > >     the issue.
> > > > v6: New patch added for also checking health of render device nodes,
> > > >   - new patch added with proper handling of health check before late
> > > >     close,
> > > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > > 
> > > > @Michał: Since some patch updates are trivial, I've preserved your
> > > > v1/v2 Reviewed-by: except for patches with non-trivial changes,
> > > > where I marked your R-b as v1/v2 applicable.  Please have a look 
> > > > and confirm if you are still OK with them.
> > > 
> > > Feel free to add:
> > > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > > 
> > > For the whole series (with the exception of intel-ci part).
> > 
> > Pushed.
> > 
> > @Petri, @Michał - thank you for review.
> > 
> > @Lakshmi:
> > - please open a new bug for the issue reported by the
> > igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
> > - IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one reported now in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.
> > 
> > Thanks,
> > Janusz
> > 
> > 
> > > -Michał
> > > 
> > > > @Tvrtko: As I already asked before, please support my attempt to 
> > > > remove the unbind test variants from the blocklist.
> > > > 
> > > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > > obtained on Trybot, please give me your green light for merging 
> > > > this series if you have no objections.
> > > > 
> > > > Thanks,
> > > > Janusz
> > > > 
> > > > Janusz Krzysztofik (24):
> > > >   tests/core_hotunplug: Use igt_assert_fd()
> > > >   tests/core_hotunplug: Constify dev_bus_addr string
> > > >   tests/core_hotunplug: Clean up device open error handling
> > > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > > >   tests/core_hotunplug: Assert successful device filter application
> > > >   tests/core_hotunplug: Maintain a single data structure instance
> > > >   tests/core_hotunplug: Pass errors via a data structure field
> > > >   tests/core_hotunplug: Handle device close errors
> > > >   tests/core_hotunplug: Prepare invariant data once per test run
> > > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > > >   tests/core_hotunplug: Recover from subtest failures
> > > >   tests/core_hotunplug: Fail subtests on device close errors
> > > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > > >     operations
> > > >   tests/core_hotunplug: Process return values of sysfs operations
> > > >   tests/core_hotunplug: Assert expected device presence/absence
> > > >   tests/core_hotunplug: Explicitly ignore unused return values
> > > >   tests/core_hotunplug: Also check health of render device node
> > > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > > >   tests/core_hotunplug: Check health both before and after late close
> > > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > > 
> > > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > > >  tests/intel-ci/blacklist.txt          |   2 +-
> > > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > > 
> > > > --
> > > > 2.21.1
> > > > 
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
@ 2020-09-16  7:59             ` Janusz Krzysztofik
  0 siblings, 0 replies; 77+ messages in thread
From: Janusz Krzysztofik @ 2020-09-16  7:59 UTC (permalink / raw)
  To: Vudum, Lakshminarayana, Winiarski, Michal, igt-dev
  Cc: intel-gfx, Latvala, Petri

Hi Lakshmi,

On Tue, 2020-09-15 at 15:39 +0000, Vudum, Lakshminarayana wrote:
> Hi Janusz,
> 
> I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for the igt@core_hotunplug@hotrebind-lateclose failure.
> Is it a GuC issue?

Wow, I thought that issue got hidden behind another one and I forgot
about it.  That's great you've identified it.  And yes, it is
GuC specific.  However, as far as I can tell, the test recovers from
that condition so it is not the root cause of the subtest failures -
those happen on non-GuC platforms as well.

Then, we need to open another bug with a filter that captures the
following from the test standard error:

(core_hotunplug:2056) igt_aux-CRITICAL: Test assertion failure function igt_fork_hang_detector, file ../lib/igt_aux.c:517:
(core_hotunplug:2056) igt_aux-CRITICAL: Failed assertion: igt_params_set(fd, "reset", "%d", 1 )
(core_hotunplug:2056) igt_aux-CRITICAL: Last errno: 13, Permission denied

I have no idea if CI filters are able to trigger more than one bug from
a single subtest run.  If not, then I think the GuC issue should have
higher priority set so both are visible.

Thanks,
Janusz

> 
> Thanks,
> Lakshmi
> 
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
> Sent: Tuesday, September 15, 2020 12:47 AM
> To: Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>; Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
> 
> Hi Lakshmi,
> 
> On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> > igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.
> 
> Here is fresh evidence:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html
> 
> Thanks,
> Janusz
> 
> >  Otherwise I filed the issue 
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> > 
> > Thanks,
> > Lakshmi.
> > 
> > -----Original Message-----
> > From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Sent: Monday, September 14, 2020 12:31 PM
> > To: Winiarski, Michal <michal.winiarski@intel.com>; 
> > igt-dev@lists.freedesktop.org
> > Cc: Michał Winiarski <michal@hardline.pl>; 
> > intel-gfx@lists.freedesktop.org; Latvala, Petri 
> > <petri.latvala@intel.com>; Vudum, Lakshminarayana 
> > <lakshminarayana.vudum@intel.com>
> > Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: 
> > Fixes and enhancements
> > 
> > On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > > Clean up the test code, add some new basic subtests, then unblock 
> > > > unbind test variants.
> > > > 
> > > > No incompletes / aborts nor subsequently run test issues have been 
> > > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > > far unidentified driver sysfs issue but the device is fully 
> > > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > > issue with audio power management has been worked around and its 
> > > > potential occurrence is reported as an IGT warning.
> > > > 
> > > > Series changelog:
> > > > v2: New patch "Un-blocklist *bind* subtests" added.
> > > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > > >     from subtest failures".
> > > >   - a new patch "Clean up device open error handling" added, an old
> > > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > > >   - other new patches added:
> > > >     - "Let the driver time out essential sysfs operations",
> > > >     - "More thorough i915 healthcheck and recovery",
> > > >   - a patch "Add 'lateclose before restore' variants" from another
> > > >     series included.
> > > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > > >     series included.
> > > > v5: New patch added with Haswell audio related kernel warning worked
> > > >     around and replaced with an IGT warning to preserve visibility of
> > > >     the issue.
> > > > v6: New patch added for also checking health of render device nodes,
> > > >   - new patch added with proper handling of health check before late
> > > >     close,
> > > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > > 
> > > > @Michał: Since some patch updates are trivial, I've preserved your
> > > > v1/v2 Reviewed-by: except for patches with non-trivial changes,
> > > > where I marked your R-b as v1/v2 applicable.  Please have a look 
> > > > and confirm if you are still OK with them.
> > > 
> > > Feel free to add:
> > > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > > 
> > > For the whole series (with the exception of intel-ci part).
> > 
> > Pushed.
> > 
> > @Petri, @Michał - thank you for review.
> > 
> > @Lakshmi:
> > - please open a new bug for the issue reported by the
> > igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
> > - IGT warning reported by igt@core_hotunplug@*bind* on Haswell and Broadwell platforms is caused by the same issue as the one reported now in a similar way on Haswell by igt@device_reset@unbind-reset-rebind - please update the associated filter so it covers all those tests.
> > 
> > Thanks,
> > Janusz
> > 
> > 
> > > -Michał
> > > 
> > > > @Tvrtko: As I already asked before, please support my attempt to 
> > > > remove the unbind test variants from the blocklist.
> > > > 
> > > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > > obtained on Trybot, please give me your green light for merging 
> > > > this series if you have no objections.
> > > > 
> > > > Thanks,
> > > > Janusz
> > > > 
> > > > Janusz Krzysztofik (24):
> > > >   tests/core_hotunplug: Use igt_assert_fd()
> > > >   tests/core_hotunplug: Constify dev_bus_addr string
> > > >   tests/core_hotunplug: Clean up device open error handling
> > > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > > >   tests/core_hotunplug: Assert successful device filter application
> > > >   tests/core_hotunplug: Maintain a single data structure instance
> > > >   tests/core_hotunplug: Pass errors via a data structure field
> > > >   tests/core_hotunplug: Handle device close errors
> > > >   tests/core_hotunplug: Prepare invariant data once per test run
> > > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > > >   tests/core_hotunplug: Recover from subtest failures
> > > >   tests/core_hotunplug: Fail subtests on device close errors
> > > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > > >     operations
> > > >   tests/core_hotunplug: Process return values of sysfs operations
> > > >   tests/core_hotunplug: Assert expected device presence/absence
> > > >   tests/core_hotunplug: Explicitly ignore unused return values
> > > >   tests/core_hotunplug: Also check health of render device node
> > > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > > >   tests/core_hotunplug: Check health both before and after late close
> > > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > > 
> > > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > > >  tests/intel-ci/blacklist.txt          |   2 +-
> > > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > > 
> > > > --
> > > > 2.21.1
> > > > 
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev


end of thread, other threads:[~2020-09-16  7:59 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
2020-09-11 10:30 [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
2020-09-11 10:30 ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 01/24] tests/core_hotunplug: Use igt_assert_fd() Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 02/24] tests/core_hotunplug: Constify dev_bus_addr string Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 03/24] tests/core_hotunplug: Clean up device open error handling Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 04/24] tests/core_hotunplug: Consolidate duplicated debug messages Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 05/24] tests/core_hotunplug: Assert successful device filter application Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 06/24] tests/core_hotunplug: Maintain a single data structure instance Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 07/24] tests/core_hotunplug: Pass errors via a data structure field Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 08/24] tests/core_hotunplug: Handle device close errors Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 09/24] tests/core_hotunplug: Prepare invariant data once per test run Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 10/24] tests/core_hotunplug: Skip selectively on sysfs close errors Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 11/24] tests/core_hotunplug: Recover from subtest failures Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 12/24] tests/core_hotunplug: Fail subtests on device close errors Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 13/24] tests/core_hotunplug: Let the driver time out essential sysfs operations Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 14/24] tests/core_hotunplug: Process return values of " Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 15/24] tests/core_hotunplug: Assert expected device presence/absence Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 16/24] tests/core_hotunplug: Explicitly ignore unused return values Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 17/24] tests/core_hotunplug: Also check health of render device node Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 18/24] tests/core_hotunplug: More thorough i915 healthcheck and recovery Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 19/24] tests/core_hotunplug: Add 'lateclose before restore' variants Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 20/24] tests/core_hotunplug: Check health both before and after late close Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 21/24] tests/core_hotunplug: HSW/BDW audio issue workaround Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 12:22   ` [Intel-gfx] " Petri Latvala
2020-09-11 12:22     ` [igt-dev] " Petri Latvala
2020-09-11 13:15     ` [Intel-gfx] " Janusz Krzysztofik
2020-09-11 13:15       ` [igt-dev] " Janusz Krzysztofik
2020-09-11 14:17       ` [Intel-gfx] " Petri Latvala
2020-09-11 14:17         ` [igt-dev] " Petri Latvala
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 22/24] tests/core_hotunplug: Duplicate debug messages in dmesg Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 23/24] tests/core_hotunplug: Un-blocklist *bind* subtests Janusz Krzysztofik
2020-09-11 11:51   ` Petri Latvala
2020-09-11 11:51     ` [igt-dev] " Petri Latvala
2020-09-11 12:00     ` [Intel-gfx] " Janusz Krzysztofik
2020-09-11 12:00       ` [igt-dev] " Janusz Krzysztofik
2020-09-11 14:20       ` [Intel-gfx] " Petri Latvala
2020-09-11 14:20         ` [igt-dev] " Petri Latvala
2020-09-11 10:30 ` [Intel-gfx] [PATCH i-g-t v6 24/24] tests/core_hotunplug: Add unbind-rebind subtest to BAT scope Janusz Krzysztofik
2020-09-11 10:30   ` [igt-dev] " Janusz Krzysztofik
2020-09-11 11:52   ` [Intel-gfx] " Petri Latvala
2020-09-11 11:52     ` [igt-dev] " Petri Latvala
2020-09-11 12:01     ` [Intel-gfx] " Janusz Krzysztofik
2020-09-11 12:01       ` [igt-dev] " Janusz Krzysztofik
2020-09-11 11:24 ` [igt-dev] ✓ Fi.CI.BAT: success for tests/core_hotunplug: Fixes and enhancements (rev6) Patchwork
2020-09-11 14:18   ` Petri Latvala
2020-09-11 14:15 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
2020-09-14 18:18 ` [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements Michał Winiarski
2020-09-14 18:18   ` [igt-dev] " Michał Winiarski
2020-09-14 19:30   ` Janusz Krzysztofik
2020-09-14 19:30     ` [igt-dev] " Janusz Krzysztofik
2020-09-14 20:43     ` Vudum, Lakshminarayana
2020-09-14 20:43       ` [igt-dev] " Vudum, Lakshminarayana
2020-09-15  7:47       ` Janusz Krzysztofik
2020-09-15  7:47         ` [igt-dev] " Janusz Krzysztofik
2020-09-15 15:39         ` Vudum, Lakshminarayana
2020-09-15 15:39           ` [igt-dev] " Vudum, Lakshminarayana
2020-09-16  7:59           ` Janusz Krzysztofik
2020-09-16  7:59             ` [igt-dev] " Janusz Krzysztofik
