All of lore.kernel.org
 help / color / mirror / Atom feed
* [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
@ 2019-03-28 16:47 Janusz Krzysztofik
  2019-03-28 17:20 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2019-03-28 16:47 UTC (permalink / raw)
  To: igt-dev; +Cc: simon.b.lee

Run a dummy load in background to put some workload on a device, then try
to either remove (unplug) the device from its bus, or unbind the device's
driver from it, depending on which subtest has been selected.

The driver hot unbind / device hot unplug operation is expected to succeed
in a reasonable time, however long timeouts are used to allow kernel
level timeouts pop up first if any.

Please note that if running both subtests consecutively, the second one
is always skipped as the device is no longer available as soon as the
first subtest is completed.  The most reliable way to run another subtest
is to reboot the system first, then select next subtest to be run.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
---
 tests/Makefile.sources |   1 +
 tests/drm_hot_remove.c | 106 +++++++++++++++++++++++++++++++++++++++++
 tests/meson.build      |   1 +
 3 files changed, 108 insertions(+)
 create mode 100644 tests/drm_hot_remove.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 71ccf00a..0f194e2c 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -18,6 +18,7 @@ TESTS_progs = \
 	core_getversion \
 	core_setmaster_vs_auth \
 	debugfs_test \
+	drm_hot_remove \
 	drm_import_export \
 	drm_mm \
 	drm_read \
diff --git a/tests/drm_hot_remove.c b/tests/drm_hot_remove.c
new file mode 100644
index 00000000..fbb033b4
--- /dev/null
+++ b/tests/drm_hot_remove.c
@@ -0,0 +1,106 @@
+/*
+ * Copyright © 2019 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "igt.h"
+#include "igt_device.h"
+#include "igt_dummyload.h"
+#include "igt_sysfs.h"
+
+#include <getopt.h>
+#include <limits.h>
+#include <string.h>
+#include <unistd.h>
+
+
+static void driver_unbind(int device)
+{
+	char path[PATH_MAX], *dev_bus_addr;
+	int dir, len;
+
+	dir = igt_sysfs_open(device);
+
+	len = readlinkat(dir, "device", path, sizeof(path));
+	path[len] = '\0';
+	dev_bus_addr = strrchr(path, '/') + 1;
+
+	igt_sysfs_set(dir, "device/driver/unbind", dev_bus_addr);
+
+	close(dir);
+}
+
+static void device_unplug(int device)
+{
+	int dir;
+
+	dir = igt_sysfs_open(device);
+
+	igt_sysfs_set(dir, "device/remove", "1");
+
+	close(dir);
+}
+
+igt_main {
+	int device;
+
+	igt_fixture {
+		device = __drm_open_driver(DRIVER_ANY);
+	}
+
+	igt_subtest("unplug") {
+		igt_spin_t *spin;
+
+		/* Verify if a suitable DRM device/driver exists */
+		igt_skip_on(device < 0);
+
+		/* Run background workload */
+		spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);
+		igt_spin_busywait_until_running(spin);
+
+		/* Make the device disappear */
+		igt_set_timeout(60, "Device unplug timeout!");
+		device_unplug(device);
+		igt_reset_timeout();
+
+		close(device);
+		device = -1;
+	}
+
+	igt_subtest("unbind") {
+		igt_spin_t *spin;
+
+		/* Verify if a suitable DRM device/driver exists */
+		igt_skip_on(device < 0);
+
+		/* Run background workload */
+		spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);
+		igt_spin_busywait_until_running(spin);
+
+		/* Unbind the driver */
+		igt_set_timeout(60, "Driver unbind timeout!");
+		driver_unbind(device);
+		igt_reset_timeout();
+
+		close(device);
+		device = -1;
+	}
+}
diff --git a/tests/meson.build b/tests/meson.build
index 9015f809..379addb1 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -5,6 +5,7 @@ test_progs = [
 	'core_getversion',
 	'core_setmaster_vs_auth',
 	'debugfs_test',
+	'drm_hot_remove',
 	'drm_import_export',
 	'drm_mm',
 	'drm_read',
-- 
2.20.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for tests: Add a new test for driver/device hot remove
  2019-03-28 16:47 [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove Janusz Krzysztofik
@ 2019-03-28 17:20 ` Patchwork
  2019-03-29  3:33 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
  2019-03-29 10:47 ` [igt-dev] [PATCH] " Petri Latvala
  2 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2019-03-28 17:20 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev

== Series Details ==

Series: tests: Add a new test for driver/device hot remove
URL   : https://patchwork.freedesktop.org/series/58681/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5831 -> IGTPW_2726
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/58681/revisions/1/mbox/

Known issues
------------

  Here are the changes found in IGTPW_2726 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_uncore:
    - fi-ivb-3770:        PASS -> DMESG-FAIL [fdo#110210]

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         PASS -> FAIL [fdo#103182] +1

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362] +1

  
#### Possible fixes ####

  * igt@i915_selftest@live_contexts:
    - fi-bdw-gvtdvm:      DMESG-FAIL [fdo#110235 ] -> PASS

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         INCOMPLETE [fdo#103927] / [fdo#109720] -> PASS

  
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#109720]: https://bugs.freedesktop.org/show_bug.cgi?id=109720
  [fdo#110210]: https://bugs.freedesktop.org/show_bug.cgi?id=110210
  [fdo#110235 ]: https://bugs.freedesktop.org/show_bug.cgi?id=110235 


Participating hosts (43 -> 39)
------------------------------

  Missing    (4): fi-kbl-soraka fi-ilk-m540 fi-bsw-cyan fi-hsw-4200u 


Build changes
-------------

    * IGT: IGT_4911 -> IGTPW_2726

  CI_DRM_5831: 8cac0cc264d2a6af0b33370b542b12d516e022c5 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_2726: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2726/
  IGT_4911: d9fe699ea45406e279b78d1afdb4d57a205a3c99 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools



== Testlist changes ==

+igt@drm_hot_remove@unbind
+igt@drm_hot_remove@unplug

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2726/
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: failure for tests: Add a new test for driver/device hot remove
  2019-03-28 16:47 [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove Janusz Krzysztofik
  2019-03-28 17:20 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
@ 2019-03-29  3:33 ` Patchwork
  2019-03-29 10:47 ` [igt-dev] [PATCH] " Petri Latvala
  2 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2019-03-29  3:33 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev

== Series Details ==

Series: tests: Add a new test for driver/device hot remove
URL   : https://patchwork.freedesktop.org/series/58681/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5831_full -> IGTPW_2726_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_2726_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_2726_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/58681/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_2726_full:

### IGT changes ###

#### Possible regressions ####

  * {igt@drm_hot_remove@unbind} (NEW):
    - shard-snb:          NOTRUN -> DMESG-WARN +1

  * {igt@drm_hot_remove@unplug} (NEW):
    - shard-hsw:          NOTRUN -> DMESG-WARN +1

  * igt@runner@aborted:
    - shard-hsw:          NOTRUN -> ( 2 FAIL )
    - shard-snb:          NOTRUN -> ( 2 FAIL )

  
New tests
---------

  New tests have been introduced between CI_DRM_5831_full and IGTPW_2726_full:

### New IGT tests (2) ###

  * igt@drm_hot_remove@unbind:
    - Statuses : 2 dmesg-warn(s) 3 incomplete(s)
    - Exec time: [0.0, 0.61] s

  * igt@drm_hot_remove@unplug:
    - Statuses : 2 dmesg-warn(s) 3 incomplete(s)
    - Exec time: [0.0, 0.93] s

  

Known issues
------------

  Here are the changes found in IGTPW_2726_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * {igt@drm_hot_remove@unbind} (NEW):
    - shard-apl:          NOTRUN -> INCOMPLETE [fdo#103927] +1
    - shard-glk:          NOTRUN -> INCOMPLETE [fdo#103359] / [k.org#198133] +1

  * {igt@drm_hot_remove@unplug} (NEW):
    - shard-kbl:          NOTRUN -> INCOMPLETE [fdo#103665] +1

  * igt@gem_tiled_swapping@non-threaded:
    - shard-hsw:          PASS -> FAIL [fdo#108686]

  * igt@kms_atomic_transition@6x-modeset-transitions-fencing:
    - shard-glk:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing:
    - shard-apl:          PASS -> INCOMPLETE [fdo#103927]

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
    - shard-kbl:          PASS -> DMESG-WARN [fdo#110222]

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-snb:          PASS -> DMESG-WARN [fdo#110222]

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-f:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_cursor_legacy@cursorb-vs-flipb-legacy:
    - shard-hsw:          PASS -> INCOMPLETE [fdo#103540]

  * igt@kms_fbcon_fbt@fbc:
    - shard-glk:          NOTRUN -> SKIP [fdo#109271] +13

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-onoff:
    - shard-apl:          PASS -> FAIL [fdo#103167]
    - shard-kbl:          PASS -> FAIL [fdo#103167]

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparant-fb:
    - shard-apl:          NOTRUN -> FAIL [fdo#108145]

  * igt@kms_psr@basic:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] +15

  * igt@kms_rotation_crc@multiplane-rotation-cropping-top:
    - shard-kbl:          PASS -> FAIL [fdo#109016]

  
#### Possible fixes ####

  * igt@i915_pm_rc6_residency@rc6-accuracy:
    - shard-snb:          SKIP [fdo#109271] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
    - shard-snb:          DMESG-WARN [fdo#110222] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
    - shard-hsw:          DMESG-WARN [fdo#110222] -> PASS

  * igt@kms_cursor_crc@cursor-128x128-dpms:
    - shard-kbl:          FAIL [fdo#103232] -> PASS
    - shard-apl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-64x64-dpms:
    - shard-glk:          FAIL [fdo#103232] -> PASS

  * igt@kms_draw_crc@draw-method-xrgb8888-render-xtiled:
    - shard-glk:          FAIL [fdo#107791] -> PASS

  * igt@kms_flip@flip-vs-suspend:
    - shard-kbl:          DMESG-WARN [fdo#108566] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-cpu:
    - shard-apl:          FAIL [fdo#103167] -> PASS
    - shard-kbl:          FAIL [fdo#103167] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-plflip-blt:
    - shard-glk:          FAIL [fdo#103167] -> PASS

  * igt@kms_plane_scaling@pipe-c-scaler-with-rotation:
    - shard-glk:          SKIP [fdo#109271] / [fdo#109278] -> PASS

  * igt@kms_sequence@queue-idle:
    - shard-apl:          INCOMPLETE [fdo#103927] -> PASS

  * igt@kms_vblank@pipe-a-ts-continuation-modeset:
    - shard-apl:          FAIL [fdo#104894] -> PASS
    - shard-kbl:          FAIL [fdo#104894] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104894]: https://bugs.freedesktop.org/show_bug.cgi?id=104894
  [fdo#107791]: https://bugs.freedesktop.org/show_bug.cgi?id=107791
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108686]: https://bugs.freedesktop.org/show_bug.cgi?id=108686
  [fdo#109016]: https://bugs.freedesktop.org/show_bug.cgi?id=109016
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#110222]: https://bugs.freedesktop.org/show_bug.cgi?id=110222
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (10 -> 5)
------------------------------

  Missing    (5): shard-skl pig-hsw-4770r pig-glk-j5005 shard-iclb pig-skl-6260u 


Build changes
-------------

    * IGT: IGT_4911 -> IGTPW_2726
    * Piglit: piglit_4509 -> None

  CI_DRM_5831: 8cac0cc264d2a6af0b33370b542b12d516e022c5 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_2726: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2726/
  IGT_4911: d9fe699ea45406e279b78d1afdb4d57a205a3c99 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2726/
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-03-28 16:47 [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove Janusz Krzysztofik
  2019-03-28 17:20 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
  2019-03-29  3:33 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
@ 2019-03-29 10:47 ` Petri Latvala
  2019-03-29 12:06   ` Krzysztofik, Janusz
  2019-04-01  7:22   ` Daniel Vetter
  2 siblings, 2 replies; 10+ messages in thread
From: Petri Latvala @ 2019-03-29 10:47 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, simon.b.lee

On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> Run a dummy load in background to put some workload on a device, then try
> to either remove (unplug) the device from its bus, or unbind the device's
> driver from it, depending on which subtest has been selected.
> 
> The driver hot unbind / device hot unplug operation is expected to succeed
> in a reasonable time, however long timeouts are used to allow kernel
> level timeouts pop up first if any.
> 
> Please note that if running both subtests consecutively, the second one
> is always skipped as the device is no longer available as soon as the
> first subtest is completed.  The most reliable way to run another subtest
> is to reboot the system first, then select next subtest to be run.

This is a requirement that won't fly for CI use. Is the
rebinding/whatever of the device not possible to do? By the test?

Does it also apply to running other test binaries after running the
first subtest? As in, is it just a timing issue?


> +	igt_subtest("unplug") {
> +		igt_spin_t *spin;
> +
> +		/* Verify if a suitable DRM device/driver exists */
> +		igt_skip_on(device < 0);
> +
> +		/* Run background workload */
> +		spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);


igt_spin_batch_new requires DRIVER_INTEL, doesn't it?


> +		igt_spin_busywait_until_running(spin);
> +
> +		/* Make the device disappear */
> +		igt_set_timeout(60, "Device unplug timeout!");
> +		device_unplug(device);
> +		igt_reset_timeout();
> +
> +		close(device);
> +		device = -1;
> +	}
> +
> +	igt_subtest("unbind") {
> +		igt_spin_t *spin;
> +
> +		/* Verify if a suitable DRM device/driver exists */
> +		igt_skip_on(device < 0);


Ah, I see. The skip if running consecutively is because the previous
subtest closed the fd.

Another fixture before this subtest, get the device plugged back,
reopen driver?



-- 
Petri Latvala
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-03-29 10:47 ` [igt-dev] [PATCH] " Petri Latvala
@ 2019-03-29 12:06   ` Krzysztofik, Janusz
  2019-04-01  7:22   ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Krzysztofik, Janusz @ 2019-03-29 12:06 UTC (permalink / raw)
  To: Latvala, Petri; +Cc: igt-dev, Lee, Simon B

Hi Petri,

On Friday, March 29, 2019 11:47:32 AM CET Petri Latvala wrote:
> On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > Run a dummy load in background to put some workload on a device, then try
> > to either remove (unplug) the device from its bus, or unbind the device's
> > driver from it, depending on which subtest has been selected.
> > 
> > The driver hot unbind / device hot unplug operation is expected to succeed
> > in a reasonable time, however long timeouts are used to allow kernel
> > level timeouts pop up first if any.
> > 
> > Please note that if running both subtests consecutively, the second one
> > is always skipped as the device is no longer available as soon as the
> > first subtest is completed.  The most reliable way to run another subtest
> > is to reboot the system first, then select next subtest to be run.
> 
> This is a requirement that won't fly for CI use.

As far as I know, selftests are used in a similar way, requiring reboot after 
each subtest for reliable results.

> Is the
> rebinding/whatever of the device not possible to do? By the test?

See below.

> Does it also apply to running other test binaries after running the
> first subtest?

Yes, it does.

> As in, is it just a timing issue?

See below.

> > +	igt_subtest("unplug") {
> > +		igt_spin_t *spin;
> > +
> > +		/* Verify if a suitable DRM device/driver exists */
> > +		igt_skip_on(device < 0);
> > +
> > +		/* Run background workload */
> > +		spin = igt_spin_batch_new(device, .flags = 
IGT_SPIN_POLL_RUN);
> 
> igt_spin_batch_new requires DRIVER_INTEL, doesn't it?

Yes, it does.  The test will be just skipped on non-Intel drivers, I believe, 
unless they provide their own igt_spin_batch_new implementation.

> > +		igt_spin_busywait_until_running(spin);
> > +
> > +		/* Make the device disappear */
> > +		igt_set_timeout(60, "Device unplug timeout!");
> > +		device_unplug(device);
> > +		igt_reset_timeout();
> > +
> > +		close(device);
> > +		device = -1;
> > +	}
> > +
> > +	igt_subtest("unbind") {
> > +		igt_spin_t *spin;
> > +
> > +		/* Verify if a suitable DRM device/driver exists */
> > +		igt_skip_on(device < 0);
> 
> Ah, I see. The skip if running consecutively is because the previous
> subtest closed the fd.

There is no point in keeping it open as it is associated with a device which 
no longer exists.

> Another fixture before this subtest, get the device plugged back,
> reopen driver?

What should happen if that fails?  Or how to verify health of the re-plugged 
or re-bound device?

I think the problem is, a device which has just been hot removed only 
virtually and can't be reinitialized from scratch while being physically re-
plugged back may be in an unpredictable state and may not initialize correctly 
with the driver again.  If I tried to do that, I think I would extend the 
scope of this test with a check how rebinding the driver to a device in an 
unknown state works.  That's a candidate for a separate test, I believe.

I think it's better to explicitly require reboot for the device to re-appear 
than risk other (sub)tests are affected.

Thanks,
Janusz
--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-03-29 10:47 ` [igt-dev] [PATCH] " Petri Latvala
  2019-03-29 12:06   ` Krzysztofik, Janusz
@ 2019-04-01  7:22   ` Daniel Vetter
  2019-04-01  7:41     ` Daniel Vetter
  2019-04-01  8:55     ` Krzysztofik, Janusz
  1 sibling, 2 replies; 10+ messages in thread
From: Daniel Vetter @ 2019-04-01  7:22 UTC (permalink / raw)
  To: Janusz Krzysztofik, igt-dev, simon.b.lee

On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > Run a dummy load in background to put some workload on a device, then try
> > to either remove (unplug) the device from its bus, or unbind the device's
> > driver from it, depending on which subtest has been selected.
> > 
> > The driver hot unbind / device hot unplug operation is expected to succeed
> > in a reasonable time, however long timeouts are used to allow kernel
> > level timeouts pop up first if any.
> > 
> > Please note that if running both subtests consecutively, the second one
> > is always skipped as the device is no longer available as soon as the
> > first subtest is completed.  The most reliable way to run another subtest
> > is to reboot the system first, then select next subtest to be run.
> 
> This is a requirement that won't fly for CI use. Is the
> rebinding/whatever of the device not possible to do? By the test?
> 
> Does it also apply to running other test binaries after running the
> first subtest? As in, is it just a timing issue?

Yeah like the module reload testcase this testcase needs to restore the
driver to working state afterwards. I think there's a corresponding rebind
sysfs file too.

btw since you started with this, bunch more subtests we need to validate
this:
- unbind while an ioctl is called. Especially fun with all the ones that
  can block. We need a subtest for each ioctl to make sure stuff works
  correctly. At least, complex ioctl probably need more than that.

- unplug while atomic commit pending

- unplug with dma-buf exported, with various external users doing stuff to
  that dma-buf

- unplug with dma-fence exported (either as syncpt, drm_syncobj or
  implicitly attached to a dma-buf), this one probably needs lots of
  a few variants to cover everything.

There's probably a lot more (you should discover endless amounts of oops,
this area is full of bugs), but this should at least get you started.

And you probably want an entire team of engineers fixing the kernel bugs
you uncover :-/

Cheers, Daniel

> > +	igt_subtest("unplug") {
> > +		igt_spin_t *spin;
> > +
> > +		/* Verify if a suitable DRM device/driver exists */
> > +		igt_skip_on(device < 0);
> > +
> > +		/* Run background workload */
> > +		spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);
> 
> 
> igt_spin_batch_new requires DRIVER_INTEL, doesn't it?
> 
> 
> > +		igt_spin_busywait_until_running(spin);
> > +
> > +		/* Make the device disappear */
> > +		igt_set_timeout(60, "Device unplug timeout!");
> > +		device_unplug(device);
> > +		igt_reset_timeout();
> > +
> > +		close(device);
> > +		device = -1;
> > +	}
> > +
> > +	igt_subtest("unbind") {
> > +		igt_spin_t *spin;
> > +
> > +		/* Verify if a suitable DRM device/driver exists */
> > +		igt_skip_on(device < 0);
> 
> 
> Ah, I see. The skip if running consecutively is because the previous
> subtest closed the fd.
> 
> Another fixture before this subtest, get the device plugged back,
> reopen driver?
> 
> 
> 
> -- 
> Petri Latvala
> _______________________________________________
> igt-dev mailing list
> igt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/igt-dev

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-04-01  7:22   ` Daniel Vetter
@ 2019-04-01  7:41     ` Daniel Vetter
  2019-04-01  8:55     ` Krzysztofik, Janusz
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2019-04-01  7:41 UTC (permalink / raw)
  To: Janusz Krzysztofik, igt-dev, simon.b.lee

On Mon, Apr 01, 2019 at 09:22:34AM +0200, Daniel Vetter wrote:
> On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > Run a dummy load in background to put some workload on a device, then try
> > > to either remove (unplug) the device from its bus, or unbind the device's
> > > driver from it, depending on which subtest has been selected.
> > > 
> > > The driver hot unbind / device hot unplug operation is expected to succeed
> > > in a reasonable time, however long timeouts are used to allow kernel
> > > level timeouts pop up first if any.
> > > 
> > > Please note that if running both subtests consecutively, the second one
> > > is always skipped as the device is no longer available as soon as the
> > > first subtest is completed.  The most reliable way to run another subtest
> > > is to reboot the system first, then select next subtest to be run.
> > 
> > This is a requirement that won't fly for CI use. Is the
> > rebinding/whatever of the device not possible to do? By the test?
> > 
> > Does it also apply to running other test binaries after running the
> > first subtest? As in, is it just a timing issue?
> 
> Yeah like the module reload testcase this testcase needs to restore the
> driver to working state afterwards. I think there's a corresponding rebind
> sysfs file too.
> 
> btw since you started with this, bunch more subtests we need to validate
> this:
> - unbind while an ioctl is called. Especially fun with all the ones that
>   can block. We need a subtest for each ioctl to make sure stuff works
>   correctly. At least, complex ioctl probably need more than that.
> 
> - unplug while atomic commit pending
> 
> - unplug with dma-buf exported, with various external users doing stuff to
>   that dma-buf
> 
> - unplug with dma-fence exported (either as syncpt, drm_syncobj or
>   implicitly attached to a dma-buf), this one probably needs lots of
>   a few variants to cover everything.

- unplug with multiple busy contexts (from multiple fd) active, to
  exercise scheduler paths.

> There's probably a lot more (you should discover endless amounts of oops,
> this area is full of bugs), but this should at least get you started.
> 
> And you probably want an entire team of engineers fixing the kernel bugs
> you uncover :-/
> 
> Cheers, Daniel
> 
> > > +	igt_subtest("unplug") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > > +
> > > +		/* Run background workload */
> > > +		spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);
> > 
> > 
> > igt_spin_batch_new requires DRIVER_INTEL, doesn't it?
> > 
> > 
> > > +		igt_spin_busywait_until_running(spin);
> > > +
> > > +		/* Make the device disappear */
> > > +		igt_set_timeout(60, "Device unplug timeout!");
> > > +		device_unplug(device);
> > > +		igt_reset_timeout();
> > > +
> > > +		close(device);
> > > +		device = -1;
> > > +	}
> > > +
> > > +	igt_subtest("unbind") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > 
> > 
> > Ah, I see. The skip if running consecutively is because the previous
> > subtest closed the fd.
> > 
> > Another fixture before this subtest, get the device plugged back,
> > reopen driver?
> > 
> > 
> > 
> > -- 
> > Petri Latvala
> > _______________________________________________
> > igt-dev mailing list
> > igt-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/igt-dev
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-04-01  7:22   ` Daniel Vetter
  2019-04-01  7:41     ` Daniel Vetter
@ 2019-04-01  8:55     ` Krzysztofik, Janusz
  2019-04-01 10:16       ` Petri Latvala
  2019-04-02  8:47       ` Daniel Vetter
  1 sibling, 2 replies; 10+ messages in thread
From: Krzysztofik, Janusz @ 2019-04-01  8:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: igt-dev, Lee, Simon B

On Monday, April 1, 2019 9:22:34 AM CEST Daniel Vetter wrote:
> On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > Run a dummy load in background to put some workload on a device, then
> > > try
> > > to either remove (unplug) the device from its bus, or unbind the
> > > device's
> > > driver from it, depending on which subtest has been selected.
> > > 
> > > The driver hot unbind / device hot unplug operation is expected to
> > > succeed
> > > in a reasonable time, however long timeouts are used to allow kernel
> > > level timeouts pop up first if any.
> > > 
> > > Please note that if running both subtests consecutively, the second one
> > > is always skipped as the device is no longer available as soon as the
> > > first subtest is completed.  The most reliable way to run another
> > > subtest
> > > is to reboot the system first, then select next subtest to be run.
> > 
> > This is a requirement that won't fly for CI use. Is the
> > rebinding/whatever of the device not possible to do? By the test?
> > 
> > Does it also apply to running other test binaries after running the
> > first subtest? As in, is it just a timing issue?
> 
> Yeah like the module reload testcase this testcase needs to restore the
> driver to working state afterwards. I think there's a corresponding rebind
> sysfs file too.

I see your point, however I don't see how that could be done.

I think we can't say that the module reload test restores the driver to 
working state.  It only TRIES to to that, and that's the merit of that test to 
check if module reload succeeds or not.  I think there is no way to implement 
"restore the driver to working state" that would work under all circumstances, 
even if there is something wrong (a bug?) that causes it failing.  In other 
words, I think a user should never assume the driver is in working state after 
the i915_module_load test is run as that operation may simply fail.  The user 
should look at the test result to learn about the driver state.

How the unplug/unbind test should proceed if it occurs there is something 
wrong after module reload?  FAIL? SKIP? Or still SUCCESS, assuming unplug 
itself succeeds? How the test should pass the information on module reload 
results to a user? If we start asserting results of module reload to ensure 
the driver is in working state, that will be a different test, not the 
intended one, I believe.

The best scenario for CI I can see is to schedule i915_module_load just 
following drm_hot_remove.

Am I missing something?

> btw since you started with this, bunch more subtests we need to validate
> this:
> - unbind while an ioctl is called. Especially fun with all the ones that
>   can block. We need a subtest for each ioctl to make sure stuff works
>   correctly. At least, complex ioctl probably need more than that.
> 
> - unplug while atomic commit pending
> 
> - unplug with dma-buf exported, with various external users doing stuff to
>   that dma-buf
> 
> - unplug with dma-fence exported (either as syncpt, drm_syncobj or
>   implicitly attached to a dma-buf), this one probably needs lots of
>   a few variants to cover everything.
> 
> There's probably a lot more (you should discover endless amounts of oops,
> this area is full of bugs), but this should at least get you started.

Many thanks for this list. As an i915 beginner, I was really lost while trying 
to define workloads which are worth of testing.  I'll probably ask some more 
question later if I'm in a position to continue this effort (see below ;-) ).

> And you probably want an entire team of engineers fixing the kernel bugs
> you uncover :-/

:-) I think that's still worth of effort, anyway ;-)

Thanks,
Janusz

> 
> Cheers, Daniel
> 
> > > +	igt_subtest("unplug") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > > +
> > > +		/* Run background workload */
> > > +		spin = igt_spin_batch_new(device, .flags = 
IGT_SPIN_POLL_RUN);
> > 
> > igt_spin_batch_new requires DRIVER_INTEL, doesn't it?
> > 
> > > +		igt_spin_busywait_until_running(spin);
> > > +
> > > +		/* Make the device disappear */
> > > +		igt_set_timeout(60, "Device unplug timeout!");
> > > +		device_unplug(device);
> > > +		igt_reset_timeout();
> > > +
> > > +		close(device);
> > > +		device = -1;
> > > +	}
> > > +
> > > +	igt_subtest("unbind") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > 
> > Ah, I see. The skip if running consecutively is because the previous
> > subtest closed the fd.
> > 
> > Another fixture before this subtest, get the device plugged back,
> > reopen driver?

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-04-01  8:55     ` Krzysztofik, Janusz
@ 2019-04-01 10:16       ` Petri Latvala
  2019-04-02  8:47       ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Petri Latvala @ 2019-04-01 10:16 UTC (permalink / raw)
  To: Krzysztofik, Janusz; +Cc: igt-dev, Lee, Simon B, Daniel Vetter

On Mon, Apr 01, 2019 at 08:55:37AM +0000, Krzysztofik, Janusz wrote:
> On Monday, April 1, 2019 9:22:34 AM CEST Daniel Vetter wrote:
> > On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > > Run a dummy load in background to put some workload on a device, then
> > > > try
> > > > to either remove (unplug) the device from its bus, or unbind the
> > > > device's
> > > > driver from it, depending on which subtest has been selected.
> > > > 
> > > > The driver hot unbind / device hot unplug operation is expected to
> > > > succeed
> > > > in a reasonable time, however long timeouts are used to allow kernel
> > > > level timeouts pop up first if any.
> > > > 
> > > > Please note that if running both subtests consecutively, the second one
> > > > is always skipped as the device is no longer available as soon as the
> > > > first subtest is completed.  The most reliable way to run another
> > > > subtest
> > > > is to reboot the system first, then select next subtest to be run.
> > > 
> > > This is a requirement that won't fly for CI use. Is the
> > > rebinding/whatever of the device not possible to do? By the test?
> > > 
> > > Does it also apply to running other test binaries after running the
> > > first subtest? As in, is it just a timing issue?
> > 
> > Yeah like the module reload testcase this testcase needs to restore the
> > driver to working state afterwards. I think there's a corresponding rebind
> > sysfs file too.
> 
> I see your point, however I don't see how that could be done.
> 
> I think we can't say that the module reload test restores the driver to 
> working state.  It only TRIES to to that, and that's the merit of that test to 
> check if module reload succeeds or not.  I think there is no way to implement 
> "restore the driver to working state" that would work under all circumstances, 
> even if there is something wrong (a bug?) that causes it failing.  In other 
> words, I think a user should never assume the driver is in working state after 
> the i915_module_load test is run as that operation may simply fail.  The user 
> should look at the test result to learn about the driver state.
> 

The best scenario is that driver restore is attempted. If it doesn't,
we blacklist this test in CI, or don't merge it. If that works, all is
fine, but we still take the same care with this test as with module
reload tests.

Even the current state of business is that we don't run the module
reloading tests with the sharded runs (Fi.CI.IGT), only as part of BAT
(Fi.CI.BAT), as the last tests to execute. Shards run some of the
selftests, but as a separate shard.

Btw the selftests do _not_ require the machine to be rebooted after
them, or between them. When they succeed, all is fine. FSVO fine. We
run all selftests and then reboot implicitly as they are the last
things to run on a particular kernel. Sometimes we don't. The
requirement for reboot after any selftests that you mentioned in
another mail is due to the particular environment they are run in,
which doesn't cope well with reloading the kernel module.

> How the unplug/unbind test should proceed if it occurs there is something 
> wrong after module reload?  FAIL? SKIP? Or still SUCCESS, assuming unplug 
> itself succeeds? How the test should pass the information on module reload 
> results to a user? If we start asserting results of module reload to ensure 
> the driver is in working state, that will be a different test, not the 
> intended one, I believe.

igt_warn() I'd say.


-- 
Petri Latvala
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove
  2019-04-01  8:55     ` Krzysztofik, Janusz
  2019-04-01 10:16       ` Petri Latvala
@ 2019-04-02  8:47       ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2019-04-02  8:47 UTC (permalink / raw)
  To: Krzysztofik, Janusz; +Cc: igt-dev, Lee, Simon B, Daniel Vetter

On Mon, Apr 01, 2019 at 08:55:37AM +0000, Krzysztofik, Janusz wrote:
> On Monday, April 1, 2019 9:22:34 AM CEST Daniel Vetter wrote:
> > On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > > Run a dummy load in background to put some workload on a device, then
> > > > try
> > > > to either remove (unplug) the device from its bus, or unbind the
> > > > device's
> > > > driver from it, depending on which subtest has been selected.
> > > > 
> > > > The driver hot unbind / device hot unplug operation is expected to
> > > > succeed
> > > > in a reasonable time, however long timeouts are used to allow kernel
> > > > level timeouts pop up first if any.
> > > > 
> > > > Please note that if running both subtests consecutively, the second one
> > > > is always skipped as the device is no longer available as soon as the
> > > > first subtest is completed.  The most reliable way to run another
> > > > subtest
> > > > is to reboot the system first, then select next subtest to be run.
> > > 
> > > This is a requirement that won't fly for CI use. Is the
> > > rebinding/whatever of the device not possible to do? By the test?
> > > 
> > > Does it also apply to running other test binaries after running the
> > > first subtest? As in, is it just a timing issue?
> > 
> > Yeah like the module reload testcase this testcase needs to restore the
> > driver to working state afterwards. I think there's a corresponding rebind
> > sysfs file too.
> 
> I see your point, however I don't see how that could be done.
> 
> I think we can't say that the module reload test restores the driver to 
> working state.  It only TRIES to to that, and that's the merit of that test to 
> check if module reload succeeds or not.  I think there is no way to implement 
> "restore the driver to working state" that would work under all circumstances, 
> even if there is something wrong (a bug?) that causes it failing.  In other 
> words, I think a user should never assume the driver is in working state after 
> the i915_module_load test is run as that operation may simply fail.  The user 
> should look at the test result to learn about the driver state.
> 
> How the unplug/unbind test should proceed if it occurs there is something 
> wrong after module reload?  FAIL? SKIP? Or still SUCCESS, assuming unplug 
> itself succeeds? How the test should pass the information on module reload 
> results to a user? If we start asserting results of module reload to ensure 
> the driver is in working state, that will be a different test, not the 
> intended one, I believe.
> 
> The best scenario for CI I can see is to schedule i915_module_load just 
> following drm_hot_remove.
> 
> Am I missing something?

We have higher quality standards than "best effort, probably blows up". If
stuff does blow up we have mitigation measures in CI, as Petri explains,
but fundamentally we just need to fix the bugs.

And there's going to be tons of huge bugs in hotunplug of the entire
driver, that's why I joked you need an entire team of engineers helping
you fix stuff. This happened when we started seriously testing module
reload, or gpu reset, or atomic modesetting or any other big feature.

Wrt implementation: i915_hot_remove (it's not really a drm test) needs to
do all that on its own. And if the driver isn't restored (good sanity
check is to issue a batch with an MI write and check it still completes
and does something), then the test should fail. Because in reality if the
user unplugs and replugs, and the gpu/driver don't work, they're not going
to be happy. CI doesn't impose anything stricter here.

> > btw since you started with this, bunch more subtests we need to validate
> > this:
> > - unbind while an ioctl is called. Especially fun with all the ones that
> >   can block. We need a subtest for each ioctl to make sure stuff works
> >   correctly. At least, complex ioctl probably need more than that.
> > 
> > - unplug while atomic commit pending
> > 
> > - unplug with dma-buf exported, with various external users doing stuff to
> >   that dma-buf
> > 
> > - unplug with dma-fence exported (either as syncpt, drm_syncobj or
> >   implicitly attached to a dma-buf), this one probably needs lots of
> >   a few variants to cover everything.
> > 
> > There's probably a lot more (you should discover endless amounts of oops,
> > this area is full of bugs), but this should at least get you started.
> 
> Many thanks for this list. As an i915 beginner, I was really lost while trying 
> to define workloads which are worth of testing.  I'll probably ask some more 
> question later if I'm in a position to continue this effort (see below ;-) ).

btw, can you pls update the JIRA for this (or make one if we don't have
one yet). This is going to be a huge effort, would be good to make sure we
don't forget about any of the corner cases and validation ideas.
-Daniel

> 
> > And you probably want an entire team of engineers fixing the kernel bugs
> > you uncover :-/
> 
> :-) I think that's still worth of effort, anyway ;-)
> 
> Thanks,
> Janusz
> 
> > 
> > Cheers, Daniel
> > 
> > > > +	igt_subtest("unplug") {
> > > > +		igt_spin_t *spin;
> > > > +
> > > > +		/* Verify if a suitable DRM device/driver exists */
> > > > +		igt_skip_on(device < 0);
> > > > +
> > > > +		/* Run background workload */
> > > > +		spin = igt_spin_batch_new(device, .flags = 
> IGT_SPIN_POLL_RUN);
> > > 
> > > igt_spin_batch_new requires DRIVER_INTEL, doesn't it?
> > > 
> > > > +		igt_spin_busywait_until_running(spin);
> > > > +
> > > > +		/* Make the device disappear */
> > > > +		igt_set_timeout(60, "Device unplug timeout!");
> > > > +		device_unplug(device);
> > > > +		igt_reset_timeout();
> > > > +
> > > > +		close(device);
> > > > +		device = -1;
> > > > +	}
> > > > +
> > > > +	igt_subtest("unbind") {
> > > > +		igt_spin_t *spin;
> > > > +
> > > > +		/* Verify if a suitable DRM device/driver exists */
> > > > +		igt_skip_on(device < 0);
> > > 
> > > Ah, I see. The skip if running consecutively is because the previous
> > > subtest closed the fd.
> > > 
> > > Another fixture before this subtest, get the device plugged back,
> > > reopen driver?
> 
> --------------------------------------------------------------------
> 
> Intel Technology Poland sp. z o.o.
> ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
> 
> Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
> przegladanie lub rozpowszechnianie jest zabronione.
> This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
> others is strictly prohibited.
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-02  8:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-28 16:47 [igt-dev] [PATCH] tests: Add a new test for driver/device hot remove Janusz Krzysztofik
2019-03-28 17:20 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
2019-03-29  3:33 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
2019-03-29 10:47 ` [igt-dev] [PATCH] " Petri Latvala
2019-03-29 12:06   ` Krzysztofik, Janusz
2019-04-01  7:22   ` Daniel Vetter
2019-04-01  7:41     ` Daniel Vetter
2019-04-01  8:55     ` Krzysztofik, Janusz
2019-04-01 10:16       ` Petri Latvala
2019-04-02  8:47       ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.