dri-devel.lists.freedesktop.org archive mirror
* [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result
@ 2017-07-18 15:16 Paul Kocialkowski
  2017-07-18 15:16 ` [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event Paul Kocialkowski
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Paul Kocialkowski @ 2017-07-18 15:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Lyude, Petri Latvala, Daniel Vetter, Arkadiusz Hiler

This patch introduces a workaround for a case where a uevent is issued
by the kernel because of DP link training failing on a connector
unrelated to the current test. Since the test depends on receiving a
hotplug uevent, it previously passed even though it should not have.

False positives also occur due to the plug/unplug events being delayed
and issued at resume time. This is mitigated by catching and flushing
hotplugs every time a change is made on connectors, but it is not enough
to ensure that all hotplug events were caught and not delayed.

The problem here is that it is not possible to find out the exact reason
why a uevent is issued by the kernel. A possible way to fix this would
be to introduce more fields (the connector name and some reason why the
event is triggered would probably be sufficient).
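
As a rough editorial illustration of that suggestion (a sketch, not part of
this series): a kernel uevent payload is a run of NUL-terminated "KEY=VALUE"
strings, so extra fields would be cheap for a test to consume. The CONNECTOR
and REASON keys, their values, and both helper names below are hypothetical;
the only DRM-specific field discussed in this thread is HOTPLUG=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up `key` in a payload of NUL-terminated "KEY=VALUE" strings packed
 * back to back. Returns a pointer to the value inside buf, or NULL.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t key_len, off = 0;

	assert(buf && key);
	key_len = strlen(key);

	while (off < len) {
		const char *entry = buf + off;
		const char *end = memchr(entry, '\0', len - off);
		size_t entry_len;

		if (!end)
			break;	/* ignore an unterminated trailing entry */

		entry_len = (size_t)(end - entry);
		if (entry_len > key_len && entry[key_len] == '=' &&
		    strncmp(entry, key, key_len) == 0)
			return entry + key_len + 1;

		off += entry_len + 1;
	}

	return NULL;
}

/* True only for a hotplug event that names the connector under test. */
bool uevent_is_hotplug_for(const char *buf, size_t len, const char *connector)
{
	const char *hotplug = uevent_get(buf, len, "HOTPLUG");
	const char *name = uevent_get(buf, len, "CONNECTOR"); /* hypothetical key */

	return hotplug && strcmp(hotplug, "1") == 0 &&
	       name && strcmp(name, connector) == 0;
}
```

With fields like these, a test could ignore a uevent raised for another
connector instead of treating any HOTPLUG=1 as proof that its own connector
changed.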

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event
  2017-07-18 15:16 [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result Paul Kocialkowski
@ 2017-07-18 15:16 ` Paul Kocialkowski
  2017-07-18 21:21   ` Chris Wilson
  2017-07-18 15:16 ` [PATCH i-g-t 2/2] tests/chamelium: Catch and flush hotplug uevents after each plug Paul Kocialkowski
  2017-07-18 20:12 ` [Intel-gfx] [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result Lyude Paul
  2 siblings, 1 reply; 8+ messages in thread
From: Paul Kocialkowski @ 2017-07-18 15:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Lyude, David Airlie, Daniel Vetter

It may occur that a hotplug uevent is detected at resume, even though it
does not indicate that an actual hotplug happened. This is the case when
link training fails on any other connector.

There is currently no way to distinguish what connector caused a hotplug
uevent, nor what the reason for that uevent really is. This makes it
impossible to find out whether the test actually passed or not.

To circumvent this problem, the link status of each connector is
collected before and after suspend and compared to skip the test if
the state was good before and turned to bad after resume.

This only concerns the EDID change test, where we cannot check the
connector state (that is not supposed to have changed). For actual
hotplug tests, the tests should be safe since they check each
connector's state after receiving the uevent.

The situation described here happens with DP-VGA bridges that fail link
training after resume, as they need some more time to respond on their
AUX channel.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@linux.intel.com>
---
 tests/chamelium.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tests/chamelium.c b/tests/chamelium.c
index e26f0557..8af33aaa 100644
--- a/tests/chamelium.c
+++ b/tests/chamelium.c
@@ -87,6 +87,31 @@ get_precalculated_crc(struct chamelium_port *port, int w, int h)
 }
 
 static void
+get_connectors_link_status_failed(data_t *data, bool *link_status_failed)
+{
+	drmModeConnector *connector;
+	uint64_t link_status;
+	drmModePropertyPtr prop;
+	int p;
+
+	for (p = 0; p < data->port_count; p++) {
+		connector = chamelium_port_get_connector(data->chamelium,
+							 data->ports[p], false);
+
+		igt_assert(kmstest_get_property(data->drm_fd,
+						connector->connector_id,
+						DRM_MODE_OBJECT_CONNECTOR,
+						"link-status", NULL,
+						&link_status, &prop));
+
+		link_status_failed[p] = link_status == DRM_MODE_LINK_STATUS_BAD;
+
+		drmModeFreeProperty(prop);
+		drmModeFreeConnector(connector);
+	}
+}
+
+static void
 require_connector_present(data_t *data, unsigned int type)
 {
 	int i;
@@ -310,6 +335,8 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 				int alt_edid_id)
 {
 	struct udev_monitor *mon = igt_watch_hotplug();
+	bool link_status_failed[2][data->port_count];
+	int p;
 
 	reset_state(data, port);
 
@@ -326,8 +353,16 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 	 */
 	chamelium_port_set_edid(data->chamelium, port, alt_edid_id);
 
+	get_connectors_link_status_failed(data, link_status_failed[0]);
+
 	igt_system_suspend_autoresume(state, test);
+
 	igt_assert(igt_hotplug_detected(mon, HOTPLUG_TIMEOUT));
+
+	get_connectors_link_status_failed(data, link_status_failed[1]);
+
+	for (p = 0; p < data->port_count; p++)
+		igt_skip_on(!link_status_failed[0][p] && link_status_failed[1][p]);
 }
 
 static igt_output_t *
-- 
2.13.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH i-g-t 2/2] tests/chamelium: Catch and flush hotplug uevents after each plug
  2017-07-18 15:16 [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result Paul Kocialkowski
  2017-07-18 15:16 ` [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event Paul Kocialkowski
@ 2017-07-18 15:16 ` Paul Kocialkowski
  2017-07-18 20:12 ` [Intel-gfx] [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result Lyude Paul
  2 siblings, 0 replies; 8+ messages in thread
From: Paul Kocialkowski @ 2017-07-18 15:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Lyude, Petri Latvala, Paul Kocialkowski, Daniel Vetter, Arkadiusz Hiler

This adds calls to igt_hotplug_detected and igt_flush_hotplugs to catch
and flush hotplugs from connector unplug (due to chamelium reset) and
plug. These need to be intercepted so that they are not delayed and
issued after resume, providing a false positive for the test result.

In addition, the final hotplug uevent flush is brought closer to the
suspend call, to decrease the likelihood of false positives.

However, false positives still do happen, because it is not possible to
make sure that the uevent caused by each connector's state change was
caught instead of being delayed and issued at resume time.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@linux.intel.com>
---
 tests/chamelium.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/chamelium.c b/tests/chamelium.c
index 8af33aaa..0528ffb3 100644
--- a/tests/chamelium.c
+++ b/tests/chamelium.c
@@ -340,12 +340,16 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 
 	reset_state(data, port);
 
+	/* Catch the event and flush all remaining ones. */
+	igt_assert(igt_hotplug_detected(mon, HOTPLUG_TIMEOUT));
+	igt_flush_hotplugs(mon);
+
 	/* First plug in the port */
 	chamelium_port_set_edid(data->chamelium, port, edid_id);
 	chamelium_plug(data->chamelium, port);
-	wait_for_connector(data, port, DRM_MODE_CONNECTED);
+	igt_assert(igt_hotplug_detected(mon, HOTPLUG_TIMEOUT));
 
-	igt_flush_hotplugs(mon);
+	wait_for_connector(data, port, DRM_MODE_CONNECTED);
 
 	/*
 	 * Change the edid before we suspend. On resume, the machine should
@@ -355,6 +359,8 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 
 	get_connectors_link_status_failed(data, link_status_failed[0]);
 
+	igt_flush_hotplugs(mon);
+
 	igt_system_suspend_autoresume(state, test);
 
 	igt_assert(igt_hotplug_detected(mon, HOTPLUG_TIMEOUT));
-- 
2.13.2

* Re: [Intel-gfx] [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result
  2017-07-18 15:16 [PATCH i-g-t 0/2] Unrelated hotplug uevent masking out actual test result Paul Kocialkowski
  2017-07-18 15:16 ` [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event Paul Kocialkowski
  2017-07-18 15:16 ` [PATCH i-g-t 2/2] tests/chamelium: Catch and flush hotplug uevents after each plug Paul Kocialkowski
@ 2017-07-18 20:12 ` Lyude Paul
  2 siblings, 0 replies; 8+ messages in thread
From: Lyude Paul @ 2017-07-18 20:12 UTC (permalink / raw)
  To: Paul Kocialkowski, intel-gfx, dri-devel; +Cc: Daniel Vetter

For the whole series

Reviewed-by: Lyude <lyude@redhat.com>

will push in just a sec

On Tue, 2017-07-18 at 18:16 +0300, Paul Kocialkowski wrote:
> This patch introduces a workaround for a case where a uevent is
> issued
> by the kernel because of DP link training failing on a connector
> unrelated to the current test. Since the test depends on receiving a
> hotplug uevent, it previously passed even though it should not have.
> 
> False positives also occur due to the plug/unplug events being
> delayed
> and issued at resume time. This is mitigated by catching and flushing
> hotplugs every time a change is made on connectors, but it is not
> enough
> to ensure that all hotplug events were caught and not delayed.
> 
> The problem here is that it is not possible to find out the exact
> reason
> why a uevent is issued by the kernel. A possible way to fix this
> would
> be to introduce more fields (the connector name and some reason why
> the
> event is triggered would probably be sufficient).
-- 
Cheers,
	Lyude

* Re: [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event
  2017-07-18 15:16 ` [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event Paul Kocialkowski
@ 2017-07-18 21:21   ` Chris Wilson
  2017-07-19  8:31     ` Paul Kocialkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2017-07-18 21:21 UTC (permalink / raw)
  To: Paul Kocialkowski, intel-gfx, dri-devel; +Cc: David Airlie, Daniel Vetter

Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> It may occur that a hotplug uevent is detected at resume, even though it
> does not indicate that an actual hotplug happened. This is the case when
> link training fails on any other connector.
> 
> There is currently no way to distinguish what connector caused a hotplug
> uevent, nor what the reason for that uevent really is. This makes it
> impossible to find out whether the test actually passed or not.

And you may get more than one and then this skips even though the test
passed. Looks like the patch is overcompensating. What you can do is
repeat the test a few times, and then look at all the different errors
you get. If the connector remains (no mst disappearance) once it goes
bad, it should remain bad and so not generate any new uevent. Or you
only repeat the test whilst link_status[old] != link_status[new].
-Chris
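
One way to read the suggestion above, as a minimal editorial sketch rather
than anything from the series: repeat the suspend/resume cycle until the
per-connector link status is identical before and after, and only then trust
the uevent. It relies on the assumption stated above that a link which has
gone bad stays bad and produces no further uevents. The hook types,
run_until_stable() and the simulated fixture are hypothetical names; real
code would sample the "link-status" property and cycle with
igt_system_suspend_autoresume().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8

enum run_result { RUN_PASS, RUN_FAIL, RUN_INCONCLUSIVE };

/* Hypothetical hooks: fill bad[i] for each connector; run one
 * suspend/resume cycle and report whether a hotplug uevent was seen. */
typedef void (*sample_fn)(bool *bad, size_t count, void *ctx);
typedef bool (*cycle_fn)(void *ctx);

enum run_result run_until_stable(sample_fn sample, cycle_fn cycle, void *ctx,
				 size_t count, int max_attempts)
{
	bool before[MAX_PORTS], after[MAX_PORTS];
	int attempt;
	size_t i;

	assert(sample && cycle);

	if (count > MAX_PORTS)
		return RUN_INCONCLUSIVE;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		bool got_event, changed = false;

		sample(before, count, ctx);
		got_event = cycle(ctx);
		sample(after, count, ctx);

		for (i = 0; i < count; i++)
			changed |= before[i] != after[i];

		/* A link changed state: the uevent may be unrelated, retry. */
		if (changed)
			continue;

		return got_event ? RUN_PASS : RUN_FAIL;
	}

	return RUN_INCONCLUSIVE;
}

/* Simulated fixture: connector 0 goes bad on the first resume and stays
 * bad; every cycle reports a uevent. */
struct fake_state {
	bool link_bad;
	int cycles;
};

void fake_sample(bool *bad, size_t count, void *ctx)
{
	struct fake_state *s = ctx;
	size_t i;

	for (i = 0; i < count; i++)
		bad[i] = (i == 0) ? s->link_bad : false;
}

bool fake_cycle(void *ctx)
{
	struct fake_state *s = ctx;

	s->cycles++;
	s->link_bad = true;
	return true;
}
```

In the simulated case the first attempt is discarded because the link went
from good to bad, and the second attempt is accepted. The cost is at least
one extra suspend/resume cycle whenever a link changes state.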

* Re: [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event
  2017-07-18 21:21   ` Chris Wilson
@ 2017-07-19  8:31     ` Paul Kocialkowski
  2017-07-19 15:47       ` Lyude Paul
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Kocialkowski @ 2017-07-19  8:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, dri-devel; +Cc: David Airlie, Daniel Vetter

On Tue, 2017-07-18 at 22:21 +0100, Chris Wilson wrote:
> Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> > It may occur that a hotplug uevent is detected at resume, even
> > though it
> > does not indicate that an actual hotplug happened. This is the case
> > when
> > link training fails on any other connector.
> > 
> > There is currently no way to distinguish what connector caused a
> > hotplug
> > uevent, nor what the reason for that uevent really is. This makes it
> > impossible to find out whether the test actually passed or not.
> 
> And you may get more than one and then this skips even though the test
> passed. Looks like the patch is overcompensating. What you can do is
> repeat the test a few times, and then look at all the different errors
> you get. If the connector remains (no mst disappearance) once it goes
> bad, it should remain bad and so not generate any new uevent. Or you
> only repeat the test whilst link_status[old] != link_status[new].

I am not sure it is really desirable to repeat the test until we are
fairly certain it succeeds. This involves suspend/resume, which is
already long enough as it is.

Also, a uevent will be generated every time link training fails,
regardless of whether it was already failing before (I just tested that
to make sure). In my case, it's due to a DP-VGA bridge that will
consistently fail link training in the first seconds after resume.

So this is actually even worse than I thought, because there is no way
to find out that this is why a uevent was generated if the link status
was already bad before.

So I don't see how we can manage with the information currently at
our disposal.
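
To make the gap concrete, here is a small editorial sketch (not part of the
series) of which before/after link-status combinations the comparison in
patch 1/2 catches. It assumes, as a simplification of the observation above,
that any link found bad after resume failed training during resume and so
emitted a uevent. All three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the condition passed to igt_skip_on() in patch 1/2. */
bool patch_would_skip(bool bad_before, bool bad_after)
{
	return !bad_before && bad_after;
}

/*
 * Simplifying assumption: a link that is bad after resume failed training
 * during resume and emitted a uevent, whatever its earlier state was.
 */
bool uevent_is_ambiguous(bool bad_before, bool bad_after)
{
	(void)bad_before;
	return bad_after;
}

/* Count connectors whose uevent is ambiguous but which would not
 * trigger the skip. */
size_t count_missed(const bool *before, const bool *after, size_t n)
{
	size_t i, missed = 0;

	assert(before && after);

	for (i = 0; i < n; i++)
		if (uevent_is_ambiguous(before[i], after[i]) &&
		    !patch_would_skip(before[i], after[i]))
			missed++;

	return missed;
}
```

Under that assumption the only missed combination is bad before and bad
after, which is exactly the case that a snapshot comparison cannot separate
from a genuine EDID-change uevent.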

My main point here is that we need more information about what's going
on than simply "HOTPLUG=1". These patches demonstrate that working
around the lack of information is a pain for testing purposes and can
only lead to semi-working hackish workarounds.

Do you agree that this is what the problem really is?

-- 
Paul Kocialkowski <paul.kocialkowski@linux.intel.com>
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo, Finland

* Re: [PATCH i-g-t 1/2] tests/chamelium: Skip suspend/resume test with unreliable hotplug event
  2017-07-19  8:31     ` Paul Kocialkowski
@ 2017-07-19 15:47       ` Lyude Paul
  0 siblings, 0 replies; 8+ messages in thread
From: Lyude Paul @ 2017-07-19 15:47 UTC (permalink / raw)
  To: Paul Kocialkowski, Chris Wilson, intel-gfx, dri-devel
  Cc: David Airlie, Daniel Vetter

On Wed, 2017-07-19 at 11:31 +0300, Paul Kocialkowski wrote:
> On Tue, 2017-07-18 at 22:21 +0100, Chris Wilson wrote:
> > Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> > > It may occur that a hotplug uevent is detected at resume, even
> > > though it
> > > does not indicate that an actual hotplug happened. This is the
> > > case
> > > when
> > > link training fails on any other connector.
> > > 
> > > There is currently no way to distinguish what connector caused a
> > > hotplug
> > > uevent, nor what the reason for that uevent really is. This makes
> > > it
> > > impossible to find out whether the test actually passed or not.
> > 
> > And you may get more than one and then this skips even though the
> > test
> > passed. Looks like the patch is overcompensating. What you can do
> > is
> > repeat the test a few times, and then look at all the different
> > errors
> > you get. If the connector remains (no mst disappearance) once it
> > goes
> > bad, it should remain bad and so not generate any new uevent. Or
> > you
> > only repeat the test whilst link_status[old] != link_status[new].
> 
> I am not sure it is really desirable to repeat the test until we are
> fairly certain it succeeds. This involves suspend/resume, which is
> already long enough as it is.
> 
> Also, a uevent will be generated every time link training fails,
> regardless of whether it was already failing before (I just tested
> that
> to make sure). In my case, it's due to a DP-VGA bridge that will
> consistently fail link training in the first seconds after resume.
> 
> So this is actually even worse than I thought, because there is no
> way
> to find out that this is why a uevent was generated if the link
> status
> was already bad before.
> 
> So I don't see how we can manage with the information currently at
> our disposal.
> 
> My main point here is that we need more information about what's
> going
> on than simply "HOTPLUG=1". These patches demonstrate that working
> around the lack of information is a pain for testing purposes and can
> only lead to semi-working hackish workarounds.
> 
> Do you agree that this is what the problem really is?
Yes, I agree we need more debugging information for when hotplugs fail.
That being said, the fact that i915 is unconditionally sending hotplugs
on resume (this appears to be a hack that was added to avoid missing
hotplug events between suspend/resume) is really what's causing this
problem specifically.

We really need the debugging stuff Martin and I suggested for the
kernel, and also more drm helpers to actually do edid checks and that
sort of stuff so that we don't have to deal with dirty hacks like this
:\.
