* [PATCH v2 0/7] xfree86: Handle drm race condition
@ 2013-03-18 20:51 Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 1/7] xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set Bryce Harrington
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst

Update:  Squashes a couple commits to avoid potential hang if
git bisecting.  No other changes from v1.

----
When starting up (on Ubuntu), X can hit an error trying to set the
version on the drm device.  We believe this is a race with plymouth (or
the kernel), since adding some delay to the boot results in a
functioning session for affected users.

So far we have not found a reliable way to reproduce the bug
synthetically.  It appears to affect users on fast booting hardware
(e.g. SSDs) when using the Intel graphics driver.

We have not root-caused the bug yet.  Currently we suspect the actual
breakage is underneath X (plymouth/lightdm/kernel), and are still
experimenting.  However, this patch does seem to improve things for
users, so it or parts of it may be worth your consideration for
inclusion in xserver.

I'm including the patch broken down into easily pickable chunks.  Note
that the final patch in the series is highly optional; it handles EAGAIN
being returned from the ioctl, which does not appear to happen in
practice.

https://bugs.launchpad.net/ubuntu/+source/libdrm/+bug/982889
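
For orientation, the retry behaviour that patches 4 to 6 build up can be
approximated by the sketch below.  It is not the code from the series:
set_version_fn, set_master_fn and the fake_* stubs are hypothetical
stand-ins for drmSetInterfaceVersion() and drmSetMaster(), used so the
loop can be run without a drm device.  The retry count, the 1 ms sleep
and the every-500-tries message follow the patches.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback types standing in for drmSetInterfaceVersion()
 * and drmSetMaster(); they are not libdrm API. */
typedef int (*set_version_fn)(int fd);  /* returns 0 or a negative errno */
typedef int (*set_master_fn)(int fd);   /* returns 0 on success */

#define MAX_TRIES 2000  /* 2000 tries with a 1 ms sleep: roughly 2 seconds */
#define LOG_EVERY 500

static int
set_version_with_retry(int fd, set_version_fn set_version,
                       set_master_fn set_master)
{
    int err = 0;
    int tries;

    for (tries = 1; tries <= MAX_TRIES; tries++) {
        err = set_version(fd);
        if (!err) {
            if (tries > 1)
                fprintf(stderr, "setversion succeeded on try #%d\n", tries);
            break;
        }
        if (err != -EACCES)
            break;              /* not the suspected race: give up at once */
        if (tries % LOG_EVERY == 0)
            fprintf(stderr, "waiting on drm device...\n");

        usleep(1000);
        if (set_master(fd) == 0)
            fprintf(stderr, "set master succeeded\n");
    }
    return err;                 /* 0, or the last negative errno seen */
}

/* Demo stubs (hypothetical): fail with fake_error a set number of times. */
static int fake_failures_left;
static int fake_error = -EACCES;

static int fake_set_version(int fd)
{
    (void)fd;
    if (fake_failures_left > 0) {
        fake_failures_left--;
        return fake_error;
    }
    return 0;
}

static int fake_set_master(int fd)
{
    (void)fd;
    return 0;
}
```

With fake_failures_left set to 3 and fake_error left at -EACCES, the
call set_version_with_retry(-1, fake_set_version, fake_set_master)
retries through the failures and returns 0; with any other error code
it returns after the first attempt.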


Bryce Harrington (7):
  xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set.
  xfree86: Track error code and add label for error handling.
  xfree86: Provide more details on failure
  xfree86: Keep trying to set interface on drm for 2 seconds.
  xfree86: Fix race condition failure opening drm.
  xfree86: Be verbose if waiting on opening the drm device
  xfree86: Also handle EAGAIN errors from drmSetInterfaceVersion().

 hw/xfree86/os-support/linux/lnx_platform.c |   43 ++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 8 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/7] xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set.
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 2/7] xfree86: Track error code and add label for error handling Bryce Harrington
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst


Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 76f5583..69a5b8c 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -34,6 +34,7 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     sv.drm_dd_minor = -1;       /* Don't care */
     if (drmSetInterfaceVersion(fd, &sv)) {
         ErrorF("setversion 1.4 failed\n");
+	close(fd);
         return FALSE;
     }
 
-- 
1.7.9.5


* [PATCH v2 2/7] xfree86: Track error code and add label for error handling.
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 1/7] xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 3/7] xfree86: Provide more details on failure Bryce Harrington
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst


Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 69a5b8c..6ee219a 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -23,6 +23,7 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     drmSetVersion sv;
     char *buf;
     int fd;
+    int err = 0;
 
     fd = open(path, O_RDWR, O_CLOEXEC);
     if (fd == -1)
@@ -32,10 +33,10 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     sv.drm_di_minor = 4;
     sv.drm_dd_major = -1;       /* Don't care */
     sv.drm_dd_minor = -1;       /* Don't care */
-    if (drmSetInterfaceVersion(fd, &sv)) {
+    err = drmSetInterfaceVersion(fd, &sv);
+    if (err) {
         ErrorF("setversion 1.4 failed\n");
-	close(fd);
-        return FALSE;
+	goto out;
     }
 
     xf86_add_platform_device(attribs);
@@ -44,8 +45,9 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     xf86_add_platform_device_attrib(xf86_num_platform_devices - 1,
                                     ODEV_ATTRIB_BUSID, buf);
     drmFreeBusid(buf);
+out:
     close(fd);
-    return TRUE;
+    return (err == 0);
 }
 
 Bool
-- 
1.7.9.5


* [PATCH v2 3/7] xfree86: Provide more details on failure
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 1/7] xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 2/7] xfree86: Track error code and add label for error handling Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 4/7] xfree86: Keep trying to set interface on drm for 2 seconds Bryce Harrington
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst


Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 6ee219a..3ae2db1 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -7,6 +7,8 @@
 #include <xf86drm.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <errno.h>
+#include <string.h>
 
 /* Linux platform device support */
 #include "xf86_OSproc.h"
@@ -35,7 +37,7 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     sv.drm_dd_minor = -1;       /* Don't care */
     err = drmSetInterfaceVersion(fd, &sv);
     if (err) {
-        ErrorF("setversion 1.4 failed\n");
+        ErrorF("setversion 1.4 failed: %s\n", strerror(-err));
 	goto out;
     }
 
-- 
1.7.9.5


* [PATCH v2 4/7] xfree86: Keep trying to set interface on drm for 2 seconds.
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (2 preceding siblings ...)
  2013-03-18 20:51 ` [PATCH v2 3/7] xfree86: Provide more details on failure Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 5/7] xfree86: Fix race condition failure opening drm Bryce Harrington
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst

And if we've had to delay booting due to not being able to set the
interface, fess up.

Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 3ae2db1..4094866 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -26,16 +26,26 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
     char *buf;
     int fd;
     int err = 0;
+    int tries = 0;
 
     fd = open(path, O_RDWR, O_CLOEXEC);
     if (fd == -1)
         return FALSE;
 
-    sv.drm_di_major = 1;
-    sv.drm_di_minor = 4;
-    sv.drm_dd_major = -1;       /* Don't care */
-    sv.drm_dd_minor = -1;       /* Don't care */
-    err = drmSetInterfaceVersion(fd, &sv);
+    while (tries++ < 2000) {
+	sv.drm_di_major = 1;
+	sv.drm_di_minor = 4;
+	sv.drm_dd_major = -1;       /* Don't care */
+	sv.drm_dd_minor = -1;       /* Don't care */
+
+	err = drmSetInterfaceVersion(fd, &sv);
+	if (!err) {
+	    if (tries > 1)
+		LogMessage(X_INFO, "setversion 1.4 succeeded on try #%d\n", tries);
+	    break;
+	}
+	usleep(1000);
+    }
     if (err) {
         ErrorF("setversion 1.4 failed: %s\n", strerror(-err));
 	goto out;
-- 
1.7.9.5


* [PATCH v2 5/7] xfree86: Fix race condition failure opening drm.
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (3 preceding siblings ...)
  2013-03-18 20:51 ` [PATCH v2 4/7] xfree86: Keep trying to set interface on drm for 2 seconds Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 6/7] xfree86: Be verbose if waiting on opening the drm device Bryce Harrington
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst

If other processes have had drm open previously, xserver may attempt to
open the device too early and fail, with xserver error exit "Cannot
run in framebuffer mode" or Xorg.0.log messages about "setversion 1.4
failed".

In this situation, we're receiving back -EACCES from libdrm.  To address
this we need to re-set ourselves as the drm master, and keep trying to
set the interface until it works (or until we give up).

See https://bugs.launchpad.net/ubuntu/+source/libdrm/+bug/982889

Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 4094866..bb76d90 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -43,8 +43,14 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
 	    if (tries > 1)
 		LogMessage(X_INFO, "setversion 1.4 succeeded on try #%d\n", tries);
 	    break;
+	} if (err != -EACCES) {
+	    break;
 	}
+
 	usleep(1000);
+
+	if (!drmSetMaster(fd))
+	    LogMessage(X_INFO, "drmSetMaster succeeded\n");
     }
     if (err) {
         ErrorF("setversion 1.4 failed: %s\n", strerror(-err));
-- 
1.7.9.5


* [PATCH v2 6/7] xfree86: Be verbose if waiting on opening the drm device
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (4 preceding siblings ...)
  2013-03-18 20:51 ` [PATCH v2 5/7] xfree86: Fix race condition failure opening drm Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-18 20:51 ` [PATCH v2 7/7] xfree86: Also handle EAGAIN errors from drmSetInterfaceVersion() Bryce Harrington
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst


Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index bb76d90..3386b67 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -43,7 +43,10 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
 	    if (tries > 1)
 		LogMessage(X_INFO, "setversion 1.4 succeeded on try #%d\n", tries);
 	    break;
-	} if (err != -EACCES) {
+	} if (err == -EACCES) {
+	    if (tries % 500 == 0)
+		LogMessage(X_INFO, "waiting on drm device...\n");
+	} else {
 	    break;
 	}
 
-- 
1.7.9.5


* [PATCH v2 7/7] xfree86: Also handle EAGAIN errors from drmSetInterfaceVersion().
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (5 preceding siblings ...)
  2013-03-18 20:51 ` [PATCH v2 6/7] xfree86: Be verbose if waiting on opening the drm device Bryce Harrington
@ 2013-03-18 20:51 ` Bryce Harrington
  2013-03-19  9:21 ` [PATCH v2 0/7] xfree86: Handle drm race condition Chris Wilson
  2013-03-30 18:02 ` Chris Wilson
  8 siblings, 0 replies; 20+ messages in thread
From: Bryce Harrington @ 2013-03-18 20:51 UTC (permalink / raw)
  To: intel-gfx, Chris Wilson; +Cc: Maarten Lankhorst

It has been suggested that the kernel may pass EAGAIN when the device is
unavailable.  This hasn't been seen in practice, and examination of the
function definition in libdrm suggests EAGAIN is handled internally so
would not be seen by the xserver when making this call.  So, this patch
is probably unneeded, but support is included anyway.

Signed-off-by: Bryce Harrington <bryce@canonical.com>
---
 hw/xfree86/os-support/linux/lnx_platform.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 3386b67..b05719d 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -46,6 +46,9 @@ get_drm_info(struct OdevAttributes *attribs, char *path)
 	} if (err == -EACCES) {
 	    if (tries % 500 == 0)
 		LogMessage(X_INFO, "waiting on drm device...\n");
+	} else if (err == -EAGAIN) {
+	    if (tries % 500 == 0)
+		LogMessage(X_INFO, "drm device busy...\n");
 	} else {
 	    break;
 	}
-- 
1.7.9.5


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (6 preceding siblings ...)
  2013-03-18 20:51 ` [PATCH v2 7/7] xfree86: Also handle EAGAIN errors from drmSetInterfaceVersion() Bryce Harrington
@ 2013-03-19  9:21 ` Chris Wilson
  2013-03-19 10:02   ` Maarten Lankhorst
  2013-03-30 18:02 ` Chris Wilson
  8 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2013-03-19  9:21 UTC (permalink / raw)
  To: Bryce Harrington; +Cc: Maarten Lankhorst, intel-gfx

On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
> Update:  Squashes a couple commits to avoid potential hang if
> git bisecting.  No other changes from v1.

I'd probably drop the last EAGAIN patch as that is part of the libdrm
API, but other than that it looks to be a reasonably self-contained w/a
for this perplexing problem.
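
For context on why EAGAIN should not reach the caller: libdrm's
drmIoctl() wrapper appears to restart the ioctl for as long as it fails
with EINTR or EAGAIN, and drmSetInterfaceVersion() goes through that
wrapper.  A minimal sketch of that restart loop follows; ioctl_fn,
ioctl_restarting() and the fake_ioctl() stub are hypothetical names
standing in for the real ioctl() so the example runs without a device.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ioctl(fd, request, arg). */
typedef int (*ioctl_fn)(int fd, unsigned long request, void *arg);

/* Restart the call while it is interrupted or asked to try again,
 * in the style of libdrm's drmIoctl(). */
static int
ioctl_restarting(ioctl_fn do_ioctl, int fd, unsigned long request, void *arg)
{
    int ret;

    do {
        ret = do_ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Demo stub (hypothetical): report EAGAIN a few times, then EACCES. */
static int fake_eagain_left;
static int fake_calls;

static int fake_ioctl(int fd, unsigned long request, void *arg)
{
    (void)fd;
    (void)request;
    (void)arg;

    fake_calls++;
    if (fake_eagain_left > 0) {
        fake_eagain_left--;
        errno = EAGAIN;
        return -1;
    }
    errno = EACCES;
    return -1;
}
```

With such a wrapper in place, the only error the caller can observe from
the stub is EACCES, which is why a separate EAGAIN branch in the xserver
would never be taken.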

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19  9:21 ` [PATCH v2 0/7] xfree86: Handle drm race condition Chris Wilson
@ 2013-03-19 10:02   ` Maarten Lankhorst
  2013-03-19 10:27     ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Maarten Lankhorst @ 2013-03-19 10:02 UTC (permalink / raw)
  To: Chris Wilson, Bryce Harrington, intel-gfx

Hey,

Op 19-03-13 10:21, Chris Wilson schreef:
> On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
>> Update:  Squashes a couple commits to avoid potential hang if
>> git bisecting.  No other changes from v1.
> I'd probably drop the last EAGAIN patch as that is part of the libdrm
> API, but other than that it looks to be a reasonably self-contained w/a
> for this perplexing problem.
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>
And completely wrong, version I pushed to ubuntu's xorg-server for comparison:

>8--

Nacked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -7,6 +7,7 @@
 #include <xf86drm.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <errno.h>
 
 /* Linux platform device support */
 #include "xf86_OSproc.h"
@@ -17,23 +18,54 @@
 
 #include "hotplug.h"
 
+static Bool get_drm_master(int fd)
+{
+    int ret, tries = 400;
+
+    LogMessage(X_INFO, "spinning!\n");
+
+    while (tries--) {
+        if (drmSetMaster(fd) >= 0)
+            return TRUE;
+
+        if (errno != EINVAL)
+            break;
+
+        usleep(10000);
+    }
+    return FALSE;
+}
+
 static Bool
 get_drm_info(struct OdevAttributes *attribs, char *path)
 {
     drmSetVersion sv;
     char *buf;
     int fd;
+    int err = 0;
 
     fd = open(path, O_RDWR, O_CLOEXEC);
     if (fd == -1)
         return FALSE;
 
-    sv.drm_di_major = 1;
-    sv.drm_di_minor = 4;
-    sv.drm_dd_major = -1;       /* Don't care */
-    sv.drm_dd_minor = -1;       /* Don't care */
-    if (drmSetInterfaceVersion(fd, &sv)) {
-        ErrorF("setversion 1.4 failed\n");
+    while (1) {
+        sv.drm_di_major = 1;
+        sv.drm_di_minor = 4;
+        sv.drm_dd_major = -1;       /* Don't care */
+        sv.drm_dd_minor = -1;       /* Don't care */
+
+        err = drmSetInterfaceVersion(fd, &sv);
+        if (!err)
+            break;
+
+        if (err == -EACCES) {
+            if (get_drm_master(fd))
+                continue;
+            ErrorF("drmSetMaster failed with -%i(%m)\n", errno);
+        } else
+            ErrorF("drmSetInterfaceVersion failed with %i(%s)\n", err, strerror(-err));
+
+        close(fd);
         return FALSE;
     }


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19 10:02   ` Maarten Lankhorst
@ 2013-03-19 10:27     ` Chris Wilson
  2013-03-19 10:50       ` Maarten Lankhorst
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2013-03-19 10:27 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-gfx

On Tue, Mar 19, 2013 at 11:02:14AM +0100, Maarten Lankhorst wrote:
> Hey,
> 
> Op 19-03-13 10:21, Chris Wilson schreef:
> > On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
> >> Update:  Squashes a couple commits to avoid potential hang if
> >> git bisecting.  No other changes from v1.
> > I'd probably drop the last EAGAIN patch as that is part of the libdrm
> > API, but other than that it looks to be a reasonably self-contained w/a
> > for this perplexing problem.
> >
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> > -Chris
> >
> And completely wrong, version I pushed to ubuntu's xorg-server for comparison:
> 
> Nacked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

So you pushed the busy-spin into drmSetMaster(), which is just a tighter
variant of the above.

Anything which adds the minimal delay, warns about that delay, and
works around the issue is fine by me.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19 10:27     ` Chris Wilson
@ 2013-03-19 10:50       ` Maarten Lankhorst
  2013-03-19 11:10         ` Dave Airlie
  2013-03-19 21:13         ` Chris Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2013-03-19 10:50 UTC (permalink / raw)
  To: Chris Wilson, Bryce Harrington, intel-gfx, X.Org Devel List,
	dri-devel, Timo Aaltonen

Hey,

Op 19-03-13 11:27, Chris Wilson schreef:
> On Tue, Mar 19, 2013 at 11:02:14AM +0100, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 19-03-13 10:21, Chris Wilson schreef:
>>> On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
>>>> Update:  Squashes a couple commits to avoid potential hang if
>>>> git bisecting.  No other changes from v1.
>>> I'd probably drop the last EAGAIN patch as that is part of the libdrm
>>> API, but other than that it looks to be a reasonably self-contained w/a
>>> for this perplexing problem.
>>>
>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> -Chris
>>>
>> And completely wrong, version I pushed to ubuntu's xorg-server for comparison:
>>
>> Nacked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> So you pushed the busy-spin into drmSetMaster(), which is just a tighter
> variant of the above.
>
> Anything which adds the minimal delay, warns about that delay, and
> works around the issue is fine by me.
> -Chris

Here's what I think is happening, based on the information I have.

Because of the delayed fput in recent kernels, it is possible for plymouth to exit and not drop master right away.
It's put onto a workqueue to be freed slightly later. Xorg-server starts in the meantime, opens a fd, but because the fd
hasn't been closed by plymouth yet, it didn't get implicitly authenticated and it didn't get drm master either.

The drmSetMaster call is needed, but the spinning is really just waiting for the workqueue to run.

bryce's patch never worked, it just caused it to try drmsetinterfaceversion for a few seconds before timing out. That call
was failing because his patch series never tried to obtain drm master.

The get_drm_info() call also makes it more likely to run into the same problem: it opens the fd and immediately
closes it again, which re-triggers the race.
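
The sequence described above can be written down as a toy model.  This is
only an illustration of the suspected ordering, not kernel code: struct
toy_drm and the toy_* functions are hypothetical, and the error codes are
chosen to match what the xserver reportedly sees (-EACCES from the set
version call, EINVAL from set master while another master still exists).

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not kernel structures. */
struct toy_drm {
    int  master;           /* id of the client holding master, 0 if none */
    bool release_pending;  /* old master closed, final release deferred */
};

/* open(): the first opener of a master-less device becomes master. */
static void toy_open(struct toy_drm *d, int client)
{
    if (d->master == 0)
        d->master = client;
}

/* close(): the final release is deferred, so master is not dropped yet. */
static void toy_close(struct toy_drm *d, int client)
{
    if (d->master == client)
        d->release_pending = true;
}

/* The deferred work finally runs and master is dropped. */
static void toy_run_deferred(struct toy_drm *d)
{
    if (d->release_pending) {
        d->master = 0;
        d->release_pending = false;
    }
}

/* Fails while a different client is still recorded as master. */
static int toy_set_master(struct toy_drm *d, int client)
{
    if (d->master != 0 && d->master != client)
        return -EINVAL;
    d->master = client;
    return 0;
}

/* Setting the interface version requires being master. */
static int toy_set_version(const struct toy_drm *d, int client)
{
    return d->master == client ? 0 : -EACCES;
}
```

In this model, if client 1 (plymouth) opens and closes the device and
client 2 (X) opens it before toy_run_deferred() runs, client 2 gets
-EACCES from toy_set_version() and -EINVAL from toy_set_master() until
the deferred release has happened, after which both succeed.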

For testing I did a small patch in the drm core that drops drm master when opening device.
The patch is attached inline below.

The radeon and intel drivers both fail to load with it. The intel driver doesn't return an error and silently falls back to modesetting.
The radeon driver, however, complains along these lines:

[    42.876] (==) RADEON(G0): Depth 24, (--) framebuffer bpp 32
[    42.876] (II) RADEON(G0): Pixel depth = 24 bits stored in 4 bytes (32 bpp pixmaps)
[    42.876] (==) RADEON(G0): Default visual is TrueColor
[    42.876] (==) RADEON(G0): RGB weight 888
[    42.876] (II) RADEON(G0): Using 8 bits per RGB (8 bit DAC)
[    42.876] (--) RADEON(G0): Chipset: "TURKS" (ChipID = 0x6741)
[    42.961] (EE) RADEON(G0): [drm] failed to set drm interface version.
[    42.961] (EE) RADEON(G0): Kernel modesetting setup failed

I've seen this error before in one of the races, so it's not just a theoretical issue. Just another possible failure mode.

I think all drivers have to be fixed to handle this case correctly, and they should probably all do the same spinning as well.

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index f369429..1d3099f 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -339,6 +339,7 @@ static int drm_open_helper(struct inode *inode, struct file *filp,
 			}
 		}
 		mutex_unlock(&dev->struct_mutex);
+		drm_dropmaster_ioctl(dev, NULL, priv);
 	} else {
 		/* get a reference to the master */
 		priv->master = drm_master_get(priv->minor->master);


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19 10:50       ` Maarten Lankhorst
@ 2013-03-19 11:10         ` Dave Airlie
       [not found]           ` <CAPM=9twNnKV9DUNJ-BfrnXTrj=W+AGyxqauCPMW2kCx39bj_pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-03-19 21:13         ` Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Airlie @ 2013-03-19 11:10 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: X.Org Devel List, intel-gfx, dri-devel

>
> Because of the delayed fput in recent kernels, it is possible for plymouth to exit and not drop master right away.
> It's put onto a workqueue to be freed slightly later. Xorg-server starts in the meantime, opens a fd, but because the fd
> hasn't been closed by plymouth yet, it didn't get implicitly authenticated and it didn't get drm master either.
>

I thought plymouth explicitly dropped master, and closed later. I know
we "ab"use that fact on Fedora so X can grab the bo from plymouth
before it exits.

Dave.


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
       [not found]           ` <CAPM=9twNnKV9DUNJ-BfrnXTrj=W+AGyxqauCPMW2kCx39bj_pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-03-19 12:18             ` Maarten Lankhorst
  0 siblings, 0 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2013-03-19 12:18 UTC (permalink / raw)
  To: Dave Airlie
  Cc: X.Org Devel List, intel-gfx, dri-devel

Op 19-03-13 12:10, Dave Airlie schreef:
>> Because of the delayed fput in recent kernels, it is possible for plymouth to exit and not drop master right away.
>> It's put onto a workqueue to be freed slightly later. Xorg-server starts in the meantime, opens a fd, but because the fd
>> hasn't been closed by plymouth yet, it didn't get implicitly authenticated and it didn't get drm master either.
>>
> I thought plymouth explicitly dropped master, and closed later. I know
> we "ab"use that fact on Fedora so X can grab the bo from plymouth
> before it exits.
>
> Dave.
>
Well, from trying the dropmaster kernel patch, it looks like there are simply too many places that could be affected by this assumption.

Let's just try something ugly in the flush callback, which is called before the final fput, instead; that should fix all our problems!

XXX: the big if block is duplicated from drm_release, and it should probably be split into a separate function.
However, if you're hit by the plymouth race, this might be a good thing to try.

The fix for drivers other than radeon/i915 is left as an exercise for the reader.

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index f369429..ecf8689 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -177,6 +177,50 @@ err:
 }
 EXPORT_SYMBOL(drm_open);
 
+int drm_flush(struct file *filp, fl_owner_t id)
+{
+	struct drm_file *file_priv = filp->private_data;
+	struct drm_device *dev = file_priv->minor->dev;
+
+	if (atomic_long_read(&filp->f_count) != 1 || !file_priv->is_master)
+		return 0;
+
+	mutex_lock(&dev->struct_mutex);
+
+	if (file_priv->is_master) {
+		struct drm_master *master = file_priv->master;
+		struct drm_file *temp;
+		list_for_each_entry(temp, &dev->filelist, lhead) {
+			if ((temp->master == file_priv->master) &&
+			    (temp != file_priv))
+				temp->authenticated = 0;
+		}
+
+		/**
+		 * Since the master is disappearing, so is the
+		 * possibility to lock.
+		 */
+
+		if (master->lock.hw_lock) {
+			if (dev->sigdata.lock == master->lock.hw_lock)
+				dev->sigdata.lock = NULL;
+			master->lock.hw_lock = NULL;
+			master->lock.file_priv = NULL;
+			wake_up_interruptible_all(&master->lock.lock_queue);
+		}
+
+		if (file_priv->minor->master == file_priv->master) {
+			/* drop the reference held by the minor */
+			if (dev->driver->master_drop)
+				dev->driver->master_drop(dev, file_priv, true);
+			drm_master_put(&file_priv->minor->master);
+		}
+	}
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+EXPORT_SYMBOL(drm_flush);
+
 /**
  * File \c open operation.
  *
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 62aaf8d..6dcfec3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1018,6 +1018,7 @@ static const struct vm_operations_struct i915_gem_vm_ops = {
 static const struct file_operations i915_driver_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
+	.flush = drm_flush,
 	.release = drm_release,
 	.unlocked_ioctl = drm_ioctl,
 	.mmap = drm_gem_mmap,
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 5cdd684..2c439f9 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -361,6 +361,7 @@ radeon_pci_resume(struct pci_dev *pdev)
 static const struct file_operations radeon_driver_kms_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
+	.flush = drm_flush,
 	.release = drm_release,
 	.unlocked_ioctl = drm_ioctl,
 	.mmap = radeon_mmap,
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 6cd30db..2a4f97d 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1320,6 +1320,8 @@ extern int drm_stub_open(struct inode *inode, struct file *filp);
 extern int drm_fasync(int fd, struct file *filp, int on);
 extern ssize_t drm_read(struct file *filp, char __user *buffer,
 			size_t count, loff_t *offset);
+
+extern int drm_flush(struct file *filp, fl_owner_t id);
 extern int drm_release(struct inode *inode, struct file *filp);
 
 				/* Mapping support (drm_vm.h) */



* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19 10:50       ` Maarten Lankhorst
  2013-03-19 11:10         ` Dave Airlie
@ 2013-03-19 21:13         ` Chris Wilson
  2013-03-20  8:40           ` Maarten Lankhorst
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2013-03-19 21:13 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: X.Org Devel List, intel-gfx, dri-devel

On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
> The drmSetMaster call is needed, but the spinning is really just waiting for the workqueue to run.
> 
> bryce's patch never worked, it just caused it to try drmsetinterfaceversion for a few seconds before timing out. That call
> was failing because his patch series never tried to obtain drm master.

You missed that the series Bryce posted did contain the drmSetMaster()
call inside the loop to retry drmSetVersion(). :)

Your explanation as to why the delay is required is certainly
intriguing. Thanks,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-19 21:13         ` Chris Wilson
@ 2013-03-20  8:40           ` Maarten Lankhorst
  2013-03-20 10:43             ` Maarten Lankhorst
  2013-03-20 14:09             ` Chris Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2013-03-20  8:40 UTC (permalink / raw)
  To: Chris Wilson, Bryce Harrington, intel-gfx, X.Org Devel List,
	dri-devel, Timo Aaltonen

Hey,

Op 19-03-13 22:13, Chris Wilson schreef:
> On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
>> The drmSetMaster call is needed, but the spinning is really just waiting for the workqueue to run.
>>
>> bryce's patch never worked, it just caused it to try drmsetinterfaceversion for a few seconds before timing out. That call
>> was failing because his patch series never tried to obtain drm master.
> You missed that the series Bryce posted did contain the drmSetMaster()
> call inside the loop to retry drmSetVersion(). :)
>
>
Oh I must have missed that.

Is the drmSetInterfaceVersion call really needed here? If I look at DRM_IOCTL_GET_UNIQUE,
I don't see any requirement of drm master or anything, so it looks to me like for this specific race
the drmSetInterfaceVersion call can be skipped entirely without any side effects.
This would end up with cleaner code here, and drop the master requirement entirely.

Of course there's still a race that needs to be investigated, and is currently not completely understood, I think.

~Maarten


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-20  8:40           ` Maarten Lankhorst
@ 2013-03-20 10:43             ` Maarten Lankhorst
  2013-03-20 14:09             ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2013-03-20 10:43 UTC (permalink / raw)
  To: Chris Wilson, Bryce Harrington, intel-gfx, X.Org Devel List,
	dri-devel, Timo Aaltonen, Dave Airlie

Op 20-03-13 09:40, Maarten Lankhorst schreef:
> Hey,
>
> Op 19-03-13 22:13, Chris Wilson schreef:
>> On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
>>> The drmSetMaster call is needed, but the spinning is really just waiting for the workqueue to run.
>>>
>>> bryce's patch never worked, it just caused it to try drmsetinterfaceversion for a few seconds before timing out. That call
>>> was failing because his patch series never tried to obtain drm master.
>> You missed that the series Bryce posted did contain the drmSetMaster()
>> call inside the loop to retry drmSetVersion(). :)
>>
>>
> Oh I must have missed that.
>
> Is the drmSetInterfaceVersion call really needed here? If I look at DRM_IOCTL_GET_UNIQUE,
> I don't see any requirement of drm master or anything, so it looks to me like for this specific race
> the drmSetInterfaceVersion call can be skipped entirely without any side effects.
> This would end up with cleaner code here, and drop the master requirement entirely.
>
> Of course there's still a race that needs to be investigated, and is currently not completely understood, I think.
>
Or worse, is that drmGetBusId call there even useful? From digging in the kernel it seems it's a per-master value.
So if a device is hotplugged, it wouldn't be set yet. If someone else holds master, it wouldn't be set either.
In fact it would only ever be set from DRIOpenDRMMaster, but that call only happens a lot later, if it happens at all.

It seems to me like opening the fd there should be removed entirely, and the bus id should be retrieved from the udev event instead.

I'll try to get something working for this.

~Maarten


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-20  8:40           ` Maarten Lankhorst
  2013-03-20 10:43             ` Maarten Lankhorst
@ 2013-03-20 14:09             ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2013-03-20 14:09 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: X.Org Devel List, intel-gfx, dri-devel

On Wed, Mar 20, 2013 at 09:40:04AM +0100, Maarten Lankhorst wrote:
> Is the drmSetInterfaceVersion call really needed here? If I look at DRM_IOCTL_GET_UNIQUE,
> I don't see any requirement of drm master or anything, so it looks to me like for this specific race
> the drmSetInterfaceVersion call can be skipped entirely without any side effects.
> This would end up with cleaner code here, and drop the master requirement entirely.

Indeed, it does look like drmSetVersion() at that point is overkill.
Instead we will hit the race later in the drivers. For the purposes of
clearer code, we could happily lose that drmSetVersion().
 
> Of course there's still a race that needs to be investigated, and is currently not completely understood, I think.

We are all in agreement. Ultimately we want to root-cause the race; in
the meantime we need a fallback to make sure that no desktop is left
behind!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
                   ` (7 preceding siblings ...)
  2013-03-19  9:21 ` [PATCH v2 0/7] xfree86: Handle drm race condition Chris Wilson
@ 2013-03-30 18:02 ` Chris Wilson
  2013-03-31 17:14   ` Ben Widawsky
  8 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2013-03-30 18:02 UTC (permalink / raw)
  To: Bryce Harrington; +Cc: Maarten Lankhorst, intel-gfx

On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
> Update:  Squashes a couple commits to avoid potential hang if
> git bisecting.  No other changes from v1.

I'm seeing another variation (both in Launchpad and reported by Ben)
whereby it appears that the open("/dev/dri/card0") fails and so needs to
be pushed into the repeat loop.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
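
A minimal sketch of what pushing the open() into the repeat loop might
look like, assuming the same 2 second budget that patch 4/7 uses for
setting the interface version. This is not code from the posted series:
the helper name open_with_retry() and the poll interval are hypothetical,
and only POSIX calls are used so that the sketch stands alone without
libdrm.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): retry open() on `path`
 * until it succeeds or `timeout_ms` has elapsed, sleeping `interval_ms`
 * between attempts. Returns the fd on success, or -1 with errno set to
 * the error from the last failed attempt.
 */
static int open_with_retry(const char *path, int flags,
                           long timeout_ms, long interval_ms)
{
    struct timespec start, now, delay;
    long elapsed_ms;
    int fd, saved_errno;

    assert(path != NULL);

    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    /* Monotonic clock, so wall-clock adjustments do not skew the budget. */
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        fd = open(path, flags);
        if (fd >= 0)
            return fd;
        saved_errno = errno;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                     (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms)
            break;

        /* Give the kernel side time to finish before trying again. */
        nanosleep(&delay, NULL);
    }

    errno = saved_errno;
    return -1;
}
```

A caller might use open_with_retry("/dev/dri/card0", O_RDWR, 2000, 50)
and treat a return of -1 as the hard failure case. Retrying on every
errno is a simplification; a real fix would probably only retry on the
errors actually seen in the bug reports.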


* Re: [PATCH v2 0/7] xfree86: Handle drm race condition
  2013-03-30 18:02 ` Chris Wilson
@ 2013-03-31 17:14   ` Ben Widawsky
  0 siblings, 0 replies; 20+ messages in thread
From: Ben Widawsky @ 2013-03-31 17:14 UTC (permalink / raw)
  To: Chris Wilson, Bryce Harrington, intel-gfx, Maarten Lankhorst

On Sat, Mar 30, 2013 at 06:02:16PM +0000, Chris Wilson wrote:
> On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
> > Update:  Squashes a couple commits to avoid potential hang if
> > git bisecting.  No other changes from v1.
> 
> I'm seeing another variation (both in lp and reported by Ben) whereby it
> appears that the open("/dev/dri/card0") fails and so needs be pushed
> into the repeat loop.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

I can reproduce very easily. Let me know if I can help.

-- 
Ben Widawsky, Intel Open Source Technology Center



Thread overview: 20+ messages
2013-03-18 20:51 [PATCH v2 0/7] xfree86: Handle drm race condition Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 1/7] xfree86: (Cleanup) Close fd if drm interface 1.4 could not be set Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 2/7] xfree86: Track error code and add label for error handling Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 3/7] xfree86: Provide more details on failure Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 4/7] xfree86: Keep trying to set interface on drm for 2 seconds Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 5/7] xfree86: Fix race condition failure opening drm Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 6/7] xfree86: Be verbose if waiting on opening the drm device Bryce Harrington
2013-03-18 20:51 ` [PATCH v2 7/7] xfree86: Also handle EAGAIN errors from drmSetInterfaceVersion() Bryce Harrington
2013-03-19  9:21 ` [PATCH v2 0/7] xfree86: Handle drm race condition Chris Wilson
2013-03-19 10:02   ` Maarten Lankhorst
2013-03-19 10:27     ` Chris Wilson
2013-03-19 10:50       ` Maarten Lankhorst
2013-03-19 11:10         ` Dave Airlie
     [not found]           ` <CAPM=9twNnKV9DUNJ-BfrnXTrj=W+AGyxqauCPMW2kCx39bj_pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-19 12:18             ` Maarten Lankhorst
2013-03-19 21:13         ` Chris Wilson
2013-03-20  8:40           ` Maarten Lankhorst
2013-03-20 10:43             ` Maarten Lankhorst
2013-03-20 14:09             ` Chris Wilson
2013-03-30 18:02 ` Chris Wilson
2013-03-31 17:14   ` Ben Widawsky
