All of lore.kernel.org
 help / color / mirror / Atom feed
* Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
@ 2016-05-27 12:23 Lukas Wunner
  2016-05-30  9:07   ` Bruno Prémont
  0 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2016-05-27 12:23 UTC (permalink / raw)
  To: Bruno Premont, platform-driver-x86; +Cc: Wilfried Klaebe, dri-devel

Hi Bruno,

Wilfried Klaebe has reported a deadlock in 4.6 which he bisected to
my commit 704ab614ec12 ("drm/i915: Defer probe if gmux is present but
its driver isn't"), but which is ultimately caused by your commit
4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes").

What's happening is that your commit calls vga_tryget() in apple-gmux,
which succeeds. When Xorg is launched, it opens /dev/vga_arbiter and
calls vga_get(), which deadlocks. See the attachments to this bugzilla
entry, in particular the stacktrace at the end of "kern.log / dmesg of
non-working Linux 4.6.0":
https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
https://bugzilla.kernel.org/attachment.cgi?id=217541

For some reason the deadlock only occurs if apple-gmux loads before
i915, so it seems there's a race condition in your commit. My commit
ensures that this is the case, because i915 cannot probe the panel's
EDID without gmux. (The panel is switched to the radeon GPU.)
Notably, your commit message says that: "It is expected to load/probe
gmux prior to graphics drivers." However your commit does not take any
precautions to actually ensure that. What's more, now that apple-gmux
*does* load before i915, your commit breaks.

I'm not really familiar with vgaarb, but grepping through the kernel
tree I can find only 2 drivers which use it, i915 and vfio. In both
cases, the kernel only briefly acquires a lock to write to VGA
registers, and immediately releases the lock afterwards. So it looks
to me like the intended usage is to not hold a lock over a prolonged
period of time. IIUC the proprietary nvidia driver is doing exactly
that, and you sought to work around this issue by also holding a lock
indefinitely in apple-gmux.c.

The proper way, at least from my point of view as a complete vgaarb
dimwit, seems to briefly acquire a lock whenever apple-gmux.c
accesses its registers in the 0x700 IO range. And likewise nvidia
ought to fix their driver to only acquire a lock whenever they
actually need it. It was noble that you tried to help the user
with the nvidia driver issue, but ultimately we can't workaround
nvidia's bugs if it causes breakage elsewhere. They need to fix
their closed source driver.

There's also this unresolved issue that your commit broke backlight
control on the MacBookPro11,3 and 11,5:
https://bugzilla.kernel.org/show_bug.cgi?id=105051

So what do we do? We need to do something because now we've got the
deadlock regression in 4.6 on top. :(

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
  2016-05-27 12:23 Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes") Lukas Wunner
@ 2016-05-30  9:07   ` Bruno Prémont
  0 siblings, 0 replies; 5+ messages in thread
From: Bruno Prémont @ 2016-05-30  9:07 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: platform-driver-x86, Wilfried Klaebe, dri-devel

Hi Lucas,

On Fri, 27 May 2016 14:23:02 +0200 Lukas Wunner wrote:
> Hi Bruno,
> 
> Wilfried Klaebe has reported a deadlock in 4.6 which he bisected to
> my commit 704ab614ec12 ("drm/i915: Defer probe if gmux is present but
> its driver isn't"), but which is ultimately caused by your commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes").
> 
> What's happening is that your commit calls vga_tryget() in apple-gmux,
> which succeeds. When Xorg is launched, it opens /dev/vga_arbiter and
> calls vga_get(), which deadlocks. See the attachments to this bugzilla
> entry, in particular the stacktrace at the end of "kern.log / dmesg of
> non-working Linux 4.6.0":
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> https://bugzilla.kernel.org/attachment.cgi?id=217541

Looks like here the two GPUs are visible and the right one is being
selected.
So your patch to allow including ISA devices in vga arbitration would
not change anything here.

Is it known at what time the lockup happens?
From the kernel trace the lockup only happens during write() and not
already at open time. Thus the command written to /dev/vga_arbiter would
be helpful for proper understanding.
From kernel side code I think it's a call related to radeon and not
intel in Xorg that calls out to vga_arbiter, but please provide
more details (either strace&/backtrace on Xorg or kernel with pr_debug
active for vgaargb and corresponding kernel log).


From the bug report, comment #11 I don't understand this part:
  I boot Linux directly via rEFInd. I got the radeon xorg drivers
  installed, but neither intel nor fbdev or kms.

Does this mean that neither Intel KMS nor Radeon KMS are able to
initialize, nor simplefb's fbdev?
The kernel log though reports proper startup of radeon (KMS) and i915
(KMS) so only userspace gets caught...
So details on the userspace stack (xf86-video-intel and libdrm versions)
would be helpful as well.

> For some reason the deadlock only occurs if apple-gmux loads before
> i915, so it seems there's a race condition in your commit. My commit
> ensures that this is the case, because i915 cannot probe the panel's
> EDID without gmux. (The panel is switched to the radeon GPU.)
> Notably, your commit message says that: "It is expected to load/probe
> gmux prior to graphics drivers." However your commit does not take any
> precautions to actually ensure that. What's more, now that apple-gmux
> *does* load before i915, your commit breaks.

I think here userspace and kernel-side parts are getting mixed-up.

From the failing kernel log, both i915 and radeon are loaded long before
the lock happens (with vgaarb being loaded yet earlier).

The only one having trouble is Xorg (would it be possible to get a
backtrace of Xorg to know what code called/triggered /dev/vga_arbiter
and what it intends to do?).

> I'm not really familiar with vgaarb, but grepping through the kernel
> tree I can find only 2 drivers which use it, i915 and vfio. In both
> cases, the kernel only briefly acquires a lock to write to VGA
> registers, and immediately releases the lock afterwards. So it looks
> to me like the intended usage is to not hold a lock over a prolonged
> period of time. IIUC the proprietary nvidia driver is doing exactly
> that, and you sought to work around this issue by also holding a lock
> indefinitely in apple-gmux.c.
> 
> The proper way, at least from my point of view as a complete vgaarb
> dimwit, seems to briefly acquire a lock whenever apple-gmux.c
> accesses its registers in the 0x700 IO range. And likewise nvidia
> ought to fix their driver to only acquire a lock whenever they
> actually need it. It was noble that you tried to help the user
> with the nvidia driver issue, but ultimately we can't workaround
> nvidia's bugs if it causes breakage elsewhere. They need to fix
> their closed source driver.
> 
> There's also this unresolved issue that your commit broke backlight
> control on the MacBookPro11,3 and 11,5:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051

You had a patch allowing registering ISA devices to vgaarb (for when
Intel GPU is hidden), did anything happen with that one?

> So what do we do? We need to do something because now we've got the
> deadlock regression in 4.6 on top. :(
> 
> Thanks,
> 
> Lukas

Bruno

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
@ 2016-05-30  9:07   ` Bruno Prémont
  0 siblings, 0 replies; 5+ messages in thread
From: Bruno Prémont @ 2016-05-30  9:07 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: platform-driver-x86, Wilfried Klaebe, dri-devel

Hi Lucas,

On Fri, 27 May 2016 14:23:02 +0200 Lukas Wunner wrote:
> Hi Bruno,
> 
> Wilfried Klaebe has reported a deadlock in 4.6 which he bisected to
> my commit 704ab614ec12 ("drm/i915: Defer probe if gmux is present but
> its driver isn't"), but which is ultimately caused by your commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes").
> 
> What's happening is that your commit calls vga_tryget() in apple-gmux,
> which succeeds. When Xorg is launched, it opens /dev/vga_arbiter and
> calls vga_get(), which deadlocks. See the attachments to this bugzilla
> entry, in particular the stacktrace at the end of "kern.log / dmesg of
> non-working Linux 4.6.0":
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> https://bugzilla.kernel.org/attachment.cgi?id=217541

Looks like here the two GPUs are visible and the right one is being
selected.
So your patch to allow including ISA devices in vga arbitration would
not change anything here.

Is it known at what time the lockup happens?
>From the kernel trace the lockup only happens during write() and not
already at open time. Thus the command written to /dev/vga_arbiter would
be helpful for proper understanding.
>From kernel side code I think it's a call related to radeon and not
intel in Xorg that calls out to vga_arbiter, but please provide
more details (either strace&/backtrace on Xorg or kernel with pr_debug
active for vgaargb and corresponding kernel log).


>From the bug report, comment #11 I don't understand this part:
  I boot Linux directly via rEFInd. I got the radeon xorg drivers
  installed, but neither intel nor fbdev or kms.

Does this mean that neither Intel KMS nor Radeon KMS are able to
initialize, nor simplefb's fbdev?
The kernel log though reports proper startup of radeon (KMS) and i915
(KMS) so only userspace gets caught...
So details on the userspace stack (xf86-video-intel and libdrm versions)
would be helpful as well.

> For some reason the deadlock only occurs if apple-gmux loads before
> i915, so it seems there's a race condition in your commit. My commit
> ensures that this is the case, because i915 cannot probe the panel's
> EDID without gmux. (The panel is switched to the radeon GPU.)
> Notably, your commit message says that: "It is expected to load/probe
> gmux prior to graphics drivers." However your commit does not take any
> precautions to actually ensure that. What's more, now that apple-gmux
> *does* load before i915, your commit breaks.

I think here userspace and kernel-side parts are getting mixed-up.

>From the failing kernel log, both i915 and radeon are loaded long before
the lock happens (with vgaarb being loaded yet earlier).

The only one having trouble is Xorg (would it be possible to get a
backtrace of Xorg to know what code called/triggered /dev/vga_arbiter
and what it intends to do?).

> I'm not really familiar with vgaarb, but grepping through the kernel
> tree I can find only 2 drivers which use it, i915 and vfio. In both
> cases, the kernel only briefly acquires a lock to write to VGA
> registers, and immediately releases the lock afterwards. So it looks
> to me like the intended usage is to not hold a lock over a prolonged
> period of time. IIUC the proprietary nvidia driver is doing exactly
> that, and you sought to work around this issue by also holding a lock
> indefinitely in apple-gmux.c.
> 
> The proper way, at least from my point of view as a complete vgaarb
> dimwit, seems to briefly acquire a lock whenever apple-gmux.c
> accesses its registers in the 0x700 IO range. And likewise nvidia
> ought to fix their driver to only acquire a lock whenever they
> actually need it. It was noble that you tried to help the user
> with the nvidia driver issue, but ultimately we can't workaround
> nvidia's bugs if it causes breakage elsewhere. They need to fix
> their closed source driver.
> 
> There's also this unresolved issue that your commit broke backlight
> control on the MacBookPro11,3 and 11,5:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051

You had a patch allowing registering ISA devices to vgaarb (for when
Intel GPU is hidden), did anything happen with that one?

> So what do we do? We need to do something because now we've got the
> deadlock regression in 4.6 on top. :(
> 
> Thanks,
> 
> Lukas

Bruno

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
  2016-05-30  9:07   ` Bruno Prémont
  (?)
@ 2016-05-30 10:24   ` Lukas Wunner
  2016-05-30 10:48     ` Bruno Prémont
  -1 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2016-05-30 10:24 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: platform-driver-x86, Wilfried Klaebe, dri-devel, Tiago Vignatti

On Mon, May 30, 2016 at 11:07:53AM +0200, Bruno Prémont wrote:
> On Fri, 27 May 2016 14:23:02 +0200 Lukas Wunner wrote:
> > Wilfried Klaebe has reported a deadlock in 4.6 which he bisected to
> > my commit 704ab614ec12 ("drm/i915: Defer probe if gmux is present but
> > its driver isn't"), but which is ultimately caused by your commit
> > 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes").
> > 
> > What's happening is that your commit calls vga_tryget() in apple-gmux,
> > which succeeds. When Xorg is launched, it opens /dev/vga_arbiter and
> > calls vga_get(), which deadlocks. See the attachments to this bugzilla
> > entry, in particular the stacktrace at the end of "kern.log / dmesg of
> > non-working Linux 4.6.0":
> > https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> > https://bugzilla.kernel.org/attachment.cgi?id=217541
> 
> Looks like here the two GPUs are visible and the right one is being
> selected.
> So your patch to allow including ISA devices in vga arbitration would
> not change anything here.
> 
> Is it known at what time the lockup happens?
> From the kernel trace the lockup only happens during write() and not
> already at open time. Thus the command written to /dev/vga_arbiter would
> be helpful for proper understanding.
> From kernel side code I think it's a call related to radeon and not
> intel in Xorg that calls out to vga_arbiter, but please provide
> more details (either strace&/backtrace on Xorg or kernel with pr_debug
> active for vgaargb and corresponding kernel log).

When Xorg starts up, it calls xf86VGAarbiterLock() a couple of times
from InitOutput():
https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/common/xf86Init.c#n569

This calls pci_device_vgaarb_lock():
https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/common/xf86VGAarbiter.c#n88

And this writes to /dev/vga_arbiter:
https://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/common_vgaarb.c#n287

Adding Tiago Vignatti to cc: who wrote much of this code.


> From the bug report, comment #11 I don't understand this part:
>   I boot Linux directly via rEFInd. I got the radeon xorg drivers
>   installed, but neither intel nor fbdev or kms.
> 
> Does this mean that neither Intel KMS nor Radeon KMS are able to
> initialize, nor simplefb's fbdev?
> The kernel log though reports proper startup of radeon (KMS) and i915
> (KMS) so only userspace gets caught...
> So details on the userspace stack (xf86-video-intel and libdrm versions)
> would be helpful as well.

The kernel modules apple-gmux, i915 and radeon are present and loaded,
but the user space Xorg driver for i915 is not installed, only radeon.

But we (one the kernel side) cannot force people to have specific
userland drivers installed to avoid deadlocks in the kernel.


> > For some reason the deadlock only occurs if apple-gmux loads before
> > i915, so it seems there's a race condition in your commit. My commit
> > ensures that this is the case, because i915 cannot probe the panel's
> > EDID without gmux. (The panel is switched to the radeon GPU.)
> > Notably, your commit message says that: "It is expected to load/probe
> > gmux prior to graphics drivers." However your commit does not take any
> > precautions to actually ensure that. What's more, now that apple-gmux
> > *does* load before i915, your commit breaks.
> 
> I think here userspace and kernel-side parts are getting mixed-up.
> 
> From the failing kernel log, both i915 and radeon are loaded long before
> the lock happens (with vgaarb being loaded yet earlier).
> 
> The only one having trouble is Xorg (would it be possible to get a
> backtrace of Xorg to know what code called/triggered /dev/vga_arbiter
> and what it intends to do?).
> 
> > I'm not really familiar with vgaarb, but grepping through the kernel
> > tree I can find only 2 drivers which use it, i915 and vfio. In both
> > cases, the kernel only briefly acquires a lock to write to VGA
> > registers, and immediately releases the lock afterwards. So it looks
> > to me like the intended usage is to not hold a lock over a prolonged
> > period of time. IIUC the proprietary nvidia driver is doing exactly
> > that, and you sought to work around this issue by also holding a lock
> > indefinitely in apple-gmux.c.
> > 
> > The proper way, at least from my point of view as a complete vgaarb
> > dimwit, seems to briefly acquire a lock whenever apple-gmux.c
> > accesses its registers in the 0x700 IO range. And likewise nvidia
> > ought to fix their driver to only acquire a lock whenever they
> > actually need it. It was noble that you tried to help the user
> > with the nvidia driver issue, but ultimately we can't workaround
> > nvidia's bugs if it causes breakage elsewhere. They need to fix
> > their closed source driver.
> > 
> > There's also this unresolved issue that your commit broke backlight
> > control on the MacBookPro11,3 and 11,5:
> > https://bugzilla.kernel.org/show_bug.cgi?id=105051
> 
> You had a patch allowing registering ISA devices to vgaarb (for when
> Intel GPU is hidden), did anything happen with that one?

That patch was only half serious.

As said, based on my limited understanding of vgaarb.c, the intended
usage seems not to be that a lock is acquired and kept indefinitely,
but only for the brief period of time when I/O is accessed.

So what Nvidia is doing is plain wrong and people have tried to
convince them to fix their driver time and again in this forum
thread:
https://devtalk.nvidia.com/default/topic/545560/vfio-vga-arbitration-lock/

You were correct in that you noticed that I/O needs to be locked to
a device on the root bus for gmux to be accessible. However the lock
shouldn't be held indefinitely, but only briefly in the gmux_read*(),
gmux_write*() and gmux_is_indexed() functions. So my suggestion would
be to revert 4eebd5a4e726, replace it with a new patch that locks IO
only when needed, and get nvidia to do the same in their driver.

Thanks,

Lukas

> 
> > So what do we do? We need to do something because now we've got the
> > deadlock regression in 4.6 on top. :(
> > 
> > Thanks,
> > 
> > Lukas
> 
> Bruno

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
  2016-05-30 10:24   ` Lukas Wunner
@ 2016-05-30 10:48     ` Bruno Prémont
  0 siblings, 0 replies; 5+ messages in thread
From: Bruno Prémont @ 2016-05-30 10:48 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Wilfried Klaebe, dri-devel, platform-driver-x86

Hi Lucas,

On Mon, 30 May 2016 12:24:46 +0200 Lukas Wunner wrote:
> On Mon, May 30, 2016 at 11:07:53AM +0200, Bruno Prémont wrote:
> > On Fri, 27 May 2016 14:23:02 +0200 Lukas Wunner wrote:  
> > > Wilfried Klaebe has reported a deadlock in 4.6 which he bisected to
> > > my commit 704ab614ec12 ("drm/i915: Defer probe if gmux is present but
> > > its driver isn't"), but which is ultimately caused by your commit
> > > 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes").
> > > 
> > > What's happening is that your commit calls vga_tryget() in apple-gmux,
> > > which succeeds. When Xorg is launched, it opens /dev/vga_arbiter and
> > > calls vga_get(), which deadlocks. See the attachments to this bugzilla
> > > entry, in particular the stacktrace at the end of "kern.log / dmesg of
> > > non-working Linux 4.6.0":
> > > https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> > > https://bugzilla.kernel.org/attachment.cgi?id=217541  
> > 
> > Looks like here the two GPUs are visible and the right one is being
> > selected.
> > So your patch to allow including ISA devices in vga arbitration would
> > not change anything here.
> > 
> > Is it known at what time the lockup happens?
> > From the kernel trace the lockup only happens during write() and not
> > already at open time. Thus the command written to /dev/vga_arbiter would
> > be helpful for proper understanding.
> > From kernel side code I think it's a call related to radeon and not
> > intel in Xorg that calls out to vga_arbiter, but please provide
> > more details (either strace&/backtrace on Xorg or kernel with pr_debug
> > active for vgaargb and corresponding kernel log).  
> 
> When Xorg starts up, it calls xf86VGAarbiterLock() a couple of times
> from InitOutput():
> https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/common/xf86Init.c#n569
> 
> This calls pci_device_vgaarb_lock():
> https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/common/xf86VGAarbiter.c#n88
> 
> And this writes to /dev/vga_arbiter:
> https://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/common_vgaarb.c#n287
> 
> Adding Tiago Vignatti to cc: who wrote much of this code.

Ok, and the device chose for the lock is probably the radeon one, thus
the lock taken by vgaarb conflicts with the one libpciaccess wants to
take for radeon driver.
If it was the intel one the kernel should not block as the the one to
be locked  by userspace matches the one locked by kernel.

> > From the bug report, comment #11 I don't understand this part:
> >   I boot Linux directly via rEFInd. I got the radeon xorg drivers
> >   installed, but neither intel nor fbdev or kms.
> > 
> > Does this mean that neither Intel KMS nor Radeon KMS are able to
> > initialize, nor simplefb's fbdev?
> > The kernel log though reports proper startup of radeon (KMS) and i915
> > (KMS) so only userspace gets caught...
> > So details on the userspace stack (xf86-video-intel and libdrm versions)
> > would be helpful as well.  
> 
> The kernel modules apple-gmux, i915 and radeon are present and loaded,
> but the user space Xorg driver for i915 is not installed, only radeon.
> 
> But we (one the kernel side) cannot force people to have specific
> userland drivers installed to avoid deadlocks in the kernel.

Ok, that matches above.

Sure we should not block userspace indefinitely.
On userspace a try-lock instead of lock command would be an option
tough.

It seems that commit 9f5bd30818c42c6c36a51f93b4df75a2ea2bd85e might
also have made the issue worse (as the lock is becoming
effectively uninterruptible with that commit while it was not so before
(no idea if Xorg did receive signals to escape the impossible locking
attempt)

> > > For some reason the deadlock only occurs if apple-gmux loads
> > > before i915, so it seems there's a race condition in your commit.
> > > My commit ensures that this is the case, because i915 cannot
> > > probe the panel's EDID without gmux. (The panel is switched to
> > > the radeon GPU.) Notably, your commit message says that: "It is
> > > expected to load/probe gmux prior to graphics drivers." However
> > > your commit does not take any precautions to actually ensure
> > > that. What's more, now that apple-gmux *does* load before i915,
> > > your commit breaks.  
> > 
> > I think here userspace and kernel-side parts are getting mixed-up.
> > 
> > From the failing kernel log, both i915 and radeon are loaded long
> > before the lock happens (with vgaarb being loaded yet earlier).
> > 
> > The only one having trouble is Xorg (would it be possible to get a
> > backtrace of Xorg to know what code
> > called/triggered /dev/vga_arbiter and what it intends to do?).
> >   
> > > I'm not really familiar with vgaarb, but grepping through the
> > > kernel tree I can find only 2 drivers which use it, i915 and
> > > vfio. In both cases, the kernel only briefly acquires a lock to
> > > write to VGA registers, and immediately releases the lock
> > > afterwards. So it looks to me like the intended usage is to not
> > > hold a lock over a prolonged period of time. IIUC the proprietary
> > > nvidia driver is doing exactly that, and you sought to work
> > > around this issue by also holding a lock indefinitely in
> > > apple-gmux.c.
> > > 
> > > The proper way, at least from my point of view as a complete
> > > vgaarb dimwit, seems to briefly acquire a lock whenever
> > > apple-gmux.c accesses its registers in the 0x700 IO range. And
> > > likewise nvidia ought to fix their driver to only acquire a lock
> > > whenever they actually need it. It was noble that you tried to
> > > help the user with the nvidia driver issue, but ultimately we
> > > can't workaround nvidia's bugs if it causes breakage elsewhere.
> > > They need to fix their closed source driver.
> > > 
> > > There's also this unresolved issue that your commit broke
> > > backlight control on the MacBookPro11,3 and 11,5:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=105051  
> > 
> > You had a patch allowing registering ISA devices to vgaarb (for when
> > Intel GPU is hidden), did anything happen with that one?  
> 
> That patch was only half serious.
> 
> As said, based on my limited understanding of vgaarb.c, the intended
> usage seems not to be that a lock is acquired and kept indefinitely,
> but only for the brief period of time when I/O is accessed.
> 
> So what Nvidia is doing is plain wrong and people have tried to
> convince them to fix their driver time and again in this forum
> thread:
> https://devtalk.nvidia.com/default/topic/545560/vfio-vga-arbitration-lock/
> 
> You were correct in that you noticed that I/O needs to be locked to
> a device on the root bus for gmux to be accessible. However the lock
> shouldn't be held indefinitely, but only briefly in the gmux_read*(),
> gmux_write*() and gmux_is_indexed() functions. So my suggestion would
> be to revert 4eebd5a4e726, replace it with a new patch that locks IO
> only when needed, and get nvidia to do the same in their driver.

Just changing the locking period looks great when all GPUs are visible,
though for the case when only discrete GPU is visible will need some
extra care (vgaarb can't do so properly right now unless we can cook
something with removing all decodes of discrete GPU and replicate in
private setting up decodes properly for root bus)...

Bruno

> Thanks,
> 
> Lukas
> 
> >   
> > > So what do we do? We need to do something because now we've got the
> > > deadlock regression in 4.6 on top. :(
> > > 
> > > Thanks,
> > > 
> > > Lukas  
> > 
> > Bruno  
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-05-30 10:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-27 12:23 Deadlock in 4.6 caused by 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes") Lukas Wunner
2016-05-30  9:07 ` Bruno Prémont
2016-05-30  9:07   ` Bruno Prémont
2016-05-30 10:24   ` Lukas Wunner
2016-05-30 10:48     ` Bruno Prémont

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.