* [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
@ 2012-07-05  6:31 Henrik Rydberg
  2012-07-05  6:40   ` Ben Skeggs
  2013-06-04 20:48 ` [REGRESSION] nouveau: Resume hung after protecting against client races (MBA3,1) Henrik Rydberg
  0 siblings, 2 replies; 9+ messages in thread
From: Henrik Rydberg @ 2012-07-05  6:31 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: Dave Airlie, nouveau, dri-devel, linux-kernel, Henrik Rydberg

Hi Ben, Dave,

Since 3.5-rc0, I have been experiencing occasional screen corruption
on my MacBookAir3,1, using a GeForce 320M (nv50, 0xaf). The X driver
version is xf86-video-nouveau-1.0.1-1 (arch).

I do not know what the root problem is, but I have been able to
isolate the symptoms to the usage of nva3_copy.c. The patch below is
the least intrusive way I could find which kills the symptoms.

Hopefully this will shed some light on the true problem, such that a
fix can be found for 3.5.

Thanks,
Henrik

The nva3 copy engine exhibits random memory corruption in at least one
case, the GeForce 320M (nv50, 0xaf) in the MacBookAir3,1.  This patch
omits creating the engine for the specific chipset, falling back to
M2MF, which kills the symptoms.
---
 drivers/gpu/drm/nouveau/nouveau_state.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_state.c b/drivers/gpu/drm/nouveau/nouveau_state.c
index 19706f0..b466937 100644
--- a/drivers/gpu/drm/nouveau/nouveau_state.c
+++ b/drivers/gpu/drm/nouveau/nouveau_state.c
@@ -731,7 +731,6 @@ nouveau_card_init(struct drm_device *dev)
 			case 0xa3:
 			case 0xa5:
 			case 0xa8:
-			case 0xaf:
 				nva3_copy_create(dev);
 				break;
 			}
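
For readers following the thread, the effect of the one-line change can be
sketched outside the kernel. The snippet below is a minimal standalone
illustration, not driver code: select_copy_path(), the copy_path enum and
the patched flag are hypothetical names, and only the case labels visible
in the hunk (0xa3, 0xa5, 0xa8, 0xaf) are modeled. Chipsets outside the hunk
are treated as M2MF here purely for illustration; the real switch may
handle them differently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the copy path the driver ends up using. */
enum copy_path {
	COPY_PATH_M2MF,      /* fallback when no copy engine is created */
	COPY_PATH_NVA3_COPY  /* nva3 copy engine, as created by nva3_copy_create() */
};

/*
 * Models only the case labels shown in the diff context. With
 * patched == true, 0xaf no longer reaches the copy-engine branch,
 * which corresponds to deleting "case 0xaf:" in the patch.
 */
static enum copy_path select_copy_path(unsigned int chipset, bool patched)
{
	switch (chipset) {
	case 0xaf:
		return patched ? COPY_PATH_M2MF : COPY_PATH_NVA3_COPY;
	case 0xa3:
	case 0xa5:
	case 0xa8:
		return COPY_PATH_NVA3_COPY;
	default:
		/* Assumption: anything not visible in the hunk uses M2MF here. */
		return COPY_PATH_M2MF;
	}
}
```

Calling select_copy_path(0xaf, true) yields the M2MF fallback, while 0xa3,
0xa5 and 0xa8 keep the copy engine regardless of the flag.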



* Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
  2012-07-05  6:31 [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf Henrik Rydberg
@ 2012-07-05  6:40   ` Ben Skeggs
  2013-06-04 20:48 ` [REGRESSION] nouveau: Resume hung after protecting against client races (MBA3,1) Henrik Rydberg
  1 sibling, 0 replies; 9+ messages in thread
From: Ben Skeggs @ 2012-07-05  6:40 UTC (permalink / raw)
  To: Henrik Rydberg; +Cc: Ben Skeggs, nouveau, linux-kernel, dri-devel

On Thu, Jul 05, 2012 at 08:31:13AM +0200, Henrik Rydberg wrote:
> Hi Ben, Dave,
Hey Henrik,

> 
> Since 3.5-rc0, I have been experiencing occasional screen corruption
> on my MacBookAir3,1, using a GeForce 320M (nv50, 0xaf). The X driver
> version is xf86-video-nouveau-1.0.1-1 (arch).
> 
> I do not know what the root problem is, but I have been able to
> isolate the symptoms to the usage of nva3_copy.c. The patch below is
> the least intrusive way I could find which kills the symptoms.
> 
> Hopefully this will shed some light on the true problem, such that a
> fix can be found for 3.5.
Thanks for tracking down the source of this corruption.  I don't have
any such hardware, so until someone can figure it out, I think we
should apply this patch.

Cheers,
Ben.

> 
> Thanks,
> Henrik
> 
> The nva3 copy engine exhibits random memory corruption in at least one
> case, the GeForce 320M (nv50, 0xaf) in the MacBookAir3,1.  This patch
> omits creating the engine for the specific chipset, falling back to
> M2MF, which kills the symptoms.
> ---
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

>  drivers/gpu/drm/nouveau/nouveau_state.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_state.c b/drivers/gpu/drm/nouveau/nouveau_state.c
> index 19706f0..b466937 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_state.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_state.c
> @@ -731,7 +731,6 @@ nouveau_card_init(struct drm_device *dev)
>  			case 0xa3:
>  			case 0xa5:
>  			case 0xa8:
> -			case 0xaf:
>  				nva3_copy_create(dev);
>  				break;
>  			}
> 


* Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
  2012-07-05  6:40   ` Ben Skeggs
  (?)
@ 2012-07-05  6:54   ` Henrik Rydberg
  2012-07-05  8:34     ` Henrik Rydberg
  -1 siblings, 1 reply; 9+ messages in thread
From: Henrik Rydberg @ 2012-07-05  6:54 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: Ben Skeggs, nouveau, linux-kernel, dri-devel

> Thanks for tracking down the source of this corruption.  I don't have
> any such hardware, so until someone can figure it out, I think we
> should apply this patch.

In that case, I would have to massage the patch a bit first; it
creates a problem with suspend/resume. Might be something with
nva3_pm.c, who knows. I am really stabbing in the dark here. :-)

Thanks,
Henrik


* Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
  2012-07-05  6:54   ` Henrik Rydberg
@ 2012-07-05  8:34     ` Henrik Rydberg
  2012-07-09 13:13       ` Henrik Rydberg
  0 siblings, 1 reply; 9+ messages in thread
From: Henrik Rydberg @ 2012-07-05  8:34 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: Ben Skeggs, nouveau, linux-kernel, dri-devel

On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > Thanks for tracking down the source of this corruption.  I don't have
> > any such hardware, so until someone can figure it out, I think we
> > should apply this patch.
> 
> In that case, I would have to massage the patch a bit first; it
> creates a problem with suspend/resume. Might be something with
> nva3_pm.c, who knows. I am really stabbing in the dark here. :-)

It seems the suspend/resume problem is unrelated (bad systemd update),
so I am fine with applying this as is. Obviously not the best
solution, and if I have time I will continue to look for problems in
the nva3 copy code, but for now,

    Signed-off-by: Henrik Rydberg <rydberg@euromail.se>

Thanks,
Henrik


* Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
  2012-07-05  8:34     ` Henrik Rydberg
@ 2012-07-09 13:13       ` Henrik Rydberg
  2012-07-09 18:27         ` Henrik Rydberg
  0 siblings, 1 reply; 9+ messages in thread
From: Henrik Rydberg @ 2012-07-09 13:13 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: Ben Skeggs, nouveau, linux-kernel, dri-devel

On Thu, Jul 05, 2012 at 10:34:10AM +0200, Henrik Rydberg wrote:
> On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > > Thanks for tracking down the source of this corruption.  I don't have
> > > any such hardware, so until someone can figure it out, I think we
> > > should apply this patch.
> > 
> > In that case, I would have to massage the patch a bit first; it
> > creates a problem with suspend/resume. Might be something with
> > nva3_pm.c, who knows. I am really stabbing in the dark here. :-)
> 
> It seems the suspend/resume problem is unrelated (bad systemd update),
> so I am fine with applying this as is. Obviously not the best
> solution, and if I have time I will continue to look for problems in
> the nva3 copy code, but for now,
> 
>     Signed-off-by: Henrik Rydberg <rydberg@euromail.se>

I have not encountered the problem in a long while, and I do not have
the patch applied. It is entirely possible that this was fixed by
something else. Unless you have already applied the patch, I would
suggest holding on to it to see if the problem reappears.

Sorry for the churn.

Thanks,
Henrik


* Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
  2012-07-09 13:13       ` Henrik Rydberg
@ 2012-07-09 18:27         ` Henrik Rydberg
  0 siblings, 0 replies; 9+ messages in thread
From: Henrik Rydberg @ 2012-07-09 18:27 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: Ben Skeggs, nouveau, linux-kernel, dri-devel

On Mon, Jul 09, 2012 at 03:13:25PM +0200, Henrik Rydberg wrote:
> On Thu, Jul 05, 2012 at 10:34:10AM +0200, Henrik Rydberg wrote:
> > On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > > > Thanks for tracking down the source of this corruption.  I don't have
> > > > any such hardware, so until someone can figure it out, I think we
> > > > should apply this patch.
> > > 
> > > In that case, I would have to massage the patch a bit first; it
> > > creates a problem with suspend/resume. Might be something with
> > > nva3_pm.c, who knows. I am really stabbing in the dark here. :-)
> > 
> > It seems the suspend/resume problem is unrelated (bad systemd update),
> > so I am fine with applying this as is. Obviously not the best
> > solution, and if I have time I will continue to look for problems in
> > the nva3 copy code, but for now,
> > 
> >     Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> 
> I have not encountered the problem in a long while, and I do not have
> the patch applied. It is entirely possible that this was fixed by
> something else. Unless you have already applied the patch, I would
> suggest holding on to it to see if the problem reappears.
> 
> Sorry for the churn.

... and there it was again, hours after giving up on it. Oh well.

What makes this bug particularly difficult is that as soon as the
patch is applied, the problem disappears and does not show itself
again - with or without the patch applied. Sounds very much like the
problem is a failure state that does not get reset by current
mainline, but somehow gets reset with the patch applied.

I also learnt that the problem is not in the nva3_copy code itself; I
reverted nva3_copy.c and nva3_pm.c back to v3.4, but the problem persisted.

A DMA problem elsewhere, in the drm code or in the pci layer, seems
more likely than this particular hardware having problems with this
particular copy engine. As it stands, though, applying the patch is
the only thing known to work.

Thanks,
Henrik


* [REGRESSION] nouveau: Resume hung after protecting against client races (MBA3,1)
  2012-07-05  6:31 [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf Henrik Rydberg
  2012-07-05  6:40   ` Ben Skeggs
@ 2013-06-04 20:48 ` Henrik Rydberg
  2013-06-04 21:16   ` Ilia Mirkin
  1 sibling, 1 reply; 9+ messages in thread
From: Henrik Rydberg @ 2013-06-04 20:48 UTC (permalink / raw)
  To: Henrik Rydberg; +Cc: Dave Airlie, nouveau, linux-kernel, dri-devel

Hi Ben,

The new mutexes in nvc0/nv50 (fadb17190/b509656) break resume on my
MBA3,1. A deadlock somewhere, perhaps? Reverting fixes the problem.

Thanks,
Henrik


* Re: [REGRESSION] nouveau: Resume hung after protecting against client races (MBA3,1)
  2013-06-04 20:48 ` [REGRESSION] nouveau: Resume hung after protecting against client races (MBA3,1) Henrik Rydberg
@ 2013-06-04 21:16   ` Ilia Mirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Ilia Mirkin @ 2013-06-04 21:16 UTC (permalink / raw)
  To: Henrik Rydberg; +Cc: Dave Airlie, nouveau, linux-kernel, dri-devel

On Tue, Jun 4, 2013 at 4:48 PM, Henrik Rydberg <rydberg@euromail.se> wrote:
> Hi Ben,
>
> The new mutexes in nvc0/nv50 (fadb17190/b509656) break resume on my
> MBA3,1. A deadlock somewhere, perhaps? Reverting fixes the problem.

A bunch of people saw it earlier. Fixed for nv50 (which is what I
assume you have) in
http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=e9de89adcecb7a1296f5bc4d0052f58e18edd0a8

I assume it's on its way to mainline.

  -ilia

