* Major 2.6.38 regression ignored?
@ 2011-05-20 17:06 Luke-Jr
  2011-05-20 18:08 ` Ray Lee
  0 siblings, 1 reply; 84+ messages in thread
From: Luke-Jr @ 2011-05-20 17:06 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, LKML

I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago 
against 2.6.38. Now 2.6.39 was just released without the regression being 
addressed. This bug makes the system unusable... Some guys on IRC suggested I 
email, so here it is.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-20 17:06 Major 2.6.38 regression ignored? Luke-Jr
@ 2011-05-20 18:08 ` Ray Lee
  2011-05-20 20:24   ` Rafael J. Wysocki
  2011-05-21  8:41   ` Chris Wilson
  0 siblings, 2 replies; 84+ messages in thread
From: Ray Lee @ 2011-05-20 18:08 UTC (permalink / raw)
  To: Luke-Jr, chris, Rafael J. Wysocki; +Cc: intel-gfx, LKML, dri-devel


[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki
to the message ]

On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:

> I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago
> against 2.6.38. Now 2.6.39 was just released without the regression being
> addressed. This bug makes the system unusable... Some guys on IRC suggested
> I email, so here it is.
>

See the bugzilla entry for the bisection history.

~r.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-20 18:08 ` Ray Lee
@ 2011-05-20 20:24   ` Rafael J. Wysocki
  2011-05-20 21:11     ` Ray Lee
  2011-05-21  8:41   ` Chris Wilson
  1 sibling, 1 reply; 84+ messages in thread
From: Rafael J. Wysocki @ 2011-05-20 20:24 UTC (permalink / raw)
  To: Ray Lee; +Cc: Luke-Jr, chris, intel-gfx, dri-devel, LKML

On Friday, May 20, 2011, Ray Lee wrote:
> [ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki
> to the message ]

It is on the list of known regressions from 2.6.37, but we're not tracking
them any more now that 2.6.39 is out.

Thanks,
Rafael


> On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> 
> > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago
> > against 2.6.38. Now 2.6.39 was just released without the regression being
> > addressed. This bug makes the system unusable... Some guys on IRC suggested
> > I email, so here it is.
> >
> 
> See the bugzilla entry for the bisection history.
> 
> ~r.
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-20 20:24   ` Rafael J. Wysocki
@ 2011-05-20 21:11     ` Ray Lee
  0 siblings, 0 replies; 84+ messages in thread
From: Ray Lee @ 2011-05-20 21:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Luke-Jr, chris, intel-gfx, dri-devel, LKML

2011/5/20 Rafael J. Wysocki <rjw@sisk.pl>
> It is on the list of known regressions from 2.6.37, but we're not tracking
> them any more now that 2.6.39 is out.

Hopefully Chris is still tracking them, even if you aren't.

Chris? What other information can the affected person provide, or what
tests can he run to help close this out?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-20 18:08 ` Ray Lee
  2011-05-20 20:24   ` Rafael J. Wysocki
@ 2011-05-21  8:41   ` Chris Wilson
  2011-05-21 15:23     ` Luke-Jr
  1 sibling, 1 reply; 84+ messages in thread
From: Chris Wilson @ 2011-05-21  8:41 UTC (permalink / raw)
  To: Ray Lee, Luke-Jr, Rafael J. Wysocki; +Cc: intel-gfx, dri-devel, LKML

On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> [ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki
> to the message ]
> 
> On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> 
> > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago
> > against 2.6.38. Now 2.6.39 was just released without the regression being
> > addressed. This bug makes the system unusable... Some guys on IRC suggested
> > I email, so here it is.
> >
> 
> See the bugzilla entry for the bisection history.

Which has nothing to do with Luke's bug. Considering the thousand things
that can go wrong while X is starting, without a hint as to which one it
is, this is nigh on impossible to debug except by trial and error. If you
set up netconsole, does the kernel emit an OOPS with its last dying breath?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-21  8:41   ` Chris Wilson
@ 2011-05-21 15:23     ` Luke-Jr
  2011-05-21 15:40         ` Chris Wilson
  0 siblings, 1 reply; 84+ messages in thread
From: Luke-Jr @ 2011-05-21 15:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Ray Lee, Rafael J. Wysocki, intel-gfx, dri-devel, LKML

On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> > Wysocki to the message ]
> > 
> > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> > > ago against 2.6.38. Now 2.6.39 was just released without the
> > > regression being addressed. This bug makes the system unusable... Some
> > > guys on IRC suggested I
> > > email, so here it is.
> > 
> > See the bugzilla entry for the bisection history.
> 
> Which has nothing to do with Luke's bug. Considering the thousand things
> that can go wrong during X starting, without a hint as to which it is nigh
> on impossible to debug except by trial and error. If you set up
> netconsole, does the kernel emit an OOPS with it's last dying breath?

Why assume it's a different bug? I would almost wonder if it might affect 
all Sandy Bridge GPUs. In any case, I no longer have the original 
motherboard (it was recalled, as I said in the first post), nor even the 
revision of it (it had other issues that weren't being fixed). I *assume* I 
will have the same problem with my new motherboard (Intel DQ67SW), but I 
haven't verified that yet. I'll be sure to try a netconsole when I have to 
reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, 
but at the moment it seems reasonable to address the problem bisected in the 
bug, even if it turns out to be different.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-21 15:23     ` Luke-Jr
@ 2011-05-21 15:40         ` Chris Wilson
  0 siblings, 0 replies; 84+ messages in thread
From: Chris Wilson @ 2011-05-21 15:40 UTC (permalink / raw)
  To: Luke-Jr; +Cc: Ray Lee, Rafael J. Wysocki, intel-gfx, dri-devel, LKML

On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
> On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> > > Wysocki to the message ]
> > > 
> > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> > > > ago against 2.6.38. Now 2.6.39 was just released without the
> > > > regression being addressed. This bug makes the system unusable... Some
> > > > guys on IRC suggested I
> > > > email, so here it is.
> > > 
> > > See the bugzilla entry for the bisection history.
> > 
> > Which has nothing to do with Luke's bug. Considering the thousand things
> > that can go wrong during X starting, without a hint as to which it is nigh
> > on impossible to debug except by trial and error. If you set up
> > netconsole, does the kernel emit an OOPS with it's last dying breath?
> 
> Why assume it's a different bug? I would almost wonder if it might affect 
> all Sandy Bridge GPUs. In any case, I no longer have the original 
> motherboard (it was recalled, as I said in the first post), nor even the 
> revision of it (it had other issues that weren't being fixed). I *assume* I 
> will have the same problem with my new motherboard (Intel DQ67SW), but I 
> haven't verified that yet. I'll be sure to try a netconsole when I have to 
> reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, 
> but at the moment it seems reasonable to address the problem bisected in the 
> bug, even if it turns out to be different.

The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
locking between release and IRQ and so is prone to such races as befell
Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
I can quite confidently state they are separate bugs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 regression ignored?
  2011-05-21 15:40         ` Chris Wilson
@ 2011-05-21 19:33           ` Luke-Jr
  -1 siblings, 0 replies; 84+ messages in thread
From: Luke-Jr @ 2011-05-21 19:33 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Ray Lee, Rafael J. Wysocki, intel-gfx, dri-devel, LKML, perex,
	alsa-devel

On Saturday, May 21, 2011 11:40:17 AM Chris Wilson wrote:
> The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
> locking between release and IRQ and so is prone to such races as befell
> Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
> I can quite confidently state they are separate bugs.

Unfortunately, I cannot help troubleshoot that bug any further, as I no longer 
have the affected motherboard. I was unable to reproduce it on my Intel 
DQ67SW.

However, I did encounter a new regression, which I have reported as:
	https://bugzilla.kernel.org/show_bug.cgi?id=35552
This one is related to Intel HD Audio, not Graphics.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 regression ignored?
  2011-05-21 15:40         ` Chris Wilson
  (?)
  (?)
@ 2011-05-28 13:19         ` Kirill Smelkov
  2011-07-12 17:17           ` [Intel-gfx] " Kirill Smelkov
  -1 siblings, 1 reply; 84+ messages in thread
From: Kirill Smelkov @ 2011-05-28 13:19 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Luke-Jr, Ray Lee, Rafael J. Wysocki, intel-gfx, LKML, dri-devel

Hello Chris, everyone,

On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
> On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
> > On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> > > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> > > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> > > > Wysocki to the message ]
> > > > 
> > > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> > > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> > > > > ago against 2.6.38. Now 2.6.39 was just released without the
> > > > > regression being addressed. This bug makes the system unusable... Some
> > > > > guys on IRC suggested I
> > > > > email, so here it is.
> > > > 
> > > > See the bugzilla entry for the bisection history.
> > > 
> > > Which has nothing to do with Luke's bug. Considering the thousand things
> > > that can go wrong during X starting, without a hint as to which it is nigh
> > > on impossible to debug except by trial and error. If you set up
> > > netconsole, does the kernel emit an OOPS with it's last dying breath?
> > 
> > Why assume it's a different bug? I would almost wonder if it might affect 
> > all Sandy Bridge GPUs. In any case, I no longer have the original 
> > motherboard (it was recalled, as I said in the first post), nor even the 
> > revision of it (it had other issues that weren't being fixed). I *assume* I 
> > will have the same problem with my new motherboard (Intel DQ67SW), but I 
> > haven't verified that yet. I'll be sure to try a netconsole when I have to 
> > reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, 
> > but at the moment it seems reasonable to address the problem bisected in the 
> > bug, even if it turns out to be different.
> 
> The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
> locking between release and IRQ and so is prone to such races as befell
> Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
> I can quite confidently state they are separate bugs.
> -Chris

I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok
on kernels < 2.6.38, and starting from 2.6.38 the system is just
unusable because X either crashes the kernel (2.6.38), or does not start
at all (2.6.39):

https://bugzilla.kernel.org/show_bug.cgi?id=36052


It's a regression. It's blocking me from upgrading to newer kernels. I've
done my homework -- dug into it, captured a detailed OOPS over netconsole,
and bisected it to a single commit. Could this please be fixed?


Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 regression ignored?
  2011-05-28 13:19         ` Major 2.6.38 / 2.6.39 " Kirill Smelkov
@ 2011-07-12 17:17           ` Kirill Smelkov
  2011-07-12 18:07               ` Pekka Enberg
  0 siblings, 1 reply; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-12 17:17 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Luke-Jr, intel-gfx, LKML, dri-devel, Rafael J. Wysocki, Ray Lee

On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
> Hello Chris, everyone,
> 
> On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
> > On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
> > > On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> > > > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> > > > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> > > > > Wysocki to the message ]
> > > > > 
> > > > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> > > > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> > > > > > ago against 2.6.38. Now 2.6.39 was just released without the
> > > > > > regression being addressed. This bug makes the system unusable... Some
> > > > > > guys on IRC suggested I
> > > > > > email, so here it is.
> > > > > 
> > > > > See the bugzilla entry for the bisection history.
> > > > 
> > > > Which has nothing to do with Luke's bug. Considering the thousand things
> > > > that can go wrong during X starting, without a hint as to which it is nigh
> > > > on impossible to debug except by trial and error. If you set up
> > > > netconsole, does the kernel emit an OOPS with it's last dying breath?
> > > 
> > > Why assume it's a different bug? I would almost wonder if it might affect 
> > > all Sandy Bridge GPUs. In any case, I no longer have the original 
> > > motherboard (it was recalled, as I said in the first post), nor even the 
> > > revision of it (it had other issues that weren't being fixed). I *assume* I 
> > > will have the same problem with my new motherboard (Intel DQ67SW), but I 
> > > haven't verified that yet. I'll be sure to try a netconsole when I have to 
> > > reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, 
> > > but at the moment it seems reasonable to address the problem bisected in the 
> > > bug, even if it turns out to be different.
> > 
> > The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
> > locking between release and IRQ and so is prone to such races as befell
> > Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
> > I can quite confidently state they are separate bugs.
> > -Chris
> 
> I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok
> on kernels < 2.6.38, and starting from 2.6.38 the system is just
> unusable because X either crashes the kernel (2.6.38), or does not start
> at all (2.6.39):
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=36052
> 
> 
> It's a regression. It's blocking me to upgrade to newer kernels. I've
> done my homework -- digged it and came with detailed OOPS on netconsole
> and bisected to single commit. Could this please be fixed?

Silence...

Still, reverting the bisected patch helps even for 3.0:

https://bugzilla.kernel.org/show_bug.cgi?id=36052#c4

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 regression ignored?
  2011-07-12 17:17           ` [Intel-gfx] " Kirill Smelkov
@ 2011-07-12 18:07               ` Pekka Enberg
  0 siblings, 0 replies; 84+ messages in thread
From: Pekka Enberg @ 2011-07-12 18:07 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton

On Tue, Jul 12, 2011 at 8:17 PM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
>> Hello Chris, everyone,
>>
>> On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
>> > On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
>> > > On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
>> > > > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
>> > > > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
>> > > > > Wysocki to the message ]
>> > > > >
>> > > > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
>> > > > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
>> > > > > > ago against 2.6.38. Now 2.6.39 was just released without the
>> > > > > > regression being addressed. This bug makes the system unusable... Some
>> > > > > > guys on IRC suggested I
>> > > > > > email, so here it is.
>> > > > >
>> > > > > See the bugzilla entry for the bisection history.
>> > > >
>> > > > Which has nothing to do with Luke's bug. Considering the thousand things
>> > > > that can go wrong during X starting, without a hint as to which it is nigh
>> > > > on impossible to debug except by trial and error. If you set up
>> > > > netconsole, does the kernel emit an OOPS with it's last dying breath?
>> > >
>> > > Why assume it's a different bug? I would almost wonder if it might affect
>> > > all Sandy Bridge GPUs. In any case, I no longer have the original
>> > > motherboard (it was recalled, as I said in the first post), nor even the
>> > > revision of it (it had other issues that weren't being fixed). I *assume* I
>> > > will have the same problem with my new motherboard (Intel DQ67SW), but I
>> > > haven't verified that yet. I'll be sure to try a netconsole when I have to
>> > > reboot next and get a chance to try the most recent 2.6.38 and .39 kernels,
>> > > but at the moment it seems reasonable to address the problem bisected in the
>> > > bug, even if it turns out to be different.
>> >
>> > The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
>> > locking between release and IRQ and so is prone to such races as befell
>> > Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
>> > I can quite confidently state they are separate bugs.
>> > -Chris
>>
>> I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok
>> on kernels < 2.6.38, and starting from 2.6.38 the system is just
>> unusable because X either crashes the kernel (2.6.38), or does not start
>> at all (2.6.39):
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=36052
>>
>>
>> It's a regression. It's blocking me to upgrade to newer kernels. I've
>> done my homework -- digged it and came with detailed OOPS on netconsole
>> and bisected to single commit. Could this please be fixed?
>
> Silence...
>
> Still, reverting the bisected patch helps even for 3.0:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=36052#c4

Keith, Chris, what's up with this regression from 2.6.38? It seems
commit e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths")
caused problems on other machines.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Linux 3.0 release
@ 2011-07-22  2:59               ` Linus Torvalds
  2011-07-22 11:08                   ` Kirill Smelkov
                                   ` (4 more replies)
  0 siblings, 5 replies; 84+ messages in thread
From: Linus Torvalds @ 2011-07-22  2:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.

This obviously also opens the merge window for the next kernel, which
will be 3.1. The stable team will take the third digit, so 3.0.1 will
be the first stable release based on 3.0.

As already mentioned several times, there are no special landmark
features or incompatibilities related to the version number change,
it's simply a way to drop an inconvenient numbering system in honor of
twenty years of Linux. In fact, the 3.0 merge window was calmer than
most, and apart from some excitement from RCU I'd have called it
really smooth. Which is not to say that there may not be bugs, but if
anything, there are hopefully fewer than usual, rather than the normal
".0" problems.

And as I already mentioned yesterday, I'm hoping the 3.1 merge window
will be calm too, because due to the delays the latter half of the
merge window will fall into my vacation time. I briefly considered
simply waiting two extra weeks, but quite frankly, that wouldn't
really have solved anything (it would have made the merge window
instead fall into LinuxCon and my divemaster weekends).

So I'm going to try to keep to the normal two-week merge window, but
if it ends up being too busy for me to keep up, I may end up extending
the window just so that I can merge everything. However, even if that
happens, that will *not* mean that I will accept big pull requests for
longer, it just means that I may end up delaying things to catch up
with timely merge requests.

That said, judging by past experience, the summer merge windows often
tend to be quieter, so maybe I worry needlessly. Much of Europe is
starting to go on vacation, and parts of the US are being fried to a
crisp, so maybe 3.1 will be calm too.

Anyway, what has changed since -rc7 is mainly some RCU interactions
with the scheduler, and the RCU problems should hopefully be behind
us. The pathname lookup race is also fixed. There's a few DRI fixes
(i915 modesetting, and some Radeon fixes), and Al walked through some
more esoteric VFS d_lock issues. Other than that it's really pretty
small and random.

The shortlog from -rc7 is appended, the bigger "everything since
2.6.39" list is obviously unmanageable.

                                Linus

---

Akinobu Mita (1):
      fs/libfs.c: fix simple_attr_write() on 32bit machines

Al Viro (10):
      Fix ->d_lock locking order in unlazy_walk()
      fix loop checks in d_materialise_unique()
      cifs: build_path_from_dentry() race fix
      ceph analog of cifs build_path_from_dentry() race fix
      fix exofs ->get_parent()
      ufs should use d_splice_alias()
      cramfs: get_cramfs_inode() returns ERR_PTR() on failure
      hppfs: fix dentry leak
      hppfs_lookup(): don't open-code lookup_one_len()
      Fix cifs_get_root()

Alex Deucher (6):
      drm/radeon/kms: fix regression in hotplug
      drm/radeon/kms: fix backend map typo on juniper
      drm/radeon/kms: use correct BUS_CNTL reg on rs600
      drm/radeon/kms: fix typo in read_disabled vbios code
      drm/radeon/kms/evergreen: emit SQ_LDS_RESOURCE_MGMT for blits
      drm/radeon/kms: add new NI pci ids

Andy Adamson (1):
      NFSv4.1: update nfs4_fattr_bitmap_maxsz

Axel Lin (1):
      gpio: wm831x: add a missing break in wm831x_gpio_dbg_show

Ben Greear (1):
      SUNRPC: Fix use of static variable in rpcb_getport_async

Benjamin Herrenschmidt (2):
      mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
      powerpc/mm: Fix memory_block_size_bytes() for non-pseries

Benjamin Marzinski (1):
      GFS2: force a log flush when invalidating the rindex glock

Boaz Harrosh (1):
      pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs

Chris Wilson (3):
      drm/i915/ringbuffer: Idling requires waiting for the ring to be empty
      agp/intel: Fix typo in G4x_GMCH_SIZE_VT_2M
      drm/i915: Fix unfenced alignment on pre-G33 hardware

Christian Lamparter (1):
      carl9170: add NEC WL300NU-AG usbid

Dan Rosenberg (1):
      Bluetooth: Prevent buffer overflow in l2cap config request

Daniel J Blueman (1):
      x86: Make Dell Latitude E5420 use reboot=pci

Daniel Mack (1):
      ARM: pxa/raumfeld: fix device name for codec ak4104

Darren Hart (1):
      x86, doc only: Correct real-mode kernel header offset for init_size

David S. Miller (2):
      net: Fix default in docs for tcp_orphan_retries.
      pppoe: Must flush connections when MAC address changes too.

Devin Heitmueller (1):
      [media] dvb_frontend: fix race condition in stopping/starting frontend

Greg Kroah-Hartman (1):
      hso: fix a use after free condition

Guenter Roeck (2):
      hwmon: (pmbus) Use long variables for register to data conversions
      hwmon: (adm1275) Fix coefficients per datasheet revision B

Gustavo F. Padovan (2):
      Bluetooth: Fix regression with incoming L2CAP connections
      Bluetooth: Fix regression in L2CAP connection procedure

H. Peter Anvin (1):
      x86: Make Dell Latitude E6420 use reboot=pci

Huang Ying (1):
      ACPI, APEI, HEST, Detect duplicated hardware error source ID

Ido Yariv (1):
      arm: davinci: Fix low level gpio irq handlers' argument

Ilia Kolomisnky (1):
      Bluetooth: Fix crash with incoming L2CAP connections

Jan Beulich (1):
      FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop

Jarod Wilson (2):
      [media] Revert "V4L/DVB: cx23885: Enable Message Signaled Interrupts(MSI)"
      [media] nuvoton-cir: make idle timeout more sane

Jason Wessel (1):
      sparc,kgdbts: fix compile regression with kgdb test suite

Jean Delvare (2):
      net/natsemi: Fix module parameter permissions
      hwmon: (it87) Fix label group removal

Jesse Barnes (7):
      drm/i915/dp: retry link status read 3 times on failure
      drm/i915/dp: use DP DPCD defines when looking at DPCD values
      drm/i915/dp: read more receiver capability bits on hotplug
      drm/i915/dp: try to read receiver capabilities 3 times when detecting
      drm/i915/dp: remove DPMS mode tracking from DP
      drm/i915/dp: consolidate AUX retry code
      drm/i915/dp: manage sink power state if possible

Jim Cromie (1):
      natsemi: fix another dma-debug report

Joe Perches (1):
      tulip: dmfe: Remove old log spamming pr_debugs

Johannes Berg (1):
      mac80211: fix TKIP replay vulnerability

Jon Povey (1):
      davinci: DM365 EVM: fix video input mux bits

Jonathan Cameron (1):
      pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting

Keith Packard (3):
      drm/i915: Clean up i915_driver_load failure path
      Revert "drm/i915: enable rc6 by default"
      drm/i915: Add quirk to disable SSC on Lenovo U160 LVDS

Kenneth Graunke (1):
      drm/i915: Enable GPU reset on Ivybridge.

Kuninori Morimoto (3):
      ASoC: sh: fsi-ak4642: fixup snd_soc_card name
      ASoC: sh: fsi-da7210: fixup snd_soc_card name
      ASoC: sh: fsi-hdmi: fixup snd_soc_card name

Lei Wen (2):
      ARM: pxa910: correct nand pmu setting
      ARM: pxa168: correct nand pmu setting

Lin Ming (1):
      ACPI: Fixes device power states array overflow

Linus Torvalds (3):
      vfs: fix race in rcu lookup of pruned dentry
      vfs: drop conditional inode prefetch in __do_lookup_rcu
      Linux 3.0

Linus Walleij (1):
      ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace

Luca Tettamanti (1):
      hwmon: (asus_atk0110) Fix memory leak

Luciano Coelho (2):
      cfg80211: fix deadlock with rfkill/sched_scan by adding new mutex
      mac80211: fix ie memory allocation for scheduled scans

Mark Brown (2):
      ASoC: Fix shift in WM8958 accessory detection default implementation
      ASoC: Correct WM8994 MICBIAS supply widget hookup

Matthias Rosenfelder (1):
      sparc32,leon: Added __init declaration to leon_flush_needed()

Matvejchikov Ilya (1):
      slip: fix wrong SLIP6 ifdef-endif placing

Mauro Carvalho Chehab (2):
      [media] tuner-core: fix a 2.6.39 regression with mt20xx
      si4713-i2c: avoid potential buffer overflow on si4713

Maxime Ripard (1):
      x86. reboot: Make Dell Latitude E6320 use reboot=pci

Michael Thalmeier (1):
      r6040: only disable RX interrupt if napi_schedule_prep is successful

Michal Marek (1):
      kbuild: Do not write to builddir in modules_install

Michał Mirosław (2):
      net: sctp: fix checksum marking for outgoing packets
      net: remove NETIF_F_ALL_TX_OFFLOADS

Olaf Hering (1):
      watchdog: fix hpwdt Kconfig regression in 3.0-rc

Paul E. McKenney (6):
      rcu: Prevent RCU callbacks from executing before scheduler initialized
      rcu: decrease rcu_report_exp_rnp coupling with scheduler
      rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special
      rcu: Streamline code produced by __rcu_read_unlock()
      rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
      signal: align __lock_task_sighand() irq disabling and RCU

Paul Parsons (1):
      ARM: pxa: fix PGSR register address calculation

Pavel Herrmann (1):
      hwmon: (max1111) Fix race condition causing NULL pointer exception

Pavel Roskin (2):
      ath5k: fix incorrect use of drvdata in sysfs code
      ath5k: fix incorrect use of drvdata in PCI suspend/resume code

Pavel Shilovsky (1):
      CIFS: Fix wrong length in cifs_iovec_read

Peter Hurley (1):
      Bluetooth: Fix hidp disconnect deadlocks and lost wakeup

Peter Zijlstra (6):
      sched: Fix 32bit race
      sched: Break out cpu_power from the sched_group structure
      sched: Allow for overlapping sched_domain spans
      sched: Avoid creating superfluous NUMA domains on non-NUMA systems
      sched: Add irq_{enter,exit}() to scheduler_ipi()
      softirq,rcu: Inform RCU of irq_exit() activity

Philip Rakity (1):
      mmc: core: Bus width testing needs to handle suspend/resume

Rafael J. Wysocki (2):
      ACPI: Fix lockdep false positives in acpi_power_off()
      PM / MIPS: Convert i8259.c to using syscore_ops

Rafał Miłecki (1):
      ssb: fix init regression of hostmode PCI core

Rafi Rubin (2):
      [media] mceusb: Timeout unit corrections
      [media] mceusb: increase default timeout to 100ms

Rajkumar Manoharan (1):
      ath9k: Fix tx throughput drops for AR9003 chips with AES encryption

Ralf Baechle (1):
      [media] MEDIA: Fix non-ISA_DMA_API link failure of sound code

Randy Dunlap (2):
      [media] media: fix radio-sf16fmr2 build when SND is not enabled
      watchdog: hpwdt depends on PCI

Richard Cochran (1):
      ARM: fix regression in IXP4xx clocksource

Ryusuke Konishi (1):
      nilfs2: remove resize from unsupported features list

Sage Weil (1):
      ceph: fix file mode calculation

Sangwook Lee (1):
      ARM: SAMSUNG: DMA Cleanup as per sparse

Sebastian Pöhn (1):
      gianfar: rx parser

Shaohua Li (1):
      vmscan: fix a livelock in kswapd

Shirish Pargaonkar (1):
      cifs: Fix signing failure when server mandates signing for NTLMSSP

Simon Guinot (1):
      genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd)

Steve French (2):
      [CIFS] update limit for snprintf in cifs_construct_tcon
      [CIFS] update cifs to version 1.74

Steven Rostedt (1):
      sparc/irqs: Do not trace arch_local_{*,irq_*} functions

Steven Whitehouse (2):
      GFS2: Fix race during filesystem mount
      GFS2: Resolve inode eviction and ail list interaction bug

Sven Neumann (2):
      ARM: pxa/raumfeld: adapt to upcoming hardware change
      ARM: pxa/raumfeld: display initialisation fixes

Tejun Heo (1):
      x86: Disable AMD_NUMA for 32bit for now

Thomas Graf (2):
      sctp: Enforce retransmission limit during shutdown
      sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket

Todd Poynor (2):
      ARM: SAMSUNG: Check NULL return from irq_alloc_generic_chip
      ARM: davinci: Check for NULL return from irq_alloc_generic_chip

Tomas Targownik (1):
      Bluetooth: Fix memory leak under page timeouts

Trond Myklebust (1):
      SUNRPC: Fix a race between work-queue and rpc_killall_tasks

Tushar Gohad (1):
      XFRM: Fix memory leak in xfrm_state_update

WANG Cong (1):
      include/linux/sdla.h: remove the prototype of sdla()

Will Simoneau (1):
      sparc: sun4m SMP: fix wrong shift instruction in IPI handler

Wolfram Sang (1):
      arm: mach-vt8500: add forgotten irq_data conversion

Yoann DI-RUZZA (1):
      rtlwifi: rtl8192cu: Add new USB ID for Netgear WNA1000M


* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22  2:59               ` Linux 3.0 release Linus Torvalds
@ 2011-07-22 11:08                   ` Kirill Smelkov
  2011-07-22 12:52                 ` Linux 3.0 release Martin Knoblauch
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-22 11:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler, Keith Packard

 [ Cc'ing Florian Mickler and Keith Packard ]

On Tue, Jul 12, 2011 at 09:07:47PM +0300, Pekka Enberg wrote:
> On Tue, Jul 12, 2011 at 8:17 PM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
> >> Hello Chris, everyone,
> >>
> >> On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
> >> > On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
> >> > > On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> >> > > > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> >> > > > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> >> > > > > Wysocki to the message ]
> >> > > > >
> >> > > > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> >> > > > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> >> > > > > > ago against 2.6.38. Now 2.6.39 was just released without the
> >> > > > > > regression being addressed. This bug makes the system unusable... Some
> >> > > > > > guys on IRC suggested I
> >> > > > > > email, so here it is.
> >> > > > >
> >> > > > > See the bugzilla entry for the bisection history.
> >> > > >
> >> > > > Which has nothing to do with Luke's bug. Considering the thousand things
> >> > > > that can go wrong during X starting, without a hint as to which it is nigh
> >> > > > on impossible to debug except by trial and error. If you set up
> >> > > > netconsole, does the kernel emit an OOPS with it's last dying breath?
> >> > >
> >> > > Why assume it's a different bug? I would almost wonder if it might affect
> >> > > all Sandy Bridge GPUs. In any case, I no longer have the original
> >> > > motherboard (it was recalled, as I said in the first post), nor even the
> >> > > revision of it (it had other issues that weren't being fixed). I *assume* I
> >> > > will have the same problem with my new motherboard (Intel DQ67SW), but I
> >> > > haven't verified that yet. I'll be sure to try a netconsole when I have to
> >> > > reboot next and get a chance to try the most recent 2.6.38 and .39 kernels,
> >> > > but at the moment it seems reasonable to address the problem bisected in the
> >> > > bug, even if it turns out to be different.
> >> >
> >> > The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
> >> > locking between release and IRQ and so is prone to such races as befell
> >> > Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
> >> > I can quite confidently state they are separate bugs.
> >> > -Chris
> >>
> >> I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok
> >> on kernels < 2.6.38, and starting from 2.6.38 the system is just
> >> unusable because X either crashes the kernel (2.6.38), or does not start
> >> at all (2.6.39):
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=36052
> >>
> >>
> >> It's a regression. It's blocking me to upgrade to newer kernels. I've
> >> done my homework -- digged it and came with detailed OOPS on netconsole
> >> and bisected to single commit. Could this please be fixed?
> >
> > Silence...
> >
> > Still, reverting the bisected patch helps even for 3.0:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=36052#c4
> 
> Keith, Chris, what's up with this regression from 2.6.38? It seems
> commit e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths")
> caused problems on other machines.

Silence again, which is not surprising -- I have been ringing this bell
for 3 months already:

https://bugzilla.kernel.org/show_bug.cgi?id=33662#c10
https://bugzilla.kernel.org/show_bug.cgi?id=36052
(and on the list)

with detailed logs and a single bisected patch, without even a single
reply from the intel-gfx people.


And now that v3.0 is out, I've tested it again, and yes, just as it was
broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after the
first bad I/O access the system freezes completely:

    On netconsole:

    # X starts here, then

    [   45.102377] ------------[ cut here ]------------
    [   45.102402] WARNING: at lib/iomap.c:43 bad_io_access+0x3d/0x40()
    [   45.102411] Hardware name: PCISA-945GSE
    [   45.102418] Bad IO access at port 0x84 (return inl(port))
    [   45.102425] Modules linked in: 
    [   45.102438] Pid: 2846, comm: sshd Not tainted 3.0.0--NAVY #33
    [   45.102445] Call Trace:
    [   45.102460]  [<c118e9fd>] ? bad_io_access+0x3d/0x40
    [   45.102473]  [<c10287ec>] warn_slowpath_common+0x6c/0xa0
    [   45.102484]  [<c118e9fd>] ? bad_io_access+0x3d/0x40
    [   45.102495]  [<c102889e>] warn_slowpath_fmt+0x2e/0x30
    [   45.102506]  [<c118e9fd>] bad_io_access+0x3d/0x40
    [   45.102516]  [<c118edb2>] ioread32+0x22/0x40
    [   45.102528]  [<c122cc7d>] i915_driver_irq_handler+0x1ad/0x660
    [   45.102541]  [<c12c6a7e>] ? rtl8169_interrupt+0xee/0x370
    [   45.102554]  [<c105c396>] handle_irq_event_percpu+0x36/0x140
    [   45.102565]  [<c105e490>] ? handle_edge_irq+0x150/0x150
    [   45.102576]  [<c105c4d9>] handle_irq_event+0x39/0x60
    [   45.102587]  [<c105e4d5>] handle_fasteoi_irq+0x45/0xd0
    [   45.102594]  <IRQ>   [<c1003c29>] ? do_IRQ+0x39/0xb0
    [   45.102613]  [<c103c9b3>] ? start_flush_work+0xc3/0x130
    [   45.102625]  [<c13bc329>] ? common_interrupt+0x29/0x30
    [   45.102636]  [<c13bc329>] ? common_interrupt+0x29/0x30
    [   45.102648]  [<c11e007b>] ? pnpacpi_encode_resources+0x37b/0x7a0
    [   45.102659]  [<c109971e>] ? fget_light+0xe/0xf0
    [   45.102671]  [<c10a8f97>] ? do_select+0x2e7/0x680
    [   45.102685]  [<c1341998>] ? sch_direct_xmit+0x58/0x1d0
    [   45.102695]  [<c10a83e0>] ? poll_freewait+0xa0/0xa0
    [   45.102706]  [<c102df37>] ? local_bh_enable+0x47/0xa0
    [   45.102718]  [<c132e371>] ? dev_queue_xmit+0x101/0x4e0
    [   45.102729]  [<c134ffba>] ? ip_finish_output+0x10a/0x2f0
    [   45.102740]  [<c1350216>] ? ip_output+0x76/0x90
    [   45.102750]  [<c134d715>] ? ip_local_out+0x65/0x70
    [   45.102762]  [<c134fa3d>] ? ip_queue_xmit+0x1bd/0x3b0
    [   45.102775]  [<c1362af8>] ? tcp_transmit_skb+0x468/0x7d0
    [   45.102788]  [<c13215af>] ? sk_reset_timer+0xf/0x20
    [   45.102798]  [<c1362446>] ? tcp_event_new_data_sent+0x86/0xc0
    [   45.102809]  [<c1364fc1>] ? tcp_write_xmit+0x1e1/0x9a0
    [   45.102822]  [<c1326925>] ? __alloc_skb+0x55/0x100
    [   45.102838]  [<c102df37>] ? local_bh_enable+0x47/0xa0
    [   45.102849]  [<c1321246>] ? release_sock+0xd6/0x110
    [   45.102859]  [<c13657f7>] ? __tcp_push_pending_frames+0x27/0x80
    [   45.102870]  [<c13584fa>] ? tcp_sendmsg+0x64a/0xac0

    -*- and then system FREEZE -*-
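
For readers decoding the warning above: the message comes from the
generic iomap helpers. As far as I can tell from lib/iomap.c, ioread32()
treats its argument as a cookie and sorts it into one of three ranges:
real MMIO pointers, port numbers shifted up by a fixed offset, and
everything below that offset, which is reported through bad_io_access().
A cookie of 0x84 lands in the last range, which would be consistent with
a small offset added to a NULL (never mapped) base, although the trace
alone does not prove which mapping was missing. The sketch below is only
a userspace model of that classification; the function and enum names
are made up for illustration, and the constants are quoted from memory,
so check them against your tree.

```c
/* Userspace model of the range check behind IO_COND() in lib/iomap.c.
 * Assumption: the constants below match the kernel's generic iomap
 * code of that era; verify against the actual source before relying
 * on them. */
#define PIO_OFFSET   0x10000UL   /* port cookies start above this value */
#define PIO_RESERVED 0x40000UL   /* cookies at or above this are MMIO   */

/* Hypothetical result type, not a kernel type. */
enum io_kind { IO_BAD, IO_PIO, IO_MMIO };

/* Hypothetical helper: classify an iomap cookie the way the generic
 * ioread/iowrite helpers are assumed to. */
enum io_kind classify_iomap_cookie(unsigned long addr)
{
	if (addr >= PIO_RESERVED)
		return IO_MMIO;   /* dereferenced as a memory-mapped register */
	if (addr > PIO_OFFSET)
		return IO_PIO;    /* low 16 bits are used as an I/O port      */
	return IO_BAD;            /* would trigger the bad_io_access() WARN   */
}
```

Under these assumptions classify_iomap_cookie(0x84) returns IO_BAD,
which matches the "Bad IO access at port 0x84" line in the trace, while
a properly ioremap()ed register pointer would come out as IO_MMIO.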


For completeness, the `X -verbose` log is in "Appendix 1" (but who cares
anyway? I've sent lots of such logs without a reply).


And again, after reverting e8616b6 ("drm/i915: Initialise ring vfuncs
for old DRI paths") on top of v3.0, X works without any problem again.


So I wonder:

    I thought people were trying to follow a "no regressions" rule in
    the kernel. Should we then just apply the following patch? If the
    Intel people are not responding, should it just go directly into
    mainline?

    Or would it be more fair to say that UMS is not supported anymore
    and is broken, and just remove support for it?


Thanks,
Kirill


P.S. Sometimes people change their hardware preferences based on
software support quality. Knock, knock...


From ef91a178e6069ae07c7a3c1e39e13eea609953cd Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@mns.spb.ru>
Date: Wed, 29 Jun 2011 14:22:49 +0400
Subject: [PATCH] Revert "drm/i915: Initialise ring vfuncs for old DRI paths"

This reverts commit e8616b6ced6137085e6657cc63bc2fe3900b8616.

See https://bugzilla.kernel.org/show_bug.cgi?id=36052

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Florian Mickler <florian@mickler.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org

---
 drivers/gpu/drm/i915/i915_dma.c         |   25 +++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   42 -------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 --
 3 files changed, 18 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 296fbd6..9300d18 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -160,7 +160,7 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-	int ret;
+	struct intel_ring_buffer *ring = LP_RING(dev_priv);
 
 	master_priv->sarea = drm_getsarea(dev);
 	if (master_priv->sarea) {
@@ -171,22 +171,33 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init)
 	}
 
 	if (init->ring_size != 0) {
-		if (LP_RING(dev_priv)->obj != NULL) {
+		if (ring->obj != NULL) {
 			i915_dma_cleanup(dev);
 			DRM_ERROR("Client tried to initialize ringbuffer in "
 				  "GEM mode\n");
 			return -EINVAL;
 		}
 
-		ret = intel_render_ring_init_dri(dev,
-						 init->ring_start,
-						 init->ring_size);
-		if (ret) {
+		ring->size = init->ring_size;
+
+		ring->map.offset = init->ring_start;
+		ring->map.size = init->ring_size;
+		ring->map.type = 0;
+		ring->map.flags = 0;
+		ring->map.mtrr = 0;
+
+		drm_core_ioremap_wc(&ring->map, dev);
+
+		if (ring->map.handle == NULL) {
 			i915_dma_cleanup(dev);
-			return ret;
+			DRM_ERROR("can not ioremap virtual address for"
+				  " ring buffer\n");
+			return -ENOMEM;
 		}
 	}
 
+	ring->virtual_start = ring->map.handle;
+
 	dev_priv->cpp = init->cpp;
 	dev_priv->back_offset = init->back_offset;
 	dev_priv->front_offset = init->front_offset;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 95c4b14..8d2f610 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1304,48 +1304,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	return intel_init_ring_buffer(dev, ring);
 }
 
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
-{
-	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
-
-	*ring = render_ring;
-	if (INTEL_INFO(dev)->gen >= 6) {
-		ring->add_request = gen6_add_request;
-		ring->irq_get = gen6_render_ring_get_irq;
-		ring->irq_put = gen6_render_ring_put_irq;
-	} else if (IS_GEN5(dev)) {
-		ring->add_request = pc_render_add_request;
-		ring->get_seqno = pc_render_get_seqno;
-	}
-
-	ring->dev = dev;
-	INIT_LIST_HEAD(&ring->active_list);
-	INIT_LIST_HEAD(&ring->request_list);
-	INIT_LIST_HEAD(&ring->gpu_write_list);
-
-	ring->size = size;
-	ring->effective_size = ring->size;
-	if (IS_I830(ring->dev))
-		ring->effective_size -= 128;
-
-	ring->map.offset = start;
-	ring->map.size = size;
-	ring->map.type = 0;
-	ring->map.flags = 0;
-	ring->map.mtrr = 0;
-
-	drm_core_ioremap_wc(&ring->map, dev);
-	if (ring->map.handle == NULL) {
-		DRM_ERROR("can not ioremap virtual address for"
-			  " ring buffer\n");
-		return -ENOMEM;
-	}
-
-	ring->virtual_start = (void __force __iomem *)ring->map.handle;
-	return 0;
-}
-
 int intel_init_bsd_ring_buffer(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39ac2b6..b6b0fd4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -197,7 +197,4 @@ static inline void i915_trace_irq_get(struct intel_ring_buffer *ring, u32 seqno)
 		ring->trace_irq_seqno = seqno;
 }
 
-/* DRI warts */
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size);
-
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
1.7.6.233.gd79bc






Appendix 1. `X -verbose` log
----------------------------

# same, starting X over ssh
navy3:~# X -verbose
_XSERVTransSocketOpenCOTSServer: Unable to open socket for inet6
_XSERVTransOpen: transport open failed for inet6/navy3:0
_XSERVTransMakeAllCOTSServerListeners: failed to open listener for inet6

X.Org X Server 1.4.2
Release Date: 11 June 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux Debian (xorg-server 2:1.4.2-10.lenny3)
Current Operating System: Linux navy3 3.0.0--NAVY #33 PREEMPT Fri Jul 22 13:56:40 MSD 2011 i686
Build Date: 25 September 2010  12:05:44PM
 
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat Jan 19 21:24:12 2002
(==) Using config file: "/etc/X11/xorg.conf"
(==) ServerLayout "Default Layout"
(**) |-->Screen "Default Screen" (0)
(**) |   |-->Monitor "LCD 1600x1200"
(==) No device specified for screen "Default Screen".
        Using the first device section listed.
(**) |   |-->Device "Default card"
(**) |-->Input Device "Generic Keyboard"
(**) |-->Input Device "Configured Mouse"
(==) Automatically adding devices
(==) Automatically enabling devices
(==) No FontPath specified.  Using compiled-in default.
(WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
        Entry deleted from font path.
(WW) The directory "/var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType" does not exist.
        Entry deleted from font path.
(==) FontPath set to:
        /usr/share/fonts/X11/misc,
        /usr/share/fonts/X11/cyrillic
(==) RgbPath set to "/etc/X11/rgb"
(==) ModulePath set to "/usr/lib/xorg/modules"
(II) Loading /usr/lib/xorg/modules//libpcidata.so
(II) Module pcidata: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(--) using VT number 7

(--) PCI:*(0:2:0) Intel Corporation Mobile 945GME Express Integrated Graphics Controller rev 3, Mem @ 0xfe980000/19, 0xd0000000/28, 0xfe940000/18, I/O @ 0xbc80/3
(--) PCI: (0:2:1) Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller rev 3, Mem @ 0xfe880000/19
(II) "extmod" will be loaded. This was enabled by default and also specified in the config file.
(II) "dbe" will be loaded. This was enabled by default and also specified in the config file.
(II) "glx" will be loaded by default.
(II) "freetype" will be loaded. This was enabled by default and also specified in the config file.
(II) "record" will be loaded. This was enabled by default and also specified in the config file.
(II) "dri" will be loaded by default.
(II) Loading /usr/lib/xorg/modules/extensions//libdbe.so
(II) Module dbe: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Module "ddc" already built-in
(II) Loading /usr/lib/xorg/modules/extensions//libextmod.so
(II) Module extmod: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules//fonts/libfreetype.so
(II) Module freetype: vendor="X.Org Foundation & the After X-TT Project"
        compiled for 1.4.2, module version = 2.1.0
(II) Loading /usr/lib/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules/extensions//librecord.so
(II) Module record: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.13.0
(II) Loading /usr/lib/xorg/modules//libvbe.so
(II) Module vbe: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.1.0
(II) Loading /usr/lib/xorg/modules/drivers//v4l_drv.so
(II) Module v4l: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 0.1.1
(II) Loading /usr/lib/xorg/modules/extensions//libglx.so
(II) Module glx: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(==) AIGLX enabled
(II) Loading /usr/lib/xorg/modules/extensions//libdri.so
(II) Module dri: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Matched intel from file name intel.ids in autoconfig
(==) Matched intel for the autoconfigured driver
(==) Assigned the driver to the xf86ConfigLayout
(II) Loading /usr/lib/xorg/modules/drivers//intel_drv.so
(II) Module intel: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 2.3.2
(II) Loading /usr/lib/xorg/modules/input//kbd_drv.so
(II) Module kbd: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 1.3.1
(II) Loading /usr/lib/xorg/modules/input//mouse_drv.so
(II) Module mouse: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 1.3.0
(II) v4l driver for Video4Linux
(II) intel: Driver for Intel Integrated Graphics Chipsets: i810,
        i810-dc100, i810e, i815, i830M, 845G, 852GM/855GM, 865G, 915G,
        E7221 (i915), 915GM, 945G, 945GM, 945GME, 965G, G35, 965Q, 946GZ,
        965GM, 965GME/GLE, G33, Q35, Q33,
        Mobile Intel® GM45 Express Chipset,
        Intel Integrated Graphics Device, G45/G43, Q45/Q43, G41
(--) Assigning device section with no busID to primary device
(WW) intel: No matching Device section for instance (BusID PCI:0:2:1) found
(--) Chipset 945GME found
(II) Loading /usr/lib/xorg/modules//libvgahw.so
(II) Module vgahw: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 0.1.0
(**) intel(0): Depth 16, (--) framebuffer bpp 16
(==) intel(0): RGB weight 565
(==) intel(0): Default visual is TrueColor
(II) intel(0): Integrated Graphics Chipset: Intel(R) 945GME
(--) intel(0): Chipset: "945GME"
(--) intel(0): Linear framebuffer at 0xD0000000
(--) intel(0): IO registers at addr 0xFE980000
(II) intel(0): 2 display pipes available.
(**) intel(0): Using XAA for acceleration
(II) Module "ddc" already built-in
(II) Module "i2c" already built-in
(II) intel(0): Output VGA using monitor section LCD 1600x1200
(II) intel(0): I2C bus "CRTDDC_A" initialized.
(II) intel(0): Output LVDS has no monitor section
(II) intel(0): I2C bus "LVDSDDC_C" initialized.
(II) intel(0): Attempting to determine panel fixed mode.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" registered at address 0xA0.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" removed.
(II) intel(0): initializing int10
(WW) intel(0): Bad V_BIOS checksum
(II) intel(0): Primary V_BIOS segment is: 0xc000
(II) intel(0): VESA BIOS detected
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOB" initialized.
(II) intel(0): I2C device "SDVOCTRL_E for SDVOB:SDVO Controller B" registered at address 0x70.
(II) intel(0): No SDVO device found on SDVOB
(II) intel(0): I2C device "SDVOCTRL_E for SDVOB:SDVO Controller B" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOB" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOC" initialized.
(II) intel(0): I2C device "SDVOCTRL_E for SDVOC:SDVO Controller C" registered at address 0x72.
(II) intel(0): No SDVO device found on SDVOC
(II) intel(0): I2C device "SDVOCTRL_E for SDVOC:SDVO Controller C" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOC" removed.
(II) intel(0): Output TV has no monitor section
(II) intel(0): I2C device "CRTDDC_A:ddc2" registered at address 0xA0.
(II) intel(0): EDID vendor "SAM", prod id 476
(II) intel(0): Using hsync ranges from config file
(II) intel(0): Using vrefresh ranges from config file
(II) intel(0): Printing DDC gathered Modelines:
(II) intel(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz)
(II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz)
(II) intel(0): Modeline "800x600"x0.0   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz)
(II) intel(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync (37.5 kHz)
(II) intel(0): Modeline "640x480"x0.0   31.50  640 664 704 832  480 489 491 520 -hsync -vsync (37.9 kHz)
(II) intel(0): Modeline "640x480"x0.0   30.24  640 704 768 864  480 483 486 525 -hsync -vsync (35.0 kHz)
(II) intel(0): Modeline "640x480"x0.0   25.20  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
(II) intel(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz)
(II) intel(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync (80.0 kHz)
(II) intel(0): Modeline "1024x768"x0.0   78.80  1024 1040 1136 1312  768 769 772 800 +hsync +vsync (60.1 kHz)
(II) intel(0): Modeline "1024x768"x0.0   75.00  1024 1048 1184 1328  768 771 777 806 -hsync -vsync (56.5 kHz)
(II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz)
(II) intel(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync (49.7 kHz)
(II) intel(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync (46.9 kHz)
(II) intel(0): Modeline "800x600"x0.0   50.00  800 856 976 1040  600 637 643 666 +hsync +vsync (48.1 kHz)
(II) intel(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +vsync (67.5 kHz)
(II) intel(0): Modeline "1280x1024"x59.9  109.00  1280 1368 1496 1712  1024 1027 1034 1063 -hsync +vsync (63.7 kHz)
(II) intel(0): Modeline "1280x960"x59.9  101.25  1280 1360 1488 1696  960 963 967 996 -hsync +vsync (59.7 kHz)
(II) intel(0): Modeline "1152x864"x74.8  104.00  1152 1224 1344 1536  864 867 871 905 -hsync +vsync (67.7 kHz)
(II) intel(0): EDID vendor "SAM", prod id 476
(II) intel(0): I2C device "LVDSDDC_C:ddc2" registered at address 0xA0.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" removed.
(II) intel(0): Output VGA connected
(II) intel(0): Output LVDS connected
(II) intel(0): Output TV disconnected
(II) intel(0): Output VGA using initial mode 1280x1024
(II) intel(0): Output LVDS using initial mode 800x600
(II) intel(0): Monitoring connected displays enabled
(II) intel(0): detected 256 kB GTT.
(II) intel(0): detected 7932 kB stolen memory.
(==) intel(0): video overlay key set to 0x83e
(==) intel(0): Will not try to enable page flipping
(==) intel(0): Triple buffering disabled
(==) intel(0): Intel XvMC decoder disabled
(==) intel(0): Using gamma correction (1.0, 1.0, 1.0)
(**) intel(0): Display dimensions: (340, 270) mm
(**) intel(0): DPI set to (119, 150)
(II) Loading /usr/lib/xorg/modules//libfb.so
(II) Module fb: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules//libxaa.so
(II) Module xaa: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.2.0
(II) Module "ramdac" already built-in
(II) intel(0): Comparing regs from server start up to After PreInit
(WW) intel(0): Register 0x61200 (PP_STATUS) changed from 0xc0000008 to 0xd000000a
(WW) intel(0): PP_STATUS before: on, ready, sequencing idle
(WW) intel(0): PP_STATUS after: on, ready, sequencing on
(WW) intel(0): Register 0x61114 (PORT_HOTPLUG_STAT) changed from 0x00000b00 to 0x00000f00
(WW) intel(0): Register 0x68000 (TV_CTL) changed from 0x10000010 to 0x000c0010
(WW) intel(0): Register 0x68010 (TV_CSC_Y) changed from 0x00000000 to 0x0332012d
(WW) intel(0): Register 0x68014 (TV_CSC_Y2) changed from 0x00000000 to 0x07d30104
(WW) intel(0): Register 0x68018 (TV_CSC_U) changed from 0x00000000 to 0x0733052d
(WW) intel(0): Register 0x6801c (TV_CSC_U2) changed from 0x00000000 to 0x05c70200
(WW) intel(0): Register 0x68020 (TV_CSC_V) changed from 0x00000000 to 0x0340030c
(WW) intel(0): Register 0x68024 (TV_CSC_V2) changed from 0x00000000 to 0x06d00200
(WW) intel(0): Register 0x68028 (TV_CLR_KNOBS) changed from 0x00000000 to 0x00606000
(WW) intel(0): Register 0x6802c (TV_CLR_LEVEL) changed from 0x00000000 to 0x010b00e1
(WW) intel(0): Register 0x68030 (TV_H_CTL_1) changed from 0x00000000 to 0x00400359
(WW) intel(0): Register 0x68034 (TV_H_CTL_2) changed from 0x00000000 to 0x80480022
(WW) intel(0): Register 0x68038 (TV_H_CTL_3) changed from 0x00000000 to 0x007c0344
(WW) intel(0): Register 0x6803c (TV_V_CTL_1) changed from 0x00000000 to 0x00f01415
(WW) intel(0): Register 0x68040 (TV_V_CTL_2) changed from 0x00000000 to 0x00060607
(WW) intel(0): Register 0x68044 (TV_V_CTL_3) changed from 0x00000000 to 0x80120001
(WW) intel(0): Register 0x68048 (TV_V_CTL_4) changed from 0x00000000 to 0x000900f0
(WW) intel(0): Register 0x6804c (TV_V_CTL_5) changed from 0x00000000 to 0x000a00f0
(WW) intel(0): Register 0x68050 (TV_V_CTL_6) changed from 0x00000000 to 0x000900f0
(WW) intel(0): Register 0x68054 (TV_V_CTL_7) changed from 0x00000000 to 0x000a00f0
(WW) intel(0): Register 0x68060 (TV_SC_CTL_1) changed from 0x00000000 to 0xc1710088
(WW) intel(0): Register 0x68064 (TV_SC_CTL_2) changed from 0x00000000 to 0x4e2d1dc8
(WW) intel(0): Register 0x68070 (TV_WIN_POS) changed from 0x00000000 to 0x00360024
(WW) intel(0): Register 0x68074 (TV_WIN_SIZE) changed from 0x00000000 to 0x02640198
(WW) intel(0): Register 0x68080 (TV_FILTER_CTL_1) changed from 0x00000000 to 0x800010bb
(WW) intel(0): Register 0x68084 (TV_FILTER_CTL_2) changed from 0x00000000 to 0x00028283
(WW) intel(0): Register 0x68088 (TV_FILTER_CTL_3) changed from 0x00000000 to 0x00014141
(WW) intel(0): Register 0x68100 (TV_H_LUMA_0) changed from 0x00000000 to 0xb1403000
(WW) intel(0): Register 0x681ec (TV_H_LUMA_59) changed from 0x00000000 to 0x0000b060
(WW) intel(0): Register 0x68200 (TV_H_CHROMA_0) changed from 0x00000000 to 0xb1403000
(WW) intel(0): Register 0x682ec (TV_H_CHROMA_59) changed from 0x00000000 to 0x0000b060
(II) intel(0): Kernel reported 107520 total, 0 used
(II) [drm] DRM interface version 1.4
(II) [drm] DRM open master succeeded.
(II) intel(0): [drm] Using the DRM lock SAREA also for drawables.
(II) intel(0): [drm] framebuffer mapped by ddx driver
(II) intel(0): [drm] added 1 reserved context for kernel
(II) intel(0): X context handle = 0x1
(II) intel(0): [drm] installed DRM signal handler
(**) intel(0): Framebuffer compression enabled
(**) intel(0): Tiling enabled
(==) intel(0): VideoRam: 262144 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(II) intel(0): Allocating 4800 scanlines for pixmap cache
(II) intel(0): Tiled allocation successful.
(II) intel(0): [drm] Registers = 0xfe980000
(II) intel(0): [drm] ring buffer = 0xd0000000
(II) intel(0): [drm] mapped front buffer at 0xd2000000, handle = 0xd2000000
(II) intel(0): [drm] mapped back buffer at 0xd0800000, handle = 0xd0800000
(II) intel(0): [drm] mapped depth buffer at 0xd1000000, handle = 0xd1000000
(II) intel(0): [drm] mapped classic textures at 0xd4000000, handle = 0xd4000000
(II) intel(0): [drm] Initialized kernel agp heap manager, 33554432
(II) intel(0): [dri] visual configs initialized
(II) intel(0): Page Flipping disabled
(==) intel(0): Write-combining range (0xd0000000,0x10000000)
(II) intel(0): Using XFree86 Acceleration Architecture (XAA)
        Screen to screen bit blits
        Solid filled rectangles
        8x8 mono pattern filled rectangles
        Indirect CPU to Screen color expansion
        Solid Horizontal and Vertical Lines
        Setting up tile and stipple cache:
                32 128x128 slots
                32 256x256 slots
                16 512x512 slots
(==) intel(0): Backing store disabled
(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): [DRI] installation complete
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
(II) intel(0): 0x00020000-0x0061ffff: compressed frame buffer (6144 kB, 0x000000001f820000 physical
)
(II) intel(0): 0x00620000-0x00620fff: compressed ll buffer (4 kB, 0x000000001fe20000 physical
)
(II) intel(0): 0x00621000-0x0062afff: HW cursors (40 kB, 0x000000001fe21000 physical
)
(II) intel(0): 0x0062b000-0x00632fff: logical 3D context (32 kB)
(II) intel(0): 0x00633000-0x00633fff: overlay registers (4 kB, 0x000000001fe33000 physical
)
(II) intel(0): 0x00634000-0x00643fff: xaa scratch (64 kB)
(II) intel(0): 0x007bf000:            end of stolen memory
(II) intel(0): 0x00800000-0x00ffffff: back buffer (6400 kB) X tiled
(II) intel(0): 0x01000000-0x017fffff: depth buffer (6400 kB) X tiled
(II) intel(0): 0x02000000-0x03ffffff: front buffer (25600 kB) X tiled
(II) intel(0): 0x04000000-0x05ffffff: classic textures (32768 kB)
(II) intel(0): 0x10000000:            end of aperture
(II) intel(0): Selecting standard 18 bit TMDS pixel format.
(II) intel(0): Output configuration:
(II) intel(0):   Pipe A is on
(II) intel(0):   Display plane A is now enabled and connected to pipe A.
(II) intel(0):   Pipe B is on
(II) intel(0):   Display plane B is now enabled and connected to pipe B.
(II) intel(0):   Output VGA is connected to pipe A
(II) intel(0):   Output LVDS is connected to pipe B
(II) intel(0):   Output TV is connected to pipe none
(II) intel(0): [drm] dma control initialized, using IRQ 16


* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
@ 2011-07-22 11:08                   ` Kirill Smelkov
  0 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-22 11:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler, Keith Packard

 [ Cc'ing Florian Mickler and Keith Packard ]

On Tue, Jul 12, 2011 at 09:07:47PM +0300, Pekka Enberg wrote:
> On Tue, Jul 12, 2011 at 8:17 PM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
> >> Hello Chris, everyone,
> >>
> >> On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
> >> > On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" <luke@dashjr.org> wrote:
> >> > > On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
> >> > > > On Fri, 20 May 2011 11:08:56 -0700, Ray Lee <ray-lk@madrabbit.org> wrote:
> >> > > > > [ Adding Chris Wilson (author of the problematic patch) and Rafael
> >> > > > > Wysocki to the message ]
> >> > > > >
> >> > > > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr <luke@dashjr.org> wrote:
> >> > > > > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month
> >> > > > > > ago against 2.6.38. Now 2.6.39 was just released without the
> >> > > > > > regression being addressed. This bug makes the system unusable... Some
> >> > > > > > guys on IRC suggested I
> >> > > > > > email, so here it is.
> >> > > > >
> >> > > > > See the bugzilla entry for the bisection history.
> >> > > >
> >> > > > Which has nothing to do with Luke's bug. Considering the thousand things
> >> > > > that can go wrong during X starting, without a hint as to which it is nigh
> >> > > > on impossible to debug except by trial and error. If you set up
> >> > > > netconsole, does the kernel emit an OOPS with it's last dying breath?
> >> > >
> >> > > Why assume it's a different bug? I would almost wonder if it might affect
> >> > > all Sandy Bridge GPUs. In any case, I no longer have the original
> >> > > motherboard (it was recalled, as I said in the first post), nor even the
> >> > > revision of it (it had other issues that weren't being fixed). I *assume* I
> >> > > will have the same problem with my new motherboard (Intel DQ67SW), but I
> >> > > haven't verified that yet. I'll be sure to try a netconsole when I have to
> >> > > reboot next and get a chance to try the most recent 2.6.38 and .39 kernels,
> >> > > but at the moment it seems reasonable to address the problem bisected in the
> >> > > bug, even if it turns out to be different.
> >> >
> >> > The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate
> >> > locking between release and IRQ and so is prone to such races as befell
> >> > Kirill should not surprise anyone. As neither UMS nor DRI supported SNB,
> >> > I can quite confidently state they are separate bugs.
> >> > -Chris
> >>
> >> I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok
> >> on kernels < 2.6.38, and starting from 2.6.38 the system is just
> >> unusable because X either crashes the kernel (2.6.38), or does not start
> >> at all (2.6.39):
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=36052
> >>
> >>
> >> It's a regression. It's blocking me to upgrade to newer kernels. I've
> >> done my homework -- digged it and came with detailed OOPS on netconsole
> >> and bisected to single commit. Could this please be fixed?
> >
> > Silence...
> >
> > Still, reverting the bisected patch helps even for 3.0:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=36052#c4
> 
> Keith, Chris, what's up with this regression from 2.6.38? It seems
> commit e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths")
> caused problems on other machines.

Silence again, which is not surprising -- I have been ringing this bell
for 3 months already:

https://bugzilla.kernel.org/show_bug.cgi?id=33662#c10
https://bugzilla.kernel.org/show_bug.cgi?id=36052
(and on the list)

with detailed logs and a single bisected patch, without even a single
reply from the intel-gfx people.


Now that v3.0 is out, I've tested it again, and yes, just as it was
broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after the
first bad IO access the system freezes completely:

    On netconsole:

    # X starts here, then

    [   45.102377] ------------[ cut here ]------------
    [   45.102402] WARNING: at lib/iomap.c:43 bad_io_access+0x3d/0x40()
    [   45.102411] Hardware name: PCISA-945GSE
    [   45.102418] Bad IO access at port 0x84 (return inl(port))
    [   45.102425] Modules linked in: 
    [   45.102438] Pid: 2846, comm: sshd Not tainted 3.0.0--NAVY #33
    [   45.102445] Call Trace:
    [   45.102460]  [<c118e9fd>] ? bad_io_access+0x3d/0x40
    [   45.102473]  [<c10287ec>] warn_slowpath_common+0x6c/0xa0
    [   45.102484]  [<c118e9fd>] ? bad_io_access+0x3d/0x40
    [   45.102495]  [<c102889e>] warn_slowpath_fmt+0x2e/0x30
    [   45.102506]  [<c118e9fd>] bad_io_access+0x3d/0x40
    [   45.102516]  [<c118edb2>] ioread32+0x22/0x40
    [   45.102528]  [<c122cc7d>] i915_driver_irq_handler+0x1ad/0x660
    [   45.102541]  [<c12c6a7e>] ? rtl8169_interrupt+0xee/0x370
    [   45.102554]  [<c105c396>] handle_irq_event_percpu+0x36/0x140
    [   45.102565]  [<c105e490>] ? handle_edge_irq+0x150/0x150
    [   45.102576]  [<c105c4d9>] handle_irq_event+0x39/0x60
    [   45.102587]  [<c105e4d5>] handle_fasteoi_irq+0x45/0xd0
    [   45.102594]  <IRQ>   [<c1003c29>] ? do_IRQ+0x39/0xb0
    [   45.102613]  [<c103c9b3>] ? start_flush_work+0xc3/0x130
    [   45.102625]  [<c13bc329>] ? common_interrupt+0x29/0x30
    [   45.102636]  [<c13bc329>] ? common_interrupt+0x29/0x30
    [   45.102648]  [<c11e007b>] ? pnpacpi_encode_resources+0x37b/0x7a0
    [   45.102659]  [<c109971e>] ? fget_light+0xe/0xf0
    [   45.102671]  [<c10a8f97>] ? do_select+0x2e7/0x680
    [   45.102685]  [<c1341998>] ? sch_direct_xmit+0x58/0x1d0
    [   45.102695]  [<c10a83e0>] ? poll_freewait+0xa0/0xa0
    [   45.102706]  [<c102df37>] ? local_bh_enable+0x47/0xa0
    [   45.102718]  [<c132e371>] ? dev_queue_xmit+0x101/0x4e0
    [   45.102729]  [<c134ffba>] ? ip_finish_output+0x10a/0x2f0
    [   45.102740]  [<c1350216>] ? ip_output+0x76/0x90
    [   45.102750]  [<c134d715>] ? ip_local_out+0x65/0x70
    [   45.102762]  [<c134fa3d>] ? ip_queue_xmit+0x1bd/0x3b0
    [   45.102775]  [<c1362af8>] ? tcp_transmit_skb+0x468/0x7d0
    [   45.102788]  [<c13215af>] ? sk_reset_timer+0xf/0x20
    [   45.102798]  [<c1362446>] ? tcp_event_new_data_sent+0x86/0xc0
    [   45.102809]  [<c1364fc1>] ? tcp_write_xmit+0x1e1/0x9a0
    [   45.102822]  [<c1326925>] ? __alloc_skb+0x55/0x100
    [   45.102838]  [<c102df37>] ? local_bh_enable+0x47/0xa0
    [   45.102849]  [<c1321246>] ? release_sock+0xd6/0x110
    [   45.102859]  [<c13657f7>] ? __tcp_push_pending_frames+0x27/0x80
    [   45.102870]  [<c13584fa>] ? tcp_sendmsg+0x64a/0xac0

    -*- and then system FREEZE -*-
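
To make the "Bad IO access at port 0x84 (return inl(port))" line easier to
read: the generic helpers in lib/iomap.c decide from the numeric value of the
address cookie whether ioread32() should do a port read or an MMIO read, and
warn when the value is too small to be either. Below is a minimal userspace
sketch of that check, assuming the default x86 constants from lib/iomap.c
(PIO_OFFSET 0x10000, PIO_MASK 0xffff, PIO_RESERVED 0x40000). The enum and the
function names are hypothetical and exist only for this illustration; this is
not kernel code.

```c
/* Userspace model of the cookie check done by lib/iomap.c (illustration only). */
#define PIO_OFFSET   0x10000UL  /* port numbers are encoded as port + PIO_OFFSET */
#define PIO_MASK     0x0ffffUL  /* recovers the port number from a PIO cookie */
#define PIO_RESERVED 0x40000UL  /* cookies at or above this are treated as MMIO */

/* Hypothetical names, not part of the kernel API. */
enum iomap_kind { IOMAP_BAD, IOMAP_PIO, IOMAP_MMIO };

/* Classify a cookie the way the IO_COND() macro in lib/iomap.c does. */
static enum iomap_kind classify_iomap_cookie(unsigned long cookie)
{
	if (cookie >= PIO_RESERVED)
		return IOMAP_MMIO;	/* real mapping: readl() and friends */
	if (cookie > PIO_OFFSET)
		return IOMAP_PIO;	/* encoded port: inl(cookie & PIO_MASK) */
	return IOMAP_BAD;		/* bad_io_access(): the WARNING seen above */
}

/* Decode the port number from a cookie already known to be PIO. */
static unsigned long pio_port(unsigned long cookie)
{
	return cookie & PIO_MASK;
}
```

Under these assumptions classify_iomap_cookie(0x84) gives IOMAP_BAD, which
matches the warning in the trace. A cookie that small is what one would expect
if a small offset were added to a NULL base pointer, but that is an inference
from the numbers and has not been confirmed in this thread.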


For completeness, the `X -verbose` log is in "Appendix 1" (but who cares
anyway? I've sent lots of such logs without a reply).


And again, after reverting e8616b6 ("drm/i915: Initialise ring vfuncs
for old DRI paths") on top of v3.0, X works without any problem again.


So I wonder:

    I thought people were trying to follow the "no regressions" rule in
    the kernel. Should we then just apply the following patch? If the
    Intel people are not responding, should it just go directly into
    mainline?

    Or would it be fairer to say that UMS is not supported anymore and
    is broken, and just remove support for it?


Thanks,
Kirill


P.S. Sometimes people change their hardware preferences based on
software support quality. Knock, knock...


From ef91a178e6069ae07c7a3c1e39e13eea609953cd Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@mns.spb.ru>
Date: Wed, 29 Jun 2011 14:22:49 +0400
Subject: [PATCH] Revert "drm/i915: Initialise ring vfuncs for old DRI paths"

This reverts commit e8616b6ced6137085e6657cc63bc2fe3900b8616.

See https://bugzilla.kernel.org/show_bug.cgi?id=36052

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Florian Mickler <florian@mickler.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org

---
 drivers/gpu/drm/i915/i915_dma.c         |   25 +++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   42 -------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 --
 3 files changed, 18 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 296fbd6..9300d18 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -160,7 +160,7 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-	int ret;
+	struct intel_ring_buffer *ring = LP_RING(dev_priv);
 
 	master_priv->sarea = drm_getsarea(dev);
 	if (master_priv->sarea) {
@@ -171,22 +171,33 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init)
 	}
 
 	if (init->ring_size != 0) {
-		if (LP_RING(dev_priv)->obj != NULL) {
+		if (ring->obj != NULL) {
 			i915_dma_cleanup(dev);
 			DRM_ERROR("Client tried to initialize ringbuffer in "
 				  "GEM mode\n");
 			return -EINVAL;
 		}
 
-		ret = intel_render_ring_init_dri(dev,
-						 init->ring_start,
-						 init->ring_size);
-		if (ret) {
+		ring->size = init->ring_size;
+
+		ring->map.offset = init->ring_start;
+		ring->map.size = init->ring_size;
+		ring->map.type = 0;
+		ring->map.flags = 0;
+		ring->map.mtrr = 0;
+
+		drm_core_ioremap_wc(&ring->map, dev);
+
+		if (ring->map.handle == NULL) {
 			i915_dma_cleanup(dev);
-			return ret;
+			DRM_ERROR("can not ioremap virtual address for"
+				  " ring buffer\n");
+			return -ENOMEM;
 		}
 	}
 
+	ring->virtual_start = ring->map.handle;
+
 	dev_priv->cpp = init->cpp;
 	dev_priv->back_offset = init->back_offset;
 	dev_priv->front_offset = init->front_offset;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 95c4b14..8d2f610 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1304,48 +1304,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	return intel_init_ring_buffer(dev, ring);
 }
 
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
-{
-	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
-
-	*ring = render_ring;
-	if (INTEL_INFO(dev)->gen >= 6) {
-		ring->add_request = gen6_add_request;
-		ring->irq_get = gen6_render_ring_get_irq;
-		ring->irq_put = gen6_render_ring_put_irq;
-	} else if (IS_GEN5(dev)) {
-		ring->add_request = pc_render_add_request;
-		ring->get_seqno = pc_render_get_seqno;
-	}
-
-	ring->dev = dev;
-	INIT_LIST_HEAD(&ring->active_list);
-	INIT_LIST_HEAD(&ring->request_list);
-	INIT_LIST_HEAD(&ring->gpu_write_list);
-
-	ring->size = size;
-	ring->effective_size = ring->size;
-	if (IS_I830(ring->dev))
-		ring->effective_size -= 128;
-
-	ring->map.offset = start;
-	ring->map.size = size;
-	ring->map.type = 0;
-	ring->map.flags = 0;
-	ring->map.mtrr = 0;
-
-	drm_core_ioremap_wc(&ring->map, dev);
-	if (ring->map.handle == NULL) {
-		DRM_ERROR("can not ioremap virtual address for"
-			  " ring buffer\n");
-		return -ENOMEM;
-	}
-
-	ring->virtual_start = (void __force __iomem *)ring->map.handle;
-	return 0;
-}
-
 int intel_init_bsd_ring_buffer(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39ac2b6..b6b0fd4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -197,7 +197,4 @@ static inline void i915_trace_irq_get(struct intel_ring_buffer *ring, u32 seqno)
 		ring->trace_irq_seqno = seqno;
 }
 
-/* DRI warts */
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size);
-
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
1.7.6.233.gd79bc






Appendix 1. `X -verbose` log
----------------------------

# same, starting X over ssh
navy3:~# X -verbose
_XSERVTransSocketOpenCOTSServer: Unable to open socket for inet6
_XSERVTransOpen: transport open failed for inet6/navy3:0
_XSERVTransMakeAllCOTSServerListeners: failed to open listener for inet6

X.Org X Server 1.4.2
Release Date: 11 June 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux Debian (xorg-server 2:1.4.2-10.lenny3)
Current Operating System: Linux navy3 3.0.0--NAVY #33 PREEMPT Fri Jul 22 13:56:40 MSD 2011 i68
6
Build Date: 25 September 2010  12:05:44PM
 
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat Jan 19 21:24:12 2002
(==) Using config file: "/etc/X11/xorg.conf"
(==) ServerLayout "Default Layout"
(**) |-->Screen "Default Screen" (0)
(**) |   |-->Monitor "LCD 1600x1200"
(==) No device specified for screen "Default Screen".
        Using the first device section listed.
(**) |   |-->Device "Default card"
(**) |-->Input Device "Generic Keyboard"
(**) |-->Input Device "Configured Mouse"
(==) Automatically adding devices
(==) Automatically enabling devices
(==) No FontPath specified.  Using compiled-in default.
(WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
        Entry deleted from font path.
(WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
        Entry deleted from font path.
(WW) The directory "/var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType" does not exist.
        Entry deleted from font path.
(==) FontPath set to:
        /usr/share/fonts/X11/misc,
        /usr/share/fonts/X11/cyrillic
(==) RgbPath set to "/etc/X11/rgb"
(==) ModulePath set to "/usr/lib/xorg/modules"
(II) Loading /usr/lib/xorg/modules//libpcidata.so
(II) Module pcidata: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(--) using VT number 7

(--) PCI:*(0:2:0) Intel Corporation Mobile 945GME Express Integrated Graphics Controller rev 3
, Mem @ 0xfe980000/19, 0xd0000000/28, 0xfe940000/18, I/O @ 0xbc80/3
(--) PCI: (0:2:1) Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphi
cs Controller rev 3, Mem @ 0xfe880000/19
(II) "extmod" will be loaded. This was enabled by default and also specified in the config fil
e.
(II) "dbe" will be loaded. This was enabled by default and also specified in the config file.
(II) "glx" will be loaded by default.
(II) "freetype" will be loaded. This was enabled by default and also specified in the config f
ile.
(II) "record" will be loaded. This was enabled by default and also specified in the config fil
e.
(II) "dri" will be loaded by default.
(II) Loading /usr/lib/xorg/modules/extensions//libdbe.so
(II) Module dbe: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Module "ddc" already built-in
(II) Loading /usr/lib/xorg/modules/extensions//libextmod.so
(II) Module extmod: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules//fonts/libfreetype.so
(II) Module freetype: vendor="X.Org Foundation & the After X-TT Project"
        compiled for 1.4.2, module version = 2.1.0
(II) Loading /usr/lib/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules/extensions//librecord.so
(II) Module record: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.13.0
(II) Loading /usr/lib/xorg/modules//libvbe.so
(II) Module vbe: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.1.0
(II) Loading /usr/lib/xorg/modules/drivers//v4l_drv.so
(II) Module v4l: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 0.1.1
(II) Loading /usr/lib/xorg/modules/extensions//libglx.so
(II) Module glx: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(==) AIGLX enabled
(II) Loading /usr/lib/xorg/modules/extensions//libdri.so
(II) Module dri: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Matched intel from file name intel.ids in autoconfig
(==) Matched intel for the autoconfigured driver
(==) Assigned the driver to the xf86ConfigLayout
(II) Loading /usr/lib/xorg/modules/drivers//intel_drv.so
(II) Module intel: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 2.3.2
(II) Loading /usr/lib/xorg/modules/input//kbd_drv.so
(II) Module kbd: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 1.3.1
(II) Loading /usr/lib/xorg/modules/input//mouse_drv.so
(II) Module mouse: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 1.3.0
(II) v4l driver for Video4Linux
(II) intel: Driver for Intel Integrated Graphics Chipsets: i810,
        i810-dc100, i810e, i815, i830M, 845G, 852GM/855GM, 865G, 915G,
        E7221 (i915), 915GM, 945G, 945GM, 945GME, 965G, G35, 965Q, 946GZ,
        965GM, 965GME/GLE, G33, Q35, Q33,
        Mobile Intel® GM45 Express Chipset,
        Intel Integrated Graphics Device, G45/G43, Q45/Q43, G41
(--) Assigning device section with no busID to primary device
(WW) intel: No matching Device section for instance (BusID PCI:0:2:1) found
(--) Chipset 945GME found
(II) Loading /usr/lib/xorg/modules//libvgahw.so
(II) Module vgahw: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 0.1.0
(**) intel(0): Depth 16, (--) framebuffer bpp 16
(==) intel(0): RGB weight 565
(==) intel(0): Default visual is TrueColor
(II) intel(0): Integrated Graphics Chipset: Intel(R) 945GME
(--) intel(0): Chipset: "945GME"
(--) intel(0): Linear framebuffer at 0xD0000000
(--) intel(0): IO registers at addr 0xFE980000
(II) intel(0): 2 display pipes available.
(**) intel(0): Using XAA for acceleration
(II) Module "ddc" already built-in
(II) Module "i2c" already built-in
(II) intel(0): Output VGA using monitor section LCD 1600x1200
(II) intel(0): I2C bus "CRTDDC_A" initialized.
(II) intel(0): Output LVDS has no monitor section
(II) intel(0): I2C bus "LVDSDDC_C" initialized.
(II) intel(0): Attempting to determine panel fixed mode.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" registered at address 0xA0.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" removed.
(II) intel(0): initializing int10
(WW) intel(0): Bad V_BIOS checksum
(II) intel(0): Primary V_BIOS segment is: 0xc000
(II) intel(0): VESA BIOS detected
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOB" initialized.
(II) intel(0): I2C device "SDVOCTRL_E for SDVOB:SDVO Controller B" registered at address 0x70.
(II) intel(0): No SDVO device found on SDVOB
(II) intel(0): I2C device "SDVOCTRL_E for SDVOB:SDVO Controller B" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOB" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOC" initialized.
(II) intel(0): I2C device "SDVOCTRL_E for SDVOC:SDVO Controller C" registered at address 0x72.
(II) intel(0): No SDVO device found on SDVOC
(II) intel(0): I2C device "SDVOCTRL_E for SDVOC:SDVO Controller C" removed.
(II) intel(0): I2C bus "SDVOCTRL_E for SDVOC" removed.
(II) intel(0): Output TV has no monitor section
(II) intel(0): I2C device "CRTDDC_A:ddc2" registered at address 0xA0.
(II) intel(0): EDID vendor "SAM", prod id 476
(II) intel(0): Using hsync ranges from config file
(II) intel(0): Using vrefresh ranges from config file
(II) intel(0): Printing DDC gathered Modelines:
(II) intel(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsy
nc +vsync (64.0 kHz)
(II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync
 (37.9 kHz)
(II) intel(0): Modeline "800x600"x0.0   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync
 (35.2 kHz)
(II) intel(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync 
(37.5 kHz)
(II) intel(0): Modeline "640x480"x0.0   31.50  640 664 704 832  480 489 491 520 -hsync -vsync 
(37.9 kHz)
(II) intel(0): Modeline "640x480"x0.0   30.24  640 704 768 864  480 483 486 525 -hsync -vsync 
(35.0 kHz)
(II) intel(0): Modeline "640x480"x0.0   25.20  640 656 752 800  480 490 492 525 -hsync -vsync 
(31.5 kHz)
(II) intel(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync 
(31.5 kHz)
(II) intel(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsy
nc +vsync (80.0 kHz)
(II) intel(0): Modeline "1024x768"x0.0   78.80  1024 1040 1136 1312  768 769 772 800 +hsync +v
sync (60.1 kHz)
(II) intel(0): Modeline "1024x768"x0.0   75.00  1024 1048 1184 1328  768 771 777 806 -hsync -v
sync (56.5 kHz)
(II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -v
sync (48.4 kHz)
(II) intel(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync
 (49.7 kHz)
(II) intel(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync
 (46.9 kHz)
(II) intel(0): Modeline "800x600"x0.0   50.00  800 856 976 1040  600 637 643 666 +hsync +vsync
 (48.1 kHz)
(II) intel(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +v
sync (67.5 kHz)
(II) intel(0): Modeline "1280x1024"x59.9  109.00  1280 1368 1496 1712  1024 1027 1034 1063 -hs
ync +vsync (63.7 kHz)
(II) intel(0): Modeline "1280x960"x59.9  101.25  1280 1360 1488 1696  960 963 967 996 -hsync +
vsync (59.7 kHz)
(II) intel(0): Modeline "1152x864"x74.8  104.00  1152 1224 1344 1536  864 867 871 905 -hsync +
vsync (67.7 kHz)
(II) intel(0): EDID vendor "SAM", prod id 476
(II) intel(0): I2C device "LVDSDDC_C:ddc2" registered at address 0xA0.
(II) intel(0): I2C device "LVDSDDC_C:ddc2" removed.
(II) intel(0): Output VGA connected
(II) intel(0): Output LVDS connected
(II) intel(0): Output TV disconnected
(II) intel(0): Output VGA using initial mode 1280x1024
(II) intel(0): Output LVDS using initial mode 800x600
(II) intel(0): Monitoring connected displays enabled
(II) intel(0): detected 256 kB GTT.
(II) intel(0): detected 7932 kB stolen memory.
(==) intel(0): video overlay key set to 0x83e
(==) intel(0): Will not try to enable page flipping
(==) intel(0): Triple buffering disabled
(==) intel(0): Intel XvMC decoder disabled
(==) intel(0): Using gamma correction (1.0, 1.0, 1.0)
(**) intel(0): Display dimensions: (340, 270) mm
(**) intel(0): DPI set to (119, 150)
(II) Loading /usr/lib/xorg/modules//libfb.so
(II) Module fb: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.0.0
(II) Loading /usr/lib/xorg/modules//libxaa.so
(II) Module xaa: vendor="X.Org Foundation"
        compiled for 1.4.2, module version = 1.2.0
(II) Module "ramdac" already built-in
(II) intel(0): Comparing regs from server start up to After PreInit
(WW) intel(0): Register 0x61200 (PP_STATUS) changed from 0xc0000008 to 0xd000000a
(WW) intel(0): PP_STATUS before: on, ready, sequencing idle
(WW) intel(0): PP_STATUS after: on, ready, sequencing on
(WW) intel(0): Register 0x61114 (PORT_HOTPLUG_STAT) changed from 0x00000b00 to 0x00000f00
(WW) intel(0): Register 0x68000 (TV_CTL) changed from 0x10000010 to 0x000c0010
(WW) intel(0): Register 0x68010 (TV_CSC_Y) changed from 0x00000000 to 0x0332012d
(WW) intel(0): Register 0x68014 (TV_CSC_Y2) changed from 0x00000000 to 0x07d30104
(WW) intel(0): Register 0x68018 (TV_CSC_U) changed from 0x00000000 to 0x0733052d
(WW) intel(0): Register 0x6801c (TV_CSC_U2) changed from 0x00000000 to 0x05c70200
(WW) intel(0): Register 0x68020 (TV_CSC_V) changed from 0x00000000 to 0x0340030c
(WW) intel(0): Register 0x68024 (TV_CSC_V2) changed from 0x00000000 to 0x06d00200
(WW) intel(0): Register 0x68028 (TV_CLR_KNOBS) changed from 0x00000000 to 0x00606000
(WW) intel(0): Register 0x6802c (TV_CLR_LEVEL) changed from 0x00000000 to 0x010b00e1
(WW) intel(0): Register 0x68030 (TV_H_CTL_1) changed from 0x00000000 to 0x00400359
(WW) intel(0): Register 0x68034 (TV_H_CTL_2) changed from 0x00000000 to 0x80480022
(WW) intel(0): Register 0x68038 (TV_H_CTL_3) changed from 0x00000000 to 0x007c0344
(WW) intel(0): Register 0x6803c (TV_V_CTL_1) changed from 0x00000000 to 0x00f01415
(WW) intel(0): Register 0x68040 (TV_V_CTL_2) changed from 0x00000000 to 0x00060607
(WW) intel(0): Register 0x68044 (TV_V_CTL_3) changed from 0x00000000 to 0x80120001
(WW) intel(0): Register 0x68048 (TV_V_CTL_4) changed from 0x00000000 to 0x000900f0
(WW) intel(0): Register 0x6804c (TV_V_CTL_5) changed from 0x00000000 to 0x000a00f0
(WW) intel(0): Register 0x68050 (TV_V_CTL_6) changed from 0x00000000 to 0x000900f0
(WW) intel(0): Register 0x68054 (TV_V_CTL_7) changed from 0x00000000 to 0x000a00f0
(WW) intel(0): Register 0x68060 (TV_SC_CTL_1) changed from 0x00000000 to 0xc1710088
(WW) intel(0): Register 0x68064 (TV_SC_CTL_2) changed from 0x00000000 to 0x4e2d1dc8
(WW) intel(0): Register 0x68070 (TV_WIN_POS) changed from 0x00000000 to 0x00360024
(WW) intel(0): Register 0x68074 (TV_WIN_SIZE) changed from 0x00000000 to 0x02640198
(WW) intel(0): Register 0x68080 (TV_FILTER_CTL_1) changed from 0x00000000 to 0x800010bb
(WW) intel(0): Register 0x68084 (TV_FILTER_CTL_2) changed from 0x00000000 to 0x00028283
(WW) intel(0): Register 0x68088 (TV_FILTER_CTL_3) changed from 0x00000000 to 0x00014141
(WW) intel(0): Register 0x68100 (TV_H_LUMA_0) changed from 0x00000000 to 0xb1403000
(WW) intel(0): Register 0x681ec (TV_H_LUMA_59) changed from 0x00000000 to 0x0000b060
(WW) intel(0): Register 0x68200 (TV_H_CHROMA_0) changed from 0x00000000 to 0xb1403000
(WW) intel(0): Register 0x682ec (TV_H_CHROMA_59) changed from 0x00000000 to 0x0000b060
(II) intel(0): Kernel reported 107520 total, 0 used
(II) [drm] DRM interface version 1.4
(II) [drm] DRM open master succeeded.
(II) intel(0): [drm] Using the DRM lock SAREA also for drawables.
(II) intel(0): [drm] framebuffer mapped by ddx driver
(II) intel(0): [drm] added 1 reserved context for kernel
(II) intel(0): X context handle = 0x1
(II) intel(0): [drm] installed DRM signal handler
(**) intel(0): Framebuffer compression enabled
(**) intel(0): Tiling enabled
(==) intel(0): VideoRam: 262144 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(II) intel(0): Allocating 4800 scanlines for pixmap cache
(II) intel(0): Tiled allocation successful.
(II) intel(0): [drm] Registers = 0xfe980000
(II) intel(0): [drm] ring buffer = 0xd0000000
(II) intel(0): [drm] mapped front buffer at 0xd2000000, handle = 0xd2000000
(II) intel(0): [drm] mapped back buffer at 0xd0800000, handle = 0xd0800000
(II) intel(0): [drm] mapped depth buffer at 0xd1000000, handle = 0xd1000000
(II) intel(0): [drm] mapped classic textures at 0xd4000000, handle = 0xd4000000
(II) intel(0): [drm] Initialized kernel agp heap manager, 33554432
(II) intel(0): [dri] visual configs initialized
(II) intel(0): Page Flipping disabled
(==) intel(0): Write-combining range (0xd0000000,0x10000000)
(II) intel(0): Using XFree86 Acceleration Architecture (XAA)
        Screen to screen bit blits
        Solid filled rectangles
        8x8 mono pattern filled rectangles
        Indirect CPU to Screen color expansion
        Solid Horizontal and Vertical Lines
        Setting up tile and stipple cache:
                32 128x128 slots
                32 256x256 slots
                16 512x512 slots
(==) intel(0): Backing store disabled
(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): [DRI] installation complete
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
(II) intel(0): 0x00020000-0x0061ffff: compressed frame buffer (6144 kB, 0x000000001f820000 physical
)
(II) intel(0): 0x00620000-0x00620fff: compressed ll buffer (4 kB, 0x000000001fe20000 physical
)
(II) intel(0): 0x00621000-0x0062afff: HW cursors (40 kB, 0x000000001fe21000 physical
)
(II) intel(0): 0x0062b000-0x00632fff: logical 3D context (32 kB)
(II) intel(0): 0x00633000-0x00633fff: overlay registers (4 kB, 0x000000001fe33000 physical
)
(II) intel(0): 0x00634000-0x00643fff: xaa scratch (64 kB)
(II) intel(0): 0x007bf000:            end of stolen memory
(II) intel(0): 0x00800000-0x00ffffff: back buffer (6400 kB) X tiled
(II) intel(0): 0x01000000-0x017fffff: depth buffer (6400 kB) X tiled
(II) intel(0): 0x02000000-0x03ffffff: front buffer (25600 kB) X tiled
(II) intel(0): 0x04000000-0x05ffffff: classic textures (32768 kB)
(II) intel(0): 0x10000000:            end of aperture
(II) intel(0): Selecting standard 18 bit TMDS pixel format.
(II) intel(0): Output configuration:
(II) intel(0):   Pipe A is on
(II) intel(0):   Display plane A is now enabled and connected to pipe A.
(II) intel(0):   Pipe B is on
(II) intel(0):   Display plane B is now enabled and connected to pipe B.
(II) intel(0):   Output VGA is connected to pipe A
(II) intel(0):   Output LVDS is connected to pipe B
(II) intel(0):   Output TV is connected to pipe none
(II) intel(0): [drm] dma control initialized, using IRQ 16

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22  2:59               ` Linux 3.0 release Linus Torvalds
  2011-07-22 11:08                   ` Kirill Smelkov
@ 2011-07-22 12:52                 ` Martin Knoblauch
  2011-07-22 19:11                 ` David
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 84+ messages in thread
From: Martin Knoblauch @ 2011-07-22 12:52 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List

>From: Linus Torvalds <torvalds@linux-foundation.org>

>To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
>Sent: Friday, July 22, 2011 4:59 AM
>Subject: Linux 3.0 release
>
>So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
>
>This obviously also opens the merge window for the next kernel, which
>will be 3.1. The stable team will take the third digit, so 3.0.1 will
>be the first stable release based on 3.0.
>

Hi Linus, 


 congratulations on the new release! I definitely like that you kept the three-digit scheme, leaving the third one to stable. Much more symmetric now :-)

 OK, I know I am going to regret this later :-) I built and installed it, starting from my 2.6.38 configuration, and it works OK. The evil 3rd party driver built out of the box and the even more evil 3rd party virtualization stuff built with an already available patch. Cool.


Have a good weekend and vacation
Martin

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 11:08                   ` Kirill Smelkov
@ 2011-07-22 14:12                     ` Herbert Xu
  -1 siblings, 0 replies; 84+ messages in thread
From: Herbert Xu @ 2011-07-22 14:12 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: penberg, chris, luke, intel-gfx, linux-kernel, dri-devel, rjw,
	ray-lk, torvalds, akpm, florian, keithp

Kirill Smelkov <kirr@mns.spb.ru> wrote:
> 
>    Or would it be more fair to say that UMS is not supported anymore,
>    is broken and just remove support for it?

It's kind of ironic that the intention of the offending patch
was to avoid a UMS crash :)

Anyway I'm now using KMS (forced to due to new hardware) so I
certainly don't have any objections to reverting this patch.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 11:08                   ` Kirill Smelkov
@ 2011-07-22 18:00                     ` Keith Packard
  -1 siblings, 0 replies; 84+ messages in thread
From: Keith Packard @ 2011-07-22 18:00 UTC (permalink / raw)
  To: Kirill Smelkov, Pekka Enberg
  Cc: Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler

On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:

> And now after v3.0 is out, I've tested it again, and yes, like it was
> broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
> bad io access the system freezes completely:

I looked at this when I first saw it (a couple of weeks ago), and I
couldn't see any obvious reason this patch would cause this particular
problem. I didn't want to revert the patch at that point as I feared it
would cause other subtle problems. Given that you've got a work-around,
it seemed best to just push this off past 3.0.

Given the failing address passed to ioread32, this seems like it's
probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21,
which is an offset in 32-bit units within the hardware status page. If
the status_page.page_addr value was zero, then the computed address
would end up being 0x84.
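
As a minimal userspace sketch of that arithmetic (not driver code): the
index value 0x21 and the 32-bit unit size are taken from the paragraph
above, while the helper name hws_slot_addr is hypothetical and exists only
for illustration.

```c
#include <stdint.h>

/* Breadcrumb slot index within the hardware status page, counted in
 * 32-bit units (value quoted in the message above). */
#define I915_BREADCRUMB_INDEX 0x21

/* Hypothetical helper: byte address of a 32-bit slot, given the base
 * address of the status page. */
static uintptr_t hws_slot_addr(uintptr_t page_addr, unsigned int index)
{
	return page_addr + (uintptr_t)index * sizeof(uint32_t);
}
```

With a zero base, hws_slot_addr(0, I915_BREADCRUMB_INDEX) evaluates to
0x21 * 4 = 0x84, which is the small address one would expect to see if
page_addr had been cleared.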

And, it looks like status_page.page_addr *will* end up being zero as a
result of the patch in question. The patch resets the entire ring
structure contents back to the initial values, which includes smashing
the status_page structure to zero, clearing the value of
status_page.page_addr set in i915_init_phys_hws.
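
The failure mode can be sketched in plain C under some loud assumptions:
the structs below are simplified stand-ins rather than the real driver
types, the function names are hypothetical, and the reset is modelled with
memset even though the driver may reset the structure by other means.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the driver structures. Field names follow the
 * message above; the layout is illustrative only. */
struct status_page {
	void *page_addr;
};

struct ring {
	struct status_page status_page;
	unsigned int size;
};

/* Early init (in the role of i915_init_phys_hws): record the page address. */
static void early_init(struct ring *ring, void *hws_vaddr)
{
	ring->status_page.page_addr = hws_vaddr;
}

/* Late re-init (in the role of the UMS ring setup): resets the whole
 * structure, which also discards the pointer stored by early_init(). */
static void late_reinit(struct ring *ring, unsigned int size)
{
	memset(ring, 0, sizeof(*ring));
	ring->size = size;
}

/* Same re-init, but the pointer is stored again after the reset, so it
 * survives regardless of what happened earlier. */
static void late_reinit_fixed(struct ring *ring, unsigned int size,
			      void *hws_vaddr)
{
	late_reinit(ring, size);
	ring->status_page.page_addr = hws_vaddr;
}
```

After early_init() followed by late_reinit(), page_addr is NULL again;
late_reinit_fixed() leaves it pointing at the status page.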

Here's an untested patch which moves the initialization of
status_page.page_addr into intel_render_ring_init_dri. I note that
intel_init_render_ring_buffer *already* has the setting of the
status_page.page_addr value, and so I've removed the setting of
status_page.page_addr from i915_init_phys_hws.

I suspect we could remove the memset from intel_init_render_ring_buffer;
it seems entirely superfluous given the memset in i915_init_phys_hws.

From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Fri, 22 Jul 2011 10:44:39 -0700
Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
 intel_render_ring_init_dri

Physically-addressed hardware status pages are initialized early in
the driver load process by i915_init_phys_hws. For UMS environments,
the ring structure is not initialized until the X server starts. At
that point, the entire ring structure is re-initialized with all new
values. Any values set in the ring structure (including
ring->status_page.page_addr) will be lost when the ring is
re-initialized.

This patch moves the initialization of the status_page.page_addr value
to intel_render_ring_init_dri.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
 drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 1271282..8a3942c 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
 static int i915_init_phys_hws(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct intel_ring_buffer *ring = LP_RING(dev_priv);
 
 	/* Program Hardware Status Page */
 	dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
 		DRM_ERROR("Can not allocate hardware status page\n");
 		return -ENOMEM;
 	}
-	ring->status_page.page_addr =
-		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
 
-	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
+	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
+		  0, PAGE_SIZE);
 
 	i915_write_hws_pga(dev);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e961568..47b9b27 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
 		ring->get_seqno = pc_render_get_seqno;
 	}
 
+	if (!I915_NEED_GFX_HWS(dev))
+		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
+
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
-- 
1.7.5.4

-- 
keith.packard@intel.com

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22  2:59               ` Linux 3.0 release Linus Torvalds
  2011-07-22 11:08                   ` Kirill Smelkov
  2011-07-22 12:52                 ` Linux 3.0 release Martin Knoblauch
@ 2011-07-22 19:11                 ` David
  2011-07-22 19:21                   ` Linus Torvalds
  2011-07-22 23:21                 ` Linux 3.0 release - btrfs possible locking deadlock Ed Tomlinson
  2011-07-24 22:04                 ` Linux 3.0 release Arnaud Lacombe
  4 siblings, 1 reply; 84+ messages in thread
From: David @ 2011-07-22 19:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, netdev

On 22/07/11 03:59, Linus Torvalds wrote:
> really smooth. Which is not to say that there may not be bugs, but if
> anything, there are hopefully fewer than usual, rather than the normal
> ".0" problems.

I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though.

Jul 22 19:40:02 server kernel: [   15.526629] ------------[ cut here ]------------
Jul 22 19:40:02 server kernel: [   15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50()
Jul 22 19:40:02 server kernel: [   15.526637] Hardware name: System Product Name
Jul 22 19:40:02 server kernel: [   15.526638] Modules linked in: xt_owner ipt_REDIRECT ipt_MASQUERADE ts_kmp xt_string ipt_REJECT xt_recent xt_state xt_multiport xt_tcpudp xt_pkttype ipt_LOG xt_limit
iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables ip6table_filter ip6_tables x_tables wctdm dahdi hisax nfsd isl6421 crc_ccitt b2c2_flexcop_pci
dvb_pll b2c2_flexcop mt352 isdn cx24123 dvb_usb_digitv cx24113 s5h1420 dvb_usb e1000e snd_hda_codec_hdmi exportfs it87 dvb_core pl2303 hwmon_vid lp snd_hda_codec_via ppdev nfs k10temp i2c_piix4
snd_hda_intel snd_hda_codec r8169 mii lockd snd_pcm snd_timer snd shpchp soundcore parport_pc snd_page_alloc parport auth_rpcgss nfs_acl sunrpc
Jul 22 19:40:02 server kernel: [   15.526667] Pid: 0, comm: kworker/0:0 Not tainted 3.0.0 #1
Jul 22 19:40:02 server kernel: [   15.526668] Call Trace:
Jul 22 19:40:02 server kernel: [   15.526670]  <IRQ>  [<ffffffff8104599a>] warn_slowpath_common+0x7a/0xb0
Jul 22 19:40:02 server kernel: [   15.526675]  [<ffffffff810459e5>] warn_slowpath_null+0x15/0x20
Jul 22 19:40:02 server kernel: [   15.526677]  [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50
Jul 22 19:40:02 server kernel: [   15.526679]  [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0
Jul 22 19:40:02 server kernel: [   15.526681]  [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100
Jul 22 19:40:02 server kernel: [   15.526684]  [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40
Jul 22 19:40:02 server kernel: [   15.526689]  [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169]
Jul 22 19:40:02 server kernel: [   15.526693]  [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169]
Jul 22 19:40:02 server kernel: [   15.526696]  [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0
Jul 22 19:40:02 server kernel: [   15.526698]  [<ffffffff810a4551>] handle_irq_event+0x31/0x50
Jul 22 19:40:02 server kernel: [   15.526701]  [<ffffffff8101a371>] ? ack_apic_edge+0x31/0x40
Jul 22 19:40:02 server kernel: [   15.526703]  [<ffffffff810a6eb5>] handle_edge_irq+0x65/0x120
Jul 22 19:40:02 server kernel: [   15.526706]  [<ffffffff81003f8d>] handle_irq+0x1d/0x30
Jul 22 19:40:02 server kernel: [   15.526708]  [<ffffffff81003918>] do_IRQ+0x58/0xe0
Jul 22 19:40:02 server kernel: [   15.526711]  [<ffffffff815588d3>] common_interrupt+0x13/0x13
Jul 22 19:40:02 server kernel: [   15.526712]  <EOI>  [<ffffffff8106bf5d>] ? sched_clock_local+0x1d/0x90
Jul 22 19:40:02 server kernel: [   15.526717]  [<ffffffff81009e25>] ? default_idle+0x55/0x170
Jul 22 19:40:02 server kernel: [   15.526719]  [<ffffffff81009fc3>] amd_e400_idle+0x83/0x100
Jul 22 19:40:02 server kernel: [   15.526721]  [<ffffffff8155bb55>] ? atomic_notifier_call_chain+0x15/0x20
Jul 22 19:40:02 server kernel: [   15.526723]  [<ffffffff81001fd9>] cpu_idle+0x59/0xb0
Jul 22 19:40:02 server kernel: [   15.526726]  [<ffffffff81551552>] start_secondary+0x181/0x185
Jul 22 19:40:02 server kernel: [   15.526727] ---[ end trace 5bac97729a402448 ]---

(dual core phenom 2, dmesg etc on request)

Cheers
David


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 19:11                 ` David
@ 2011-07-22 19:21                   ` Linus Torvalds
  2011-07-22 19:44                     ` Ben Greear
  0 siblings, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2011-07-22 19:21 UTC (permalink / raw)
  To: David; +Cc: Linux Kernel Mailing List, netdev

On Fri, Jul 22, 2011 at 12:11 PM, David <david@unsolicited.net> wrote:
>
> I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though.
>
> Jul 22 19:40:02 server kernel: [   15.526629] ------------[ cut here ]------------
> Jul 22 19:40:02 server kernel: [   15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50()

Hmm. That looks like a real bug: you shouldn't do a "del_timer_sync()"
from an interrupt. It probably works, but it sounds like a really bad
idea.

> Jul 22 19:40:02 server kernel: [   15.526677]  [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50
> Jul 22 19:40:02 server kernel: [   15.526679]  [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0
> Jul 22 19:40:02 server kernel: [   15.526681]  [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100
> Jul 22 19:40:02 server kernel: [   15.526684]  [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40
> Jul 22 19:40:02 server kernel: [   15.526689]  [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169]
> Jul 22 19:40:02 server kernel: [   15.526693]  [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169]
> Jul 22 19:40:02 server kernel: [   15.526696]  [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0
> Jul 22 19:40:02 server kernel: [   15.526698]  [<ffffffff810a4551>] handle_irq_event+0x31/0x50

I'm not seeing a lot of changes in any of these areas, though. I
wonder what made it start happen.

                     Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 19:21                   ` Linus Torvalds
@ 2011-07-22 19:44                     ` Ben Greear
  2011-07-22 20:32                       ` Stephen Hemminger
  0 siblings, 1 reply; 84+ messages in thread
From: Ben Greear @ 2011-07-22 19:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David, Linux Kernel Mailing List, netdev

On 07/22/2011 12:21 PM, Linus Torvalds wrote:
> On Fri, Jul 22, 2011 at 12:11 PM, David<david@unsolicited.net>  wrote:
>>
>> I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though.
>>
>> Jul 22 19:40:02 server kernel: [   15.526629] ------------[ cut here ]------------
>> Jul 22 19:40:02 server kernel: [   15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50()
>
> Hmm. That looks like a real bug: you shouldn't do a "del_timer_sync()"
> from an interrupt. It probably works, but it sounds like a really bad
> idea.
>
>> Jul 22 19:40:02 server kernel: [   15.526677]  [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50
>> Jul 22 19:40:02 server kernel: [   15.526679]  [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0
>> Jul 22 19:40:02 server kernel: [   15.526681]  [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100
>> Jul 22 19:40:02 server kernel: [   15.526684]  [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40
>> Jul 22 19:40:02 server kernel: [   15.526689]  [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169]
>> Jul 22 19:40:02 server kernel: [   15.526693]  [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169]
>> Jul 22 19:40:02 server kernel: [   15.526696]  [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0
>> Jul 22 19:40:02 server kernel: [   15.526698]  [<ffffffff810a4551>] handle_irq_event+0x31/0x50
>
> I'm not seeing a lot of changes in any of these areas, though. I
> wonder what made it start happen.

This has been around since at least 2.6.38.  I haven't tested recently
on my rtl8169 system, but I don't recall seeing any attempts to fix
it...

http://permalink.gmane.org/gmane.linux.network/193565
http://lists.openwall.net/netdev/2011/05/04/183

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 18:00                     ` Keith Packard
@ 2011-07-22 20:23                       ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-22 20:23 UTC (permalink / raw)
  To: Keith Packard
  Cc: Pekka Enberg, Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler

Keith,

first of all thanks for your prompt reply. Then...

On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> 
> > And now after v3.0 is out, I've tested it again, and yes, like it was
> > broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
> > bad io access the system freezes completely:
> 
> I looked at this when I first saw it (a couple of weeks ago), and I
> couldn't see any obvious reason this patch would cause this particular
> problem. I didn't want to revert the patch at that point as I feared it
> would cause other subtle problems. Given that you've got a work-around,
> it seemed best to just push this off past 3.0.

What kind of a workaround are you talking about? Sorry, to me it all
looked like "UMS is being ignored forever". Anyway, let's move on to try
to solve the issue.


> Given the failing address passed to ioread32, this seems like it's
> probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21,
> which is an offset in 32-bit units within the hardware status page. If
> the status_page.page_addr value was zero, then the computed address
> would end up being 0x84.
> 
> And, it looks like status_page.page_addr *will* end up being zero as a
> result of the patch in question. The patch resets the entire ring
> structure contents back to the initial values, which includes smashing
> the status_page structure to zero, clearing the value of
> status_page.page_addr set in i915_init_phys_hws.
> 
> Here's an untested patch which moves the initialization of
> status_page.page_addr into intel_render_ring_init_dri. I note that
> intel_init_render_ring_buffer *already* has the setting of the
> status_page.page_addr value, and so I've removed the setting of
> status_page.page_addr from i915_init_phys_hws.
> 
> I suspect we could remove the memset from intel_init_render_ring_buffer;
> it seems entirely superfluous given the memset in i915_init_phys_hws.
> 
> From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> From: Keith Packard <keithp@keithp.com>
> Date: Fri, 22 Jul 2011 10:44:39 -0700
> Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
>  intel_render_ring_init_dri
> 
> Physically-addressed hardware status pages are initialized early in
> the driver load process by i915_init_phys_hws. For UMS environments,
> the ring structure is not initialized until the X server starts. At
> that point, the entire ring structure is re-initialized with all new
> values. Any values set in the ring structure (including
> ring->status_page.page_addr) will be lost when the ring is
> re-initialized.
> 
> This patch moves the initialization of the status_page.page_addr value
> to intel_render_ring_init_dri.
> 
> Signed-off-by: Keith Packard <keithp@keithp.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1271282..8a3942c 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
>  static int i915_init_phys_hws(struct drm_device *dev)
>  {
>  	drm_i915_private_t *dev_priv = dev->dev_private;
> -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
>  
>  	/* Program Hardware Status Page */
>  	dev_priv->status_page_dmah =
> @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
>  		DRM_ERROR("Can not allocate hardware status page\n");
>  		return -ENOMEM;
>  	}
> -	ring->status_page.page_addr =
> -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
>  
> -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> +		  0, PAGE_SIZE);
>  
>  	i915_write_hws_pga(dev);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index e961568..47b9b27 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
>  		ring->get_seqno = pc_render_get_seqno;
>  	}
>  
> +	if (!I915_NEED_GFX_HWS(dev))
> +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> +
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);

I can't tell whether this is correct, because the Intel gfx driver is
unfamiliar to me, but at first glance your description sounds reasonable.

I'm out of the office until about next Tuesday, and on my return I'll try
to test it on the hardware in question.


Thanks again,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 19:44                     ` Ben Greear
@ 2011-07-22 20:32                       ` Stephen Hemminger
  2011-07-22 20:35                         ` Linus Torvalds
  2011-07-22 21:26                         ` Francois Romieu
  0 siblings, 2 replies; 84+ messages in thread
From: Stephen Hemminger @ 2011-07-22 20:32 UTC (permalink / raw)
  To: Ben Greear, Linus Torvalds, David, Tejun Heo
  Cc: Linux Kernel Mailing List, netdev

This is a regression which probably began with

commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
Author: Tejun Heo <tj@kernel.org>
Date:   Tue Jun 29 10:07:14 2010 +0200

    workqueue: implement concurrency managed dynamic worker pool

Before that it was perfectly legal for link watch code to
call schedule_delayed_work from IRQ. This should be allowable;
the code to manage the worker pool should handle it.

Network devices call netif_carrier_on/off from IRQ all the
time. The problem is that the new pool code breaks this
if link watch tries to schedule work before there are enough worker
threads.

The workqueue code should have a fallback and not try and
do anything if being called from IRQ.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 20:32                       ` Stephen Hemminger
@ 2011-07-22 20:35                         ` Linus Torvalds
  2011-07-23  2:27                           ` Tejun Heo
  2011-07-22 21:26                         ` Francois Romieu
  1 sibling, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2011-07-22 20:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Greear, David, Tejun Heo, Linux Kernel Mailing List, netdev

On Fri, Jul 22, 2011 at 1:32 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>
> The workqueue code should have a fallback and not try and
> do anything if being called from IRQ.

Fair enough. Especially since one of the *points* of workqueues is
indeed to schedule stuff from irqs and that cannot be done
immediately.

Tejun?

                  Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 20:23                       ` Kirill Smelkov
@ 2011-07-22 20:50                         ` Keith Packard
  -1 siblings, 0 replies; 84+ messages in thread
From: Keith Packard @ 2011-07-22 20:50 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Pekka Enberg, Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler

On Sat, 23 Jul 2011 00:23:36 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:

> What kind of a workaround are you talking about?

Just reverting the commit -- that makes your machine work, even if it's
wrong for other machines.

> Sorry, to me it all looked like "UMS is being ignored forever".

You're right, of course -- UMS is a huge wart on the kernel driver at
this point, keeping it working while also adding new functionality
continues to cause challenges. We tend to expect that most people will
run reasonably contemporaneous kernel and user space code, and so three
years after the switch, it continues to surprise us when someone
actually tries UMS.

> I'm out of office till ~ next week's tuesday, and on return I'll try
> to test it on the hardware in question.

Let me know; I've pushed this patch to my drm-intel-fixes tree on
kernel.org in the meantime; if it does solve the problem, I'd like to
add your Tested-by: line.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 20:50                         ` Keith Packard
@ 2011-07-22 21:08                           ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-22 21:08 UTC (permalink / raw)
  To: Keith Packard
  Cc: Pekka Enberg, Chris Wilson, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Herbert Xu, Linus Torvalds,
	Andrew Morton, Florian Mickler

On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
> On Sat, 23 Jul 2011 00:23:36 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> 
> > What kind of a workaround are you talking about?
> 
> Just reverting the commit -- that makes your machine work, even if it's
> wrong for other machines.

Yes, I could revert it. But since the driver is reasonably complex, it
is better to know what I'm doing and that the change makes sense,
especially when it's not "my machine", but lots of target boards located
all over the country.

That's why I wanted your feedback (reasonably, IMHO, since I had done
my homework): so as not to be left on my own.


> > Sorry, to me it all looked like "UMS is being ignored forever".
> 
> You're right, of course -- UMS is a huge wart on the kernel driver at
> this point, keeping it working while also adding new functionality
> continues to cause challenges. We tend to expect that most people will
> run reasonably contemporaneous kernel and user space code, and so three
> years after the switch, it continues to surprise us when someone
> actually tries UMS.

We are planning to upgrade to KMS too. The kernel is upgraded more often
than userspace, because of the already mentioned (thanks!) "no
regression" rule. Userspace is more complex and more work in my context,
so it is lagging, but eventually we'll get there.

So I hope that some day, when everyone has upgraded, UMS support can be
cleaned out of the driver.


> > I'm out of office till ~ next week's tuesday, and on return I'll try
> > to test it on the hardware in question.
> 
> Let me know; I've pushed this patch to my drm-intel-fixes tree on
> kernel.org in the meantime; if it does solve the problem, I'd like to
> add your Tested-by: line.

Yes, sure, I'll let you know the results.


Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 20:32                       ` Stephen Hemminger
  2011-07-22 20:35                         ` Linus Torvalds
@ 2011-07-22 21:26                         ` Francois Romieu
  2011-07-22 22:09                           ` Stephen Hemminger
  1 sibling, 1 reply; 84+ messages in thread
From: Francois Romieu @ 2011-07-22 21:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Greear, Linus Torvalds, David, Tejun Heo,
	Linux Kernel Mailing List, netdev

Stephen Hemminger <shemminger@vyatta.com> :
> This is a regression which probably began with
> 
> commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
> Author: Tejun Heo <tj@kernel.org>
> Date:   Tue Jun 29 10:07:14 2010 +0200
> 
>     workqueue: implement concurrency managed dynamic worker pool
> 
> Before that it was perfectly legal for link watch code to
> call schedule_delayed_work from IRQ. This should be allowable;
> the code to manage the worker pool should handle it.

I beg to differ: see Ben's first report
(http://lists.openwall.net/netdev/2011/05/04/183).
                                       ^^

One of the code paths in the netif_carrier code leads it to try and disable
a late workqueue to reenable it immediately (mod_workqueue anyone?):
netif_carrier_on
-> linkwatch_fire_event
   -> linkwatch_schedule_work
      -> cancel_delayed_work
         -> del_timer_sync

The del_timer_sync has been here for ages. AFAICS it is neither a new pool
code problem nor a schedule_delayed_work-only problem.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 21:08                           ` Kirill Smelkov
@ 2011-07-22 21:31                             ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-22 21:31 UTC (permalink / raw)
  To: Keith Packard
  Cc: Rafael J. Wysocki, Herbert Xu, Luke-Jr, intel-gfx, LKML,
	dri-devel, Pekka Enberg, Ray Lee, Andrew Morton, Linus Torvalds

On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
> On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:

> > You're right, of course -- UMS is a huge wart on the kernel driver at
> > this point, keeping it working while also adding new functionality
> > continues to cause challenges. We tend to expect that most people will
> > run reasonably contemporaneous kernel and user space code, and so three
> > years after the switch, it continues to surprise us when someone
> > actually tries UMS.
> 
> We are planning upgrade to KMS too. The kernel is upgraded more often
> compared to userspace, because of already mentioned (thanks!) "no
> regression" rule. Userspace is more complex and more work in my context,
> so it is lagging, but eventually we'll get there.

Also wanted to say that if the whole of X could be built, like the kernel,
from one repo without a multi-repo setup tool, with 100% reliable
incremental rebuilds, etc., it would be a bit easier to upgrade X too.

Sorry for being a bit off-topic, I could not resist. I have been keeping that
thought in my head for about 2 years already, and now had a chance to mention it.



Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 21:26                         ` Francois Romieu
@ 2011-07-22 22:09                           ` Stephen Hemminger
  2011-07-22 22:53                             ` [PATCH] net: allow netif_carrier to be called safely from IRQ Stephen Hemminger
  0 siblings, 1 reply; 84+ messages in thread
From: Stephen Hemminger @ 2011-07-22 22:09 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Ben Greear, Linus Torvalds, David, Tejun Heo,
	Linux Kernel Mailing List, netdev

On Fri, 22 Jul 2011 23:26:29 +0200
Francois Romieu <romieu@fr.zoreil.com> wrote:

> Stephen Hemminger <shemminger@vyatta.com> :
> > This is a regression which probably began with
> > 
> > commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
> > Author: Tejun Heo <tj@kernel.org>
> > Date:   Tue Jun 29 10:07:14 2010 +0200
> > 
> >     workqueue: implement concurrency managed dynamic worker pool
> > 
> > Before that it was perfectly legal for link watch code to
> > call schedule_delayed_work from IRQ. This should be allowable;
> > the code to manage the worker pool should handle it.
> 
> I beg to differ: see Ben's first report
> (http://lists.openwall.net/netdev/2011/05/04/183).
>                                        ^^
> 
> One of the code paths in the netif_carrier code leads it to try and disable
> a late workqueue to reenable it immediately (mod_workqueue anyone?):
> netif_carrier_on
> -> linkwatch_fire_event
>    -> linkwatch_schedule_work
>       -> cancel_delayed_work
>          -> del_timer_sync
> 
> The del_timer_sync has been here for ages. AFAICS it is neither a new pool
> code problem nor a schedule_delayed_work-only problem.
> 

That path can be fixed by calling __cancel_delayed_work() instead.
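
A toy userspace model of the difference between the two cancel variants
may help here. It is only a sketch: struct toy_timer and the toy_*
functions are hypothetical and do not exist in the kernel. The model
assumes what the call chain above implies, namely that the waiting
variant must not return while the handler is still running, whereas the
non-waiting variant only deactivates a pending timer. Instead of
spinning, the model reports that it would have to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy timer state; hypothetical, not struct timer_list. */
struct toy_timer {
	bool pending;         /* armed, handler has not started yet */
	bool handler_running; /* handler is executing right now */
};

enum toy_result {
	TOY_NOT_PENDING, /* nothing to cancel */
	TOY_CANCELLED,   /* a pending timer was deactivated */
	TOY_WOULD_WAIT   /* caller would have to wait for the handler */
};

/* Non-waiting cancel: deactivate if pending, ignore a running handler. */
enum toy_result toy_cancel_nowait(struct toy_timer *t)
{
	assert(t != NULL);
	if (!t->pending)
		return TOY_NOT_PENDING;
	t->pending = false;
	return TOY_CANCELLED;
}

/* Waiting cancel: a real implementation would loop until the handler
 * has finished; the model reports that condition instead of spinning. */
enum toy_result toy_cancel_sync(struct toy_timer *t)
{
	assert(t != NULL);
	if (t->handler_running)
		return TOY_WOULD_WAIT;
	return toy_cancel_nowait(t);
}
```

If the caller is an interrupt that preempted the handler on the same
CPU, the TOY_WOULD_WAIT case can never resolve, which is why the
non-waiting variant is the one that is safe to call from IRQ.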

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH] net: allow netif_carrier to be called safely from IRQ
  2011-07-22 22:09                           ` Stephen Hemminger
@ 2011-07-22 22:53                             ` Stephen Hemminger
  2011-07-23  0:16                               ` David Miller
  0 siblings, 1 reply; 84+ messages in thread
From: Stephen Hemminger @ 2011-07-22 22:53 UTC (permalink / raw)
  To: David Miller
  Cc: Francois Romieu, Ben Greear, Linus Torvalds, David, Tejun Heo,
	Linux Kernel Mailing List, netdev

As reported by Ben Greear and Francois Romieu, a code path in
the netif_carrier code leads it to try and disable
a late workqueue to reenable it immediately:
netif_carrier_on
-> linkwatch_fire_event
   -> linkwatch_schedule_work
      -> cancel_delayed_work
         -> del_timer_sync  

If __cancel_delayed_work is used instead then there is no
problem of waiting for a running linkwatch_event.

There is a race between a running linkwatch_event and the re-scheduling,
but it is harmless to schedule an extra scan of the linkwatch queue.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/core/link_watch.c	2011-07-22 15:25:31.027533604 -0700
+++ b/net/core/link_watch.c	2011-07-22 15:31:27.531520028 -0700
@@ -126,7 +126,7 @@ static void linkwatch_schedule_work(int
 		return;
 
 	/* It's already running which is good enough. */
-	if (!cancel_delayed_work(&linkwatch_work))
+	if (!__cancel_delayed_work(&linkwatch_work))
 		return;
 
 	/* Otherwise we reschedule it again for immediate execution. */


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release - btrfs possible locking deadlock
  2011-07-22  2:59               ` Linux 3.0 release Linus Torvalds
                                   ` (2 preceding siblings ...)
  2011-07-22 19:11                 ` David
@ 2011-07-22 23:21                 ` Ed Tomlinson
  2011-07-25 19:49                   ` Chris Mason
  2011-07-24 22:04                 ` Linux 3.0 release Arnaud Lacombe
  4 siblings, 1 reply; 84+ messages in thread
From: Ed Tomlinson @ 2011-07-22 23:21 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linus Torvalds, Chris Mason, linux-btrfs, Josef Bacik

On Thursday 21 July 2011 22:59:53 Linus Torvalds wrote:
> So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> 

Hi,

Managed to get this while rsyncing from ext4 to a btrfs filesystem spanning three partitions in raid1.

[16018.211493] device fsid f7186eeb-60df-4b1a-890a-4a1eb42f81fe devid 1 transid 10 /dev/sdd4
[16018.230643] btrfs: use lzo compression
[16018.234619] btrfs: enabling disk space caching
[25949.414011] 
[25949.414011] =======================================================
[25949.416549] [ INFO: possible circular locking dependency detected ]
[25949.423187] 3.0.0-crc+ #348
[25949.423187] -------------------------------------------------------
[25949.423187] rsync/20237 is trying to acquire lock:
[25949.423187]  (btrfs-extent-01){+.+...}, at: [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187] 
[25949.423187] but task is already holding lock:
[25949.423187]  (&(&eb->lock)->rlock){+.+...}, at: [<ffffffffa047cee2>] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[25949.423187] 
[25949.423187] which lock already depends on the new lock.
[25949.423187] 
[25949.423187] 
[25949.423187] the existing dependency chain (in reverse order) is:
[25949.423187] 
[25949.423187] -> #1 (&(&eb->lock)->rlock){+.+...}:
[25949.423187]        [<ffffffff8108bb75>] lock_acquire+0x95/0x140
[25949.423187]        [<ffffffff815792eb>] _raw_spin_lock+0x3b/0x50
[25949.423187]        [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]        [<ffffffffa0427959>] btrfs_search_slot+0x2e9/0x800 [btrfs]
[25949.423187]        [<ffffffffa0433bee>] lookup_inline_extent_backref+0xbe/0x490 [btrfs]
[25949.423187]        [<ffffffffa0434cbb>] __btrfs_free_extent+0x13b/0x900 [btrfs]
[25949.423187]        [<ffffffffa0435ca3>] run_clustered_refs+0x823/0xaf0 [btrfs]
[25949.423187]        [<ffffffffa043603d>] btrfs_run_delayed_refs+0xcd/0x290 [btrfs]
[25949.423187]        [<ffffffffa0445ecb>] btrfs_commit_transaction+0x8b/0x9d0 [btrfs]
[25949.423187]        [<ffffffffa0440c06>] transaction_kthread+0x2b6/0x2e0 [btrfs]
[25949.423187]        [<ffffffff81071536>] kthread+0xb6/0xc0
[25949.423187]        [<ffffffff81582314>] kernel_thread_helper+0x4/0x10
[25949.423187] 
[25949.423187] -> #0 (btrfs-extent-01){+.+...}:
[25949.423187]        [<ffffffff8108b468>] __lock_acquire+0x1588/0x16a0
[25949.423187]        [<ffffffff8108bb75>] lock_acquire+0x95/0x140
[25949.423187]        [<ffffffff815792eb>] _raw_spin_lock+0x3b/0x50
[25949.423187]        [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]        [<ffffffffa0427959>] btrfs_search_slot+0x2e9/0x800 [btrfs]
[25949.423187]        [<ffffffffa0439dd2>] btrfs_lookup_dir_item+0x82/0x120 [btrfs]
[25949.423187]        [<ffffffffa04532a5>] btrfs_lookup_dentry+0xc5/0x4c0 [btrfs]
[25949.423187]        [<ffffffffa04536c4>] btrfs_lookup+0x24/0x70 [btrfs]
[25949.423187]        [<ffffffff8115a863>] d_alloc_and_lookup+0xc3/0x100
[25949.423187]        [<ffffffff8115cfa0>] do_lookup+0x260/0x480
[25949.423187]        [<ffffffff8115d540>] walk_component+0x60/0x1f0
[25949.423187]        [<ffffffff8115e7aa>] path_lookupat+0xea/0x620
[25949.423187]        [<ffffffff8115ed15>] do_path_lookup+0x35/0x1c0
[25949.423187]        [<ffffffff8115fc38>] user_path_at+0x98/0xe0
[25949.423187]        [<ffffffff81153fac>] vfs_fstatat+0x4c/0x90
[25949.423187]        [<ffffffff8115405e>] vfs_lstat+0x1e/0x20
[25949.423187]        [<ffffffff81154084>] sys_newlstat+0x24/0x50
[25949.423187]        [<ffffffff815814eb>] system_call_fastpath+0x16/0x1b
[25949.423187] 
[25949.423187] other info that might help us debug this:
[25949.423187] 
[25949.423187]  Possible unsafe locking scenario:
[25949.423187] 
[25949.423187]        CPU0                    CPU1
[25949.423187]        ----                    ----
[25949.423187]   lock(&(&eb->lock)->rlock);
[25949.423187]                                lock(btrfs-extent-01);
[25949.423187]                                lock(&(&eb->lock)->rlock);
[25949.423187]   lock(btrfs-extent-01);
[25949.423187] 
[25949.423187]  *** DEADLOCK ***
[25949.423187] 
[25949.423187] 2 locks held by rsync/20237:
[25949.423187]  #0:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8115cf5a>] do_lookup+0x21a/0x480
[25949.423187]  #1:  (&(&eb->lock)->rlock){+.+...}, at: [<ffffffffa047cee2>] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[25949.423187] 
[25949.423187] stack backtrace:
[25949.423187] Pid: 20237, comm: rsync Not tainted 3.0.0-crc+ #348
[25949.423187] Call Trace:
[25949.423187]  [<ffffffff810887de>] print_circular_bug+0x20e/0x2f0
[25949.423187]  [<ffffffff8108b468>] __lock_acquire+0x1588/0x16a0
[25949.423187]  [<ffffffffa0441ebb>] ? verify_parent_transid+0xcb/0x290 [btrfs]
[25949.423187]  [<ffffffffa047ce88>] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [<ffffffff8108bb75>] lock_acquire+0x95/0x140
[25949.423187]  [<ffffffffa047ce88>] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [<ffffffff815792eb>] _raw_spin_lock+0x3b/0x50
[25949.423187]  [<ffffffffa047ce88>] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [<ffffffffa0427959>] btrfs_search_slot+0x2e9/0x800 [btrfs]
[25949.423187]  [<ffffffff8108a0ca>] ? __lock_acquire+0x1ea/0x16a0
[25949.423187]  [<ffffffffa0439dd2>] btrfs_lookup_dir_item+0x82/0x120 [btrfs]
[25949.423187]  [<ffffffff8114186e>] ? kmem_cache_alloc+0xde/0x1e0
[25949.423187]  [<ffffffffa04532a5>] btrfs_lookup_dentry+0xc5/0x4c0 [btrfs]
[25949.423187]  [<ffffffff812924fe>] ? do_raw_spin_lock+0xde/0x1c0
[25949.423187]  [<ffffffff8157d541>] ? sub_preempt_count+0x51/0x60
[25949.423187]  [<ffffffffa04536c4>] btrfs_lookup+0x24/0x70 [btrfs]
[25949.423187]  [<ffffffff8115a863>] d_alloc_and_lookup+0xc3/0x100
[25949.423187]  [<ffffffff8115cfa0>] do_lookup+0x260/0x480
[25949.423187]  [<ffffffff8115d540>] walk_component+0x60/0x1f0
[25949.423187]  [<ffffffff8115e7aa>] path_lookupat+0xea/0x620
[25949.423187]  [<ffffffff8111a3a3>] ? might_fault+0x53/0xb0
[25949.423187]  [<ffffffff8115ed15>] do_path_lookup+0x35/0x1c0
[25949.423187]  [<ffffffff8115fc38>] user_path_at+0x98/0xe0
[25949.423187]  [<ffffffff8111a3ec>] ? might_fault+0x9c/0xb0
[25949.423187]  [<ffffffff8111a3a3>] ? might_fault+0x53/0xb0
[25949.423187]  [<ffffffff81153d78>] ? cp_new_stat+0xf8/0x110
[25949.423187]  [<ffffffff81153fac>] vfs_fstatat+0x4c/0x90
[25949.423187]  [<ffffffff8115405e>] vfs_lstat+0x1e/0x20
[25949.423187]  [<ffffffff81154084>] sys_newlstat+0x24/0x50
[25949.423187]  [<ffffffff81089c3d>] ? trace_hardirqs_on_caller+0x14d/0x190
[25949.423187]  [<ffffffff8128c23e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[25949.423187]  [<ffffffff815814eb>] system_call_fastpath+0x16/0x1b

Kernel is 3.0.0 without any extras.

Ideas?
Ed 
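
For readers unfamiliar with this kind of report, the check behind
"possible circular locking dependency" can be sketched as a small
lock-order graph. This is a toy illustration only: struct lock_graph,
graph_init() and graph_acquire() are hypothetical names, and the real
lockdep implementation is far more involved. It says nothing about
whether the report above reflects a real deadlock in btrfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* dep[a][b] is true once class b has been taken while holding class a. */
struct lock_graph {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/* Record "take `wanted` while holding `held`". Returns false, without
 * recording anything, if the new edge would close a cycle. */
bool graph_acquire(struct lock_graph *g, int held, int wanted)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(wanted >= 0 && wanted < MAX_CLASSES);
	if (reachable(g, wanted, held, seen))
		return false;
	g->dep[held][wanted] = true;
	return true;
}
```

Mapped onto the trace: chain #1 records eb->lock taken while holding
btrfs-extent-01, so the later attempt (chain #0) to take btrfs-extent-01
while holding eb->lock is the edge that would close the cycle.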

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH] net: allow netif_carrier to be called safely from IRQ
  2011-07-22 22:53                             ` [PATCH] net: allow netif_carrier to be called safely from IRQ Stephen Hemminger
@ 2011-07-23  0:16                               ` David Miller
  0 siblings, 0 replies; 84+ messages in thread
From: David Miller @ 2011-07-23  0:16 UTC (permalink / raw)
  To: shemminger; +Cc: romieu, greearb, torvalds, david, tj, linux-kernel, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 22 Jul 2011 15:53:56 -0700

> As reported by Ben Greear and Francois Romieu, a code path in
> the netif_carrier code leads it to try and disable
> a late workqueue to reenable it immediately:
> netif_carrier_on
> -> linkwatch_fire_event
>    -> linkwatch_schedule_work
>       -> cancel_delayed_work
>          -> del_timer_sync  
> 
> If __cancel_delayed_work is used instead then there is no
> problem of waiting for a running linkwatch_event.
> 
> There is a race between a running linkwatch_event and the re-scheduling,
> but it is harmless to schedule an extra scan of the linkwatch queue.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22 20:35                         ` Linus Torvalds
@ 2011-07-23  2:27                           ` Tejun Heo
  2011-07-23  2:30                             ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2011-07-23  2:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen Hemminger, Ben Greear, David, Linux Kernel Mailing List, netdev

Hello, Stephen, Linus.

On Fri, Jul 22, 2011 at 01:35:16PM -0700, Linus Torvalds wrote:
> On Fri, Jul 22, 2011 at 1:32 PM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
> >
> > The workqueue code should have a fallback and not try and
> > do anything if being called from IRQ.
> 
> Fair enough. Especially since one of the *points* of workqueues is
> indeed to schedule stuff from irqs and that cannot be done
> immediately.
> 
> Tejun?

It seems to have been tracked down already but, just to be clear:
nothing changed regarding the synchronization requirements for any of
the queue, flush and cancel functions.  If it worked before cmwq, it
should work with cmwq.

While on the topic, we do have some workqueue API problems.  The
delayed ones are a bit screwy.  e.g. requeueing an already pending
delayed work item should probably update the timer but it doesn't, and
we have a bunch of users doing cancel/requeue or using separate timers
for that.  Also, the cancel/flush[_sync] variants are subtly different,
which makes it difficult to pick the correct one and can introduce
bugs which are extremely difficult to reproduce.
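
The delayed-work behaviour described above can be illustrated with a toy
userspace model. This is only a sketch in Python, not kernel code: the
names DelayedWork, queue_delayed, cancel_delayed and requeue are
hypothetical and merely mimic the semantics described in this message
(queueing an already pending item leaves its timer untouched).

```python
class DelayedWork:
    """Toy model of a delayed work item: a pending flag plus a timer expiry."""

    def __init__(self):
        self.pending = False
        self.expires = None


def queue_delayed(work, now, delay):
    """Arm the timer only if the item is not already pending.

    Returns True if the item was newly queued, False if it was already
    pending (in which case the existing expiry is left unchanged).
    """
    if work.pending:
        return False
    work.pending = True
    work.expires = now + delay
    return True


def cancel_delayed(work):
    """Disarm the timer; returns True if the item had been pending."""
    was_pending = work.pending
    work.pending = False
    work.expires = None
    return was_pending


def requeue(work, now, delay):
    """The cancel/requeue workaround: drop the old timer, then arm a new one."""
    cancel_delayed(work)
    return queue_delayed(work, now, delay)
```

In this model, queueing at time 0 with delay 100 and then again at time 10
with delay 5 leaves the expiry at 100; only the explicit cancel/requeue
moves it to 15. That is the pattern callers end up writing by hand.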

Again, most of these had accumulated well before cmwq came into the
picture.  I think we need to make workqueue simpler and easier to use.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-23  2:27                           ` Tejun Heo
@ 2011-07-23  2:30                             ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2011-07-23  2:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen Hemminger, Ben Greear, David, Linux Kernel Mailing List, netdev

On Sat, Jul 23, 2011 at 04:27:15AM +0200, Tejun Heo wrote:
> While on the topic, we do have some workqueue API problems.  The
> delayed ones are a bit screwy.  e.g. requeueing an already pending
> delayed work item should probably update the timer but it doesn't and
> we have a bunch of users doing cancel/requeue or using separate timers
> for that.

(after reading the other branch of the thread) Ooh, bingo, this
actually was the issue which triggered the problem reported here. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 21:31                             ` Kirill Smelkov
@ 2011-07-23 15:10                               ` Alex Deucher
  -1 siblings, 0 replies; 84+ messages in thread
From: Alex Deucher @ 2011-07-23 15:10 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Keith Packard, Pekka Enberg, Herbert Xu, Luke-Jr, intel-gfx,
	LKML, dri-devel, Rafael J. Wysocki, Ray Lee, Andrew Morton,
	Linus Torvalds

On Fri, Jul 22, 2011 at 5:31 PM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
>> On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
>
>> > You're right, of course -- UMS is a huge wart on the kernel driver at
>> > this point, keeping it working while also adding new functionality
>> > continues to cause challenges. We tend to expect that most people will
>> > run reasonably contemporaneous kernel and user space code, and so three
>> > years after the switch, it continues to surprise us when someone
>> > actually tries UMS.
>>
>> We are planning upgrade to KMS too. The kernel is upgraded more often
>> compared to userspace, because of already mentioned (thanks!) "no
>> regression" rule. Userspace is more complex and more work in my context,
>> so it is lagging, but eventually we'll get there.
>
> Also wanted to say, that if whole X could be built, like the kernel, from one
> repo without multirepo-setup tool, with 100% reliable working
> incremental rebuild, etc... it would be a bit easier to upgrade X too.
>
> Sorry for being a bit offtopic, could not resist. I was keeping that
> thought in my head for ~ 2 years already, and now had a chance to mention it.

You don't have to rebuild all of X to use KMS.  In most cases, you
just need to update the ddx for your card.

Alex

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 20:50                         ` Keith Packard
  (?)
  (?)
@ 2011-07-23 15:55                         ` Pekka Enberg
  2011-07-25  4:29                           ` Keith Packard
  -1 siblings, 1 reply; 84+ messages in thread
From: Pekka Enberg @ 2011-07-23 15:55 UTC (permalink / raw)
  To: Keith Packard
  Cc: Kirill Smelkov, Chris Wilson, Luke-Jr, intel-gfx, LKML,
	dri-devel, Rafael J. Wysocki, Ray Lee, Herbert Xu,
	Linus Torvalds, Andrew Morton, Florian Mickler

Hi Keith,

On Fri, 22 Jul 2011, Keith Packard wrote:
>> Sorry, to me it all looked like "UMS is being ignored forever".
>
> You're right, of course -- UMS is a huge wart on the kernel driver at
> this point, keeping it working while also adding new functionality
> continues to cause challenges. We tend to expect that most people will
> run reasonably contemporaneous kernel and user space code, and so three
> years after the switch, it continues to surprise us when someone
> actually tries UMS.

I know I sound like a broken record but I really wish you i915 devs were 
a little more eager to revert broken patches early rather than late. I mean, 
this particular breakage was already bisected but nobody said or 
did anything - and it's not like it's the first time either!

I suppose I need to bribe Linus somehow to be more strict with you folks.

 			Pekka

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-23 15:10                               ` Alex Deucher
@ 2011-07-23 18:19                                 ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-23 18:19 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Keith Packard, Pekka Enberg, Herbert Xu, Luke-Jr, intel-gfx,
	LKML, dri-devel, Rafael J. Wysocki, Ray Lee, Andrew Morton,
	Linus Torvalds

On Sat, Jul 23, 2011 at 11:10:53AM -0400, Alex Deucher wrote:
> On Fri, Jul 22, 2011 at 5:31 PM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
> >> On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
> >
> >> > You're right, of course -- UMS is a huge wart on the kernel driver at
> >> > this point, keeping it working while also adding new functionality
> >> > continues to cause challenges. We tend to expect that most people will
> >> > run reasonably contemporaneous kernel and user space code, and so three
> >> > years after the switch, it continues to surprise us when someone
> >> > actually tries UMS.
> >>
> >> We are planning upgrade to KMS too. The kernel is upgraded more often
> >> compared to userspace, because of already mentioned (thanks!) "no
> >> regression" rule. Userspace is more complex and more work in my context,
> >> so it is lagging, but eventually we'll get there.
> >
> > Also wanted to say, that if whole X could be built, like the kernel, from one
> > repo without multirepo-setup tool, with 100% reliable working
> > incremental rebuild, etc... it would be a bit easier to upgrade X too.
> >
> > Sorry for being a bit offtopic, could not resist. I was keeping that
> > thought in my head for ~ 2 years already, and now had a chance to mention it.
> 
> You don't have to rebuild all of X to use KMS.  In most cases, you
> just need to update the ddx for your card.

I meant the rebuild not just for KMS, but in the general case. To me, the
kernel has the great advantage of being a lot of self-consistent code,
because it is maintained in one repo with a good build system and a good
development process. As a result, it is (relatively) easy to upgrade.

Anyway, this is just a note from a stranger to both kernel and X
development, so whatever...


Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-22  2:59               ` Linux 3.0 release Linus Torvalds
                                   ` (3 preceding siblings ...)
  2011-07-22 23:21                 ` Linux 3.0 release - btrfs possible locking deadlock Ed Tomlinson
@ 2011-07-24 22:04                 ` Arnaud Lacombe
  2011-07-25  2:21                   ` Yoshinori Sato
  4 siblings, 1 reply; 84+ messages in thread
From: Arnaud Lacombe @ 2011-07-24 22:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Greg KH, Yoshinori Sato

Hi,

On Thu, Jul 21, 2011 at 10:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> [...]
For the record, if anybody cares, arch/h8300 no longer configures since 2.6.38:

% make ARCH=h8300 menuconfig
make: h8300-elf-gcc: Command not found
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/lxdialog/checklist.o
  HOSTCC  scripts/kconfig/lxdialog/inputbox.o
  HOSTCC  scripts/kconfig/lxdialog/menubox.o
  HOSTCC  scripts/kconfig/lxdialog/textbox.o
  HOSTCC  scripts/kconfig/lxdialog/util.o
  HOSTCC  scripts/kconfig/lxdialog/yesno.o
  HOSTCC  scripts/kconfig/mconf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/lex.zconf.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/mconf
scripts/kconfig/mconf Kconfig
arch/h8300/Kconfig:198: can't open file "drivers/serial/Kconfig"
make[1]: *** [menuconfig] Error 1
make: *** [menuconfig] Error 2

Yes, I know I have no h8300-elf-gcc, but it does not change the fact
that the arch tries to include a non-existent file. It's been broken
by:

commit ab4382d27412e7e3e7c936e8d50d8888dfac3df8
Author: Greg Kroah-Hartman <gregkh@suse.de>
Date:   Thu Jan 13 12:10:18 2011 -0800

    tty: move drivers/serial/ to drivers/tty/serial/

    The serial drivers are really just tty drivers, so move them to
    drivers/tty/ to make things a bit neater overall.
    This is part of the tty/serial driver movement proceedure as proposed by
    Arnd Bergmann and approved by everyone involved a number of months ago.

    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Rogier Wolff <R.E.Wolff@bitwizard.nl>
    Cc: Michael H. Warfield <mhw@wittsend.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
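
A dangling include of this kind can be caught mechanically. The sketch
below is a hypothetical helper (missing_sources is not an existing kernel
script) that scans one Kconfig file for literal source "..." lines and
reports those whose target does not exist under the tree root. It assumes
plain quoted paths: it does not expand variables and does not follow
nested source statements.

```python
import os
import re

# Matches lines such as:  source "drivers/serial/Kconfig"
SOURCE_RE = re.compile(r'^\s*source\s+"([^"]+)"')


def missing_sources(kconfig_path, tree_root):
    """Return (line number, path) pairs for source targets that do not exist."""
    missing = []
    with open(kconfig_path, encoding="utf-8") as kconfig:
        for lineno, line in enumerate(kconfig, start=1):
            match = SOURCE_RE.match(line)
            if match is None:
                continue
            target = match.group(1)
            if not os.path.exists(os.path.join(tree_root, target)):
                missing.append((lineno, target))
    return missing
```

Called as missing_sources("arch/h8300/Kconfig", ".") from the top of a
kernel tree, it would list each unresolved path together with the line
that references it.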

 - Arnaud

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-24 22:04                 ` Linux 3.0 release Arnaud Lacombe
@ 2011-07-25  2:21                   ` Yoshinori Sato
  2011-07-25 15:50                     ` Arnaud Lacombe
  0 siblings, 1 reply; 84+ messages in thread
From: Yoshinori Sato @ 2011-07-25  2:21 UTC (permalink / raw)
  To: Arnaud Lacombe; +Cc: Linus Torvalds, Linux Kernel Mailing List, Greg KH

At Sun, 24 Jul 2011 18:04:59 -0400,
Arnaud Lacombe wrote:
> 
> Hi,
> 
> On Thu, Jul 21, 2011 at 10:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> > [...]
> For the record, if anybody cares, arch/h8300 no longer configures since 2.6.38:
> 
> % make ARCH=h8300 menuconfig
> make: h8300-elf-gcc: Command not found
>   HOSTCC  scripts/basic/fixdep
>   HOSTCC  scripts/kconfig/conf.o
>   HOSTCC  scripts/kconfig/lxdialog/checklist.o
>   HOSTCC  scripts/kconfig/lxdialog/inputbox.o
>   HOSTCC  scripts/kconfig/lxdialog/menubox.o
>   HOSTCC  scripts/kconfig/lxdialog/textbox.o
>   HOSTCC  scripts/kconfig/lxdialog/util.o
>   HOSTCC  scripts/kconfig/lxdialog/yesno.o
>   HOSTCC  scripts/kconfig/mconf.o
>   SHIPPED scripts/kconfig/zconf.tab.c
>   SHIPPED scripts/kconfig/lex.zconf.c
>   SHIPPED scripts/kconfig/zconf.hash.c
>   HOSTCC  scripts/kconfig/zconf.tab.o
>   HOSTLD  scripts/kconfig/mconf
> scripts/kconfig/mconf Kconfig
> arch/h8300/Kconfig:198: can't open file "drivers/serial/Kconfig"
> make[1]: *** [menuconfig] Error 1
> make: *** [menuconfig] Error 2
> 
> Yes, I know I have no h8300-elf-gcc, but it does not change the fact
> that the arch tries to include a non-existent file. It's been broken
> by:
> 
> commit ab4382d27412e7e3e7c936e8d50d8888dfac3df8
> Author: Greg Kroah-Hartman <gregkh@suse.de>
> Date:   Thu Jan 13 12:10:18 2011 -0800
> 
>     tty: move drivers/serial/ to drivers/tty/serial/
> 
>     The serial drivers are really just tty drivers, so move them to
>     drivers/tty/ to make things a bit neater overall.
>     This is part of the tty/serial driver movement proceedure as proposed by
>     Arnd Bergmann and approved by everyone involved a number of months ago.
> 
>     Cc: Arnd Bergmann <arnd@arndb.de>
>     Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
>     Cc: Geert Uytterhoeven <geert@linux-m68k.org>
>     Cc: Rogier Wolff <R.E.Wolff@bitwizard.nl>
>     Cc: Michael H. Warfield <mhw@wittsend.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
>  - Arnaud

Please try this fix.
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>

diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 091ed61..910e5ad 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -89,125 +89,7 @@ endmenu
 
 source "net/Kconfig"
 
-source "drivers/base/Kconfig"
-
-source "drivers/mtd/Kconfig"
-
-source "drivers/block/Kconfig"
-
-source "drivers/ide/Kconfig"
-
-source "arch/h8300/Kconfig.ide"
-
-source "drivers/net/Kconfig"
-
-#
-# input - input/joystick depends on it. As does USB.
-#
-source "drivers/input/Kconfig"
-
-menu "Character devices"
-
-config VT
-	bool "Virtual terminal"
-	---help---
-	  If you say Y here, you will get support for terminal devices with
-	  display and keyboard devices. These are called "virtual" because you
-	  can run several virtual terminals (also called virtual consoles) on
-	  one physical terminal. This is rather useful, for example one
-	  virtual terminal can collect system messages and warnings, another
-	  one can be used for a text-mode user session, and a third could run
-	  an X session, all in parallel. Switching between virtual terminals
-	  is done with certain key combinations, usually Alt-<function key>.
-
-	  The setterm command ("man setterm") can be used to change the
-	  properties (such as colors or beeping) of a virtual terminal. The
-	  man page console_codes(4) ("man console_codes") contains the special
-	  character sequences that can be used to change those properties
-	  directly. The fonts used on virtual terminals can be changed with
-	  the setfont ("man setfont") command and the key bindings are defined
-	  with the loadkeys ("man loadkeys") command.
-
-	  You need at least one virtual terminal device in order to make use
-	  of your keyboard and monitor. Therefore, only people configuring an
-	  embedded system would want to say N here in order to save some
-	  memory; the only way to log into such a system is then via a serial
-	  or network connection.
-
-	  If unsure, say Y, or else you won't be able to do much with your new
-	  shiny Linux system :-)
-
-config VT_CONSOLE
-	bool "Support for console on virtual terminal"
-	depends on VT
-	---help---
-	  The system console is the device which receives all kernel messages
-	  and warnings and which allows logins in single user mode. If you
-	  answer Y here, a virtual terminal (the device used to interact with
-	  a physical terminal) can be used as system console. This is the most
-	  common mode of operations, so you should say Y here unless you want
-	  the kernel messages be output only to a serial port (in which case
-	  you should say Y to "Console on serial port", below).
-
-	  If you do say Y here, by default the currently visible virtual
-	  terminal (/dev/tty0) will be used as system console. You can change
-	  that with a kernel command line option such as "console=tty3" which
-	  would use the third virtual terminal as system console. (Try "man
-	  bootparam" or see the documentation of your boot loader (lilo or
-	  loadlin) about how to pass options to the kernel at boot time.)
-
-	  If unsure, say Y.
-
-config HW_CONSOLE
-	bool
-	depends on VT && !S390 && !UM
-	default y
-
-comment "Unix98 PTY support"
-
-config UNIX98_PTYS
-	bool "Unix98 PTY support"
-	---help---
-	  A pseudo terminal (PTY) is a software device consisting of two
-	  halves: a master and a slave. The slave device behaves identical to
-	  a physical terminal; the master device is used by a process to
-	  read data from and write data to the slave, thereby emulating a
-	  terminal. Typical programs for the master side are telnet servers
-	  and xterms.
-
-	  Linux has traditionally used the BSD-like names /dev/ptyxx for
-	  masters and /dev/ttyxx for slaves of pseudo terminals. This scheme
-	  has a number of problems. The GNU C library glibc 2.1 and later,
-	  however, supports the Unix98 naming standard: in order to acquire a
-	  pseudo terminal, a process opens /dev/ptmx; the number of the pseudo
-	  terminal is then made available to the process and the pseudo
-	  terminal slave can be accessed as /dev/pts/<number>. What was
-	  traditionally /dev/ttyp2 will then be /dev/pts/2, for example.
-
-	  The entries in /dev/pts/ are created on the fly by a virtual
-	  file system; therefore, if you say Y here you should say Y to
-	  "/dev/pts file system for Unix98 PTYs" as well.
-
-	  If you want to say Y here, you need to have the C library glibc 2.1
-	  or later (equal to libc-6.1, check with "ls -l /lib/libc.so.*").
-	  Read the instructions in <file:Documentation/Changes> pertaining to
-	  pseudo terminals. It's safe to say N.
-
-source "drivers/char/pcmcia/Kconfig"
-
-source "drivers/serial/Kconfig"
-
-source "drivers/i2c/Kconfig"
-
-source "drivers/hwmon/Kconfig"
-
-source "drivers/usb/Kconfig"
-
-source "drivers/uwb/Kconfig"
-
-endmenu
-
-source "drivers/staging/Kconfig"
+source "drivers/Kconfig"
 
 source "fs/Kconfig"
 
diff --git a/arch/h8300/include/asm/types.h b/arch/h8300/include/asm/types.h
index bb2c91a..b9e79bc 100644
--- a/arch/h8300/include/asm/types.h
+++ b/arch/h8300/include/asm/types.h
@@ -1,29 +1 @@
-#ifndef _H8300_TYPES_H
-#define _H8300_TYPES_H
-
-#include <asm-generic/int-ll64.h>
-
-#if !defined(__ASSEMBLY__)
-
-/*
- * This file is never included by application software unless
- * explicitly requested (e.g., via linux/types.h) in which case the
- * application is Linux specific so (user-) name space pollution is
- * not a major issue.  However, for interoperability, libraries still
- * need to be careful to avoid a name clashes.
- */
-
-typedef unsigned short umode_t;
-
-/*
- * These aren't exported outside the kernel to avoid name space clashes
- */
-#ifdef __KERNEL__
-
-#define BITS_PER_LONG 32
-
-#endif /* __KERNEL__ */
-
-#endif /* __ASSEMBLY__ */
-
-#endif /* _H8300_TYPES_H */
+#include <asm-generic/types.h>
diff --git a/arch/h8300/include/asm/unistd.h b/arch/h8300/include/asm/unistd.h
index 2c3f8e6..7cdb4ea 100644
--- a/arch/h8300/include/asm/unistd.h
+++ b/arch/h8300/include/asm/unistd.h
@@ -325,11 +325,37 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
-#define __NR_setns		320
+#define __NR_utimensat		320
+#define __NR_signalfd		321
+#define __NR_timerfd_create	322
+#define __NR_eventfd		323
+#define __NR_fallocate		324
+#define __NR_timerfd_settime	325
+#define __NR_timerfd_gettime	326
+#define __NR_signalfd4		327
+#define __NR_eventfd2		328
+#define __NR_epoll_create1	329
+#define __NR_dup3		330
+#define __NR_pipe2		331
+#define __NR_inotify_init1	332
+#define __NR_preadv		333
+#define __NR_pwritev		334
+#define __NR_rt_tgsigqueueinfo	335
+#define __NR_perf_event_open	336
+#define __NR_recvmmsg		337
+#define __NR_fanotify_init	338
+#define __NR_fanotify_mark	339
+#define __NR_prlimit64		340
+#define __NR_name_to_handle_at	341
+#define __NR_open_by_handle_at  342
+#define __NR_clock_adjtime	343
+#define __NR_syncfs             344
+#define __NR_sendmmsg		345
+#define __NR_setns		346
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 347
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/h8300/kernel/syscalls.S b/arch/h8300/kernel/syscalls.S
index f4b2e67..4cfe56c 100644
--- a/arch/h8300/kernel/syscalls.S
+++ b/arch/h8300/kernel/syscalls.S
@@ -333,8 +333,34 @@ SYMBOL_NAME_LABEL(sys_call_table)
 	.long SYMBOL_NAME(sys_ni_syscall)	/* sys_move_pages */
 	.long SYMBOL_NAME(sys_getcpu)
 	.long SYMBOL_NAME(sys_ni_syscall)	/* sys_epoll_pwait */
-	.long SYMBOL_NAME(sys_setns)		/* 320 */
-
+	.long SYMBOL_NAME(sys_utimensat)		/* 320 */
+	.long SYMBOL_NAME(sys_signalfd)
+	.long SYMBOL_NAME(sys_timerfd_create)
+	.long SYMBOL_NAME(sys_eventfd)
+	.long SYMBOL_NAME(sys_fallocate)
+	.long SYMBOL_NAME(sys_timerfd_settime)	/* 325 */
+	.long SYMBOL_NAME(sys_timerfd_gettime)
+	.long SYMBOL_NAME(sys_signalfd4)
+	.long SYMBOL_NAME(sys_eventfd2)
+	.long SYMBOL_NAME(sys_epoll_create1)
+	.long SYMBOL_NAME(sys_dup3)			/* 330 */
+	.long SYMBOL_NAME(sys_pipe2)
+	.long SYMBOL_NAME(sys_inotify_init1)
+	.long SYMBOL_NAME(sys_preadv)
+	.long SYMBOL_NAME(sys_pwritev)
+	.long SYMBOL_NAME(sys_rt_tgsigqueueinfo)	/* 335 */
+	.long SYMBOL_NAME(sys_perf_event_open)
+	.long SYMBOL_NAME(sys_recvmmsg)
+	.long SYMBOL_NAME(sys_fanotify_init)
+	.long SYMBOL_NAME(sys_fanotify_mark)
+	.long SYMBOL_NAME(sys_prlimit64)		/* 340 */
+	.long SYMBOL_NAME(sys_name_to_handle_at)
+	.long SYMBOL_NAME(sys_open_by_handle_at)
+	.long SYMBOL_NAME(sys_clock_adjtime)
+	.long SYMBOL_NAME(sys_syncfs)
+	.long SYMBOL_NAME(sys_sendmmsg)
+	.long SYMBOL_NAME(sys_setns)
+	
 	.macro	call_sp addr
 	mov.l	#SYMBOL_NAME(\addr),er6
 	bra	SYMBOL_NAME(syscall_trampoline):8
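
The unistd.h hunk above has to keep two things in step: each __NR_*
number should be used only once, and NR_syscalls should be one more than
the highest number. A minimal sketch of such a check follows. The function
name check_syscall_numbers is hypothetical, and it assumes the simple
"#define NAME NUMBER" layout seen in this patch.

```python
import re

# Matches "#define __NR_foo 123" and "#define NR_syscalls 347"
DEFINE_RE = re.compile(r'^\s*#define\s+(__NR_\w+|NR_syscalls)\s+(\d+)\s*$')


def check_syscall_numbers(header_text):
    """Return a list of human-readable problems found in the header text."""
    by_number = {}
    nr_syscalls = None
    problems = []
    for line in header_text.splitlines():
        match = DEFINE_RE.match(line)
        if match is None:
            continue
        name, value = match.group(1), int(match.group(2))
        if name == "NR_syscalls":
            nr_syscalls = value
        elif value in by_number:
            problems.append(f"{name} reuses number {value} of {by_number[value]}")
        else:
            by_number[value] = name
    if by_number and nr_syscalls is not None:
        expected = max(by_number) + 1
        if nr_syscalls != expected:
            problems.append(f"NR_syscalls is {nr_syscalls}, expected {expected}")
    return problems
```

Fed the patched header, where __NR_setns is 346 and NR_syscalls is 347,
this kind of check reports nothing. It does not look at syscalls.S, so the
order of the table entries still has to be compared by hand.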

-- 
Yoshinori Sato
<ysato@users.sourceforge.jp>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-23 15:55                         ` Pekka Enberg
@ 2011-07-25  4:29                           ` Keith Packard
  0 siblings, 0 replies; 84+ messages in thread
From: Keith Packard @ 2011-07-25  4:29 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Kirill Smelkov, Chris Wilson, Luke-Jr, intel-gfx, LKML,
	dri-devel, Rafael J. Wysocki, Ray Lee, Herbert Xu,
	Linus Torvalds, Andrew Morton, Florian Mickler

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

On Sat, 23 Jul 2011 18:55:48 +0300 (EEST), Pekka Enberg <penberg@kernel.org> wrote:

> I know I sound like a broken record but I really wish you i915 devs were 
> a little more eager to revert broken patches early rather than late. I mean, 
> this particular breakage was already bisected but nobody said or 
> did anything - and it's not like it's the first time either!

We've switched processes starting with 2.6.39 and I think we're doing
better in this regard. For this particular issue, the regression came
with 2.6.38, and the revert was too large for me to consider merging
just before 3.0 shipped -- I knew reverting it *would* cause problems
for anyone using UMS on newer hardware.

> I suppose I need to bribe Linus somehow to be more strict with you
> folks.

He nicely delivered the message for you a few months ago in person.

In any case, I'm hoping that my smaller fix will resolve the problem and
also not cause regressions for other users.

-- 
keith.packard@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-25  2:21                   ` Yoshinori Sato
@ 2011-07-25 15:50                     ` Arnaud Lacombe
  2011-07-27 15:22                       ` Yoshinori Sato
  0 siblings, 1 reply; 84+ messages in thread
From: Arnaud Lacombe @ 2011-07-25 15:50 UTC (permalink / raw)
  To: Yoshinori Sato; +Cc: Linus Torvalds, Linux Kernel Mailing List, Greg KH

Hi,

On Sun, Jul 24, 2011 at 10:21 PM, Yoshinori Sato
<ysato@users.sourceforge.jp> wrote:
> At Sun, 24 Jul 2011 18:04:59 -0400,
> Arnaud Lacombe wrote:
>>
>> Hi,
>>
>> On Thu, Jul 21, 2011 at 10:59 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
>> > [...]
>> For the record, if anybody cares, arch/h8300 no longer configures since 2.6.38:
>>
>> % make ARCH=h8300 menuconfig
>> make: h8300-elf-gcc: Command not found
>>   HOSTCC  scripts/basic/fixdep
>>   HOSTCC  scripts/kconfig/conf.o
>>   HOSTCC  scripts/kconfig/lxdialog/checklist.o
>>   HOSTCC  scripts/kconfig/lxdialog/inputbox.o
>>   HOSTCC  scripts/kconfig/lxdialog/menubox.o
>>   HOSTCC  scripts/kconfig/lxdialog/textbox.o
>>   HOSTCC  scripts/kconfig/lxdialog/util.o
>>   HOSTCC  scripts/kconfig/lxdialog/yesno.o
>>   HOSTCC  scripts/kconfig/mconf.o
>>   SHIPPED scripts/kconfig/zconf.tab.c
>>   SHIPPED scripts/kconfig/lex.zconf.c
>>   SHIPPED scripts/kconfig/zconf.hash.c
>>   HOSTCC  scripts/kconfig/zconf.tab.o
>>   HOSTLD  scripts/kconfig/mconf
>> scripts/kconfig/mconf Kconfig
>> arch/h8300/Kconfig:198: can't open file "drivers/serial/Kconfig"
>> make[1]: *** [menuconfig] Error 1
>> make: *** [menuconfig] Error 2
>>
>> Yes, I know I have no h8300-elf-gcc, but it does not change the fact
>> that the arch tries to include a non-existent file. It's been broken
>> by:
>>
>> commit ab4382d27412e7e3e7c936e8d50d8888dfac3df8
>> Author: Greg Kroah-Hartman <gregkh@suse.de>
>> Date:   Thu Jan 13 12:10:18 2011 -0800
>>
>>     tty: move drivers/serial/ to drivers/tty/serial/
>>
>>     The serial drivers are really just tty drivers, so move them to
>>     drivers/tty/ to make things a bit neater overall.
>>     This is part of the tty/serial driver movement proceedure as proposed by
>>     Arnd Bergmann and approved by everyone involved a number of months ago.
>>
>>     Cc: Arnd Bergmann <arnd@arndb.de>
>>     Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
>>     Cc: Geert Uytterhoeven <geert@linux-m68k.org>
>>     Cc: Rogier Wolff <R.E.Wolff@bitwizard.nl>
>>     Cc: Michael H. Warfield <mhw@wittsend.com>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>>
>>  - Arnaud
>
> Please try this fix.
> Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
>
> diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
> index 091ed61..910e5ad 100644
> --- a/arch/h8300/Kconfig
> +++ b/arch/h8300/Kconfig
> @@ -89,125 +89,7 @@ endmenu
>
>  source "net/Kconfig"
>
> -source "drivers/base/Kconfig"
> -
> -source "drivers/mtd/Kconfig"
> -
> -source "drivers/block/Kconfig"
> -
> -source "drivers/ide/Kconfig"
> -
> -source "arch/h8300/Kconfig.ide"
> -
> -source "drivers/net/Kconfig"
> -
> -#
> -# input - input/joystick depends on it. As does USB.
> -#
> -source "drivers/input/Kconfig"
> -
> -menu "Character devices"
> -
> -config VT
> -       bool "Virtual terminal"
> -       ---help---
> -         If you say Y here, you will get support for terminal devices with
> -         display and keyboard devices. These are called "virtual" because you
> -         can run several virtual terminals (also called virtual consoles) on
> -         one physical terminal. This is rather useful, for example one
> -         virtual terminal can collect system messages and warnings, another
> -         one can be used for a text-mode user session, and a third could run
> -         an X session, all in parallel. Switching between virtual terminals
> -         is done with certain key combinations, usually Alt-<function key>.
> -
> -         The setterm command ("man setterm") can be used to change the
> -         properties (such as colors or beeping) of a virtual terminal. The
> -         man page console_codes(4) ("man console_codes") contains the special
> -         character sequences that can be used to change those properties
> -         directly. The fonts used on virtual terminals can be changed with
> -         the setfont ("man setfont") command and the key bindings are defined
> -         with the loadkeys ("man loadkeys") command.
> -
> -         You need at least one virtual terminal device in order to make use
> -         of your keyboard and monitor. Therefore, only people configuring an
> -         embedded system would want to say N here in order to save some
> -         memory; the only way to log into such a system is then via a serial
> -         or network connection.
> -
> -         If unsure, say Y, or else you won't be able to do much with your new
> -         shiny Linux system :-)
> -
> -config VT_CONSOLE
> -       bool "Support for console on virtual terminal"
> -       depends on VT
> -       ---help---
> -         The system console is the device which receives all kernel messages
> -         and warnings and which allows logins in single user mode. If you
> -         answer Y here, a virtual terminal (the device used to interact with
> -         a physical terminal) can be used as system console. This is the most
> -         common mode of operations, so you should say Y here unless you want
> -         the kernel messages be output only to a serial port (in which case
> -         you should say Y to "Console on serial port", below).
> -
> -         If you do say Y here, by default the currently visible virtual
> -         terminal (/dev/tty0) will be used as system console. You can change
> -         that with a kernel command line option such as "console=tty3" which
> -         would use the third virtual terminal as system console. (Try "man
> -         bootparam" or see the documentation of your boot loader (lilo or
> -         loadlin) about how to pass options to the kernel at boot time.)
> -
> -         If unsure, say Y.
> -
> -config HW_CONSOLE
> -       bool
> -       depends on VT && !S390 && !UM
> -       default y
> -
> -comment "Unix98 PTY support"
> -
> -config UNIX98_PTYS
> -       bool "Unix98 PTY support"
> -       ---help---
> -         A pseudo terminal (PTY) is a software device consisting of two
> -         halves: a master and a slave. The slave device behaves identical to
> -         a physical terminal; the master device is used by a process to
> -         read data from and write data to the slave, thereby emulating a
> -         terminal. Typical programs for the master side are telnet servers
> -         and xterms.
> -
> -         Linux has traditionally used the BSD-like names /dev/ptyxx for
> -         masters and /dev/ttyxx for slaves of pseudo terminals. This scheme
> -         has a number of problems. The GNU C library glibc 2.1 and later,
> -         however, supports the Unix98 naming standard: in order to acquire a
> -         pseudo terminal, a process opens /dev/ptmx; the number of the pseudo
> -         terminal is then made available to the process and the pseudo
> -         terminal slave can be accessed as /dev/pts/<number>. What was
> -         traditionally /dev/ttyp2 will then be /dev/pts/2, for example.
> -
> -         The entries in /dev/pts/ are created on the fly by a virtual
> -         file system; therefore, if you say Y here you should say Y to
> -         "/dev/pts file system for Unix98 PTYs" as well.
> -
> -         If you want to say Y here, you need to have the C library glibc 2.1
> -         or later (equal to libc-6.1, check with "ls -l /lib/libc.so.*").
> -         Read the instructions in <file:Documentation/Changes> pertaining to
> -         pseudo terminals. It's safe to say N.
> -
> -source "drivers/char/pcmcia/Kconfig"
> -
> -source "drivers/serial/Kconfig"
> -
> -source "drivers/i2c/Kconfig"
> -
> -source "drivers/hwmon/Kconfig"
> -
> -source "drivers/usb/Kconfig"
> -
> -source "drivers/uwb/Kconfig"
> -
> -endmenu
> -
> -source "drivers/staging/Kconfig"
> +source "drivers/Kconfig"
>
>  source "fs/Kconfig"
>
> diff --git a/arch/h8300/include/asm/types.h b/arch/h8300/include/asm/types.h
> index bb2c91a..b9e79bc 100644
> --- a/arch/h8300/include/asm/types.h
> +++ b/arch/h8300/include/asm/types.h
> @@ -1,29 +1 @@
> -#ifndef _H8300_TYPES_H
> -#define _H8300_TYPES_H
> -
> -#include <asm-generic/int-ll64.h>
> -
> -#if !defined(__ASSEMBLY__)
> -
> -/*
> - * This file is never included by application software unless
> - * explicitly requested (e.g., via linux/types.h) in which case the
> - * application is Linux specific so (user-) name space pollution is
> - * not a major issue.  However, for interoperability, libraries still
> - * need to be careful to avoid a name clashes.
> - */
> -
> -typedef unsigned short umode_t;
> -
> -/*
> - * These aren't exported outside the kernel to avoid name space clashes
> - */
> -#ifdef __KERNEL__
> -
> -#define BITS_PER_LONG 32
> -
> -#endif /* __KERNEL__ */
> -
> -#endif /* __ASSEMBLY__ */
> -
> -#endif /* _H8300_TYPES_H */
> +#include <asm-generic/types.h>
> diff --git a/arch/h8300/include/asm/unistd.h b/arch/h8300/include/asm/unistd.h
> index 2c3f8e6..7cdb4ea 100644
> --- a/arch/h8300/include/asm/unistd.h
> +++ b/arch/h8300/include/asm/unistd.h
> @@ -325,11 +325,37 @@
>  #define __NR_move_pages                317
>  #define __NR_getcpu            318
>  #define __NR_epoll_pwait       319
> -#define __NR_setns             320
> +#define __NR_utimensat         320
> +#define __NR_signalfd          321
> +#define __NR_timerfd_create    322
> +#define __NR_eventfd           323
> +#define __NR_fallocate         324
> +#define __NR_timerfd_settime   325
> +#define __NR_timerfd_gettime   326
> +#define __NR_signalfd4         327
> +#define __NR_eventfd2          328
> +#define __NR_epoll_create1     329
> +#define __NR_dup3              330
> +#define __NR_pipe2             331
> +#define __NR_inotify_init1     332
> +#define __NR_preadv            333
> +#define __NR_pwritev           334
> +#define __NR_rt_tgsigqueueinfo 335
> +#define __NR_perf_event_open   336
> +#define __NR_recvmmsg          337
> +#define __NR_fanotify_init     338
> +#define __NR_fanotify_mark     339
> +#define __NR_prlimit64         340
> +#define __NR_name_to_handle_at 341
> +#define __NR_open_by_handle_at  342
> +#define __NR_clock_adjtime     343
> +#define __NR_syncfs             344
> +#define __NR_sendmmsg          345
> +#define __NR_setns             346
>
>  #ifdef __KERNEL__
>
> -#define NR_syscalls 321
> +#define NR_syscalls 347
>
>  #define __ARCH_WANT_IPC_PARSE_VERSION
>  #define __ARCH_WANT_OLD_READDIR
> diff --git a/arch/h8300/kernel/syscalls.S b/arch/h8300/kernel/syscalls.S
> index f4b2e67..4cfe56c 100644
> --- a/arch/h8300/kernel/syscalls.S
> +++ b/arch/h8300/kernel/syscalls.S
> @@ -333,8 +333,34 @@ SYMBOL_NAME_LABEL(sys_call_table)
>        .long SYMBOL_NAME(sys_ni_syscall)       /* sys_move_pages */
>        .long SYMBOL_NAME(sys_getcpu)
>        .long SYMBOL_NAME(sys_ni_syscall)       /* sys_epoll_pwait */
> -       .long SYMBOL_NAME(sys_setns)            /* 320 */
> -
> +       .long SYMBOL_NAME(sys_utimensat)                /* 320 */
> +       .long SYMBOL_NAME(sys_signalfd)
> +       .long SYMBOL_NAME(sys_timerfd_create)
> +       .long SYMBOL_NAME(sys_eventfd)
> +       .long SYMBOL_NAME(sys_fallocate)
> +       .long SYMBOL_NAME(sys_timerfd_settime)  /* 325 */
> +       .long SYMBOL_NAME(sys_timerfd_gettime)
> +       .long SYMBOL_NAME(sys_signalfd4)
> +       .long SYMBOL_NAME(sys_eventfd2)
> +       .long SYMBOL_NAME(sys_epoll_create1)
> +       .long SYMBOL_NAME(sys_dup3)                     /* 330 */
> +       .long SYMBOL_NAME(sys_pipe2)
> +       .long SYMBOL_NAME(sys_inotify_init1)
> +       .long SYMBOL_NAME(sys_preadv)
> +       .long SYMBOL_NAME(sys_pwritev)
> +       .long SYMBOL_NAME(sys_rt_tgsigqueueinfo)        /* 335 */
> +       .long SYMBOL_NAME(sys_perf_event_open)
> +       .long SYMBOL_NAME(sys_recvmmsg)
> +       .long SYMBOL_NAME(sys_fanotify_init)
> +       .long SYMBOL_NAME(sys_fanotify_mark)
> +       .long SYMBOL_NAME(sys_prlimit64)                /* 340 */
> +       .long SYMBOL_NAME(sys_name_to_handle_at)
> +       .long SYMBOL_NAME(sys_open_by_handle_at)
> +       .long SYMBOL_NAME(sys_clock_adjtime)
> +       .long SYMBOL_NAME(sys_syncfs)
> +       .long SYMBOL_NAME(sys_sendmmsg)
> +       .long SYMBOL_NAME(sys_setns)
> +
>        .macro  call_sp addr
>        mov.l   #SYMBOL_NAME(\addr),er6
>        bra     SYMBOL_NAME(syscall_trampoline):8
>
With this patch it at least configures, but the build fails with:

In file included from /src/linux/linux/include/linux/mempolicy.h:70:0,
                 from /src/linux/linux/init/main.c:49:
/src/linux/linux/include/linux/pagemap.h: In function 'fault_in_pages_readable':
/src/linux/linux/include/linux/pagemap.h:444:2: error: assignment of read-only variable '__gu_val'
/src/linux/linux/include/linux/pagemap.h:450:5: error: assignment of read-only variable '__gu_val'
make[2]: *** [init/main.o] Error 1
make[1]: *** [init] Error 2
make: *** [sub-make] Error 2

The cross-toolchain is bare-metal binutils and gcc, each built from its respective trunk:

$ /src/h8300/obj/destdir/bin/h8300-elf-gcc -v
Using built-in specs.
COLLECT_GCC=/src/h8300/obj/destdir/bin/h8300-elf-gcc
COLLECT_LTO_WRAPPER=/src/h8300/obj/destdir/libexec/gcc/h8300-elf/4.7.0/lto-wrapper
Target: h8300-elf
Configured with: ../gcc/configure --prefix=/src/h8300/obj/destdir
--target=h8300-elf --enable-languages=c
Thread model: single
gcc version 4.7.0 20110609 (experimental) (GCC)

 - Arnaud

> --
> Yoshinori Sato
> <ysato@users.sourceforge.jp>
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release - btrfs possible locking deadlock
  2011-07-22 23:21                 ` Linux 3.0 release - btrfs possible locking deadlock Ed Tomlinson
@ 2011-07-25 19:49                   ` Chris Mason
  2011-07-26  0:22                     ` Ed Tomlinson
  0 siblings, 1 reply; 84+ messages in thread
From: Chris Mason @ 2011-07-25 19:49 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: Linux Kernel Mailing List, Linus Torvalds, linux-btrfs, Josef Bacik

Excerpts from Ed Tomlinson's message of 2011-07-22 19:21:00 -0400:
> On Thursday 21 July 2011 22:59:53 Linus Torvalds wrote:
> > So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> > 
> 
> Hi,
> 
> Managed to get this with btrfs rsync(ing) from ext4 to a btrfs fs with three partitions using raid1.
> 
> [16018.211493] device fsid f7186eeb-60df-4b1a-890a-4a1eb42f81fe devid 1 transid 10 /dev/sdd4
> [16018.230643] btrfs: use lzo compression
> [16018.234619] btrfs: enabling disk space caching
> [25949.414011] 
> [25949.414011] =======================================================
> [25949.416549] [ INFO: possible circular locking dependency detected ]
> [25949.423187] 3.0.0-crc+ #348
> [25949.423187] -------------------------------------------------------
> [25949.423187] rsync/20237 is trying to acquire lock:
> [25949.423187]  (btrfs-extent-01){+.+...}, at: [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
> [25949.423187] 
> [25949.423187] but task is already holding lock:
> [25949.423187]  (&(&eb->lock)->rlock){+.+...}, at: [<ffffffffa047cee2>] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
> [25949.423187] 
> [25949.423187] which lock already depends on the new lock.
> 
> Kernel is 3.0.0 without any extras.
> 
> Ideas?

Did this actually deadlock?  lockdep has issues with the btrfs
clear_lock_blocking code, and I need to redo the annotations a bit.  The
problem is that we have the same lock class representing unrelated locks from
different trees.
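
To illustrate the false-positive mechanism described above, here is a minimal
userspace sketch. It is a toy model, not lockdep's implementation and not the
btrfs locking code: the dependency matrix, the function names and the two-tree
scenario are all assumptions made for illustration. It only shows why keying
dependencies by lock class, rather than by lock instance, can report a cycle
between locks that never actually interact.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 4

/* dep[a][b] is true once class b was taken while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' in the graph? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return true if that would close a cycle. */
static bool take_reports_cycle(int held, int wanted)
{
	bool seen[MAX_CLASSES] = { false };

	if (reachable(wanted, held, seen))
		return true;
	dep[held][wanted] = true;
	return false;
}

/* Hypothetical setup: one class per tree level, shared by both trees. */
bool shared_classes_report_cycle(void)
{
	enum { LEVEL0, LEVEL1 };

	memset(dep, 0, sizeof(dep));
	/* Tree A: root (level 0) held, leaf (level 1) taken. */
	bool first = take_reports_cycle(LEVEL0, LEVEL1);
	/* Tree B leaf held, tree A root taken: unrelated locks, same classes. */
	bool second = take_reports_cycle(LEVEL1, LEVEL0);
	return first || second;
}

/* Same sequence, but every tree gets its own class per level. */
bool per_tree_classes_report_cycle(void)
{
	enum { A_LEVEL0, A_LEVEL1, B_LEVEL0, B_LEVEL1 };

	memset(dep, 0, sizeof(dep));
	bool first = take_reports_cycle(A_LEVEL0, A_LEVEL1);
	bool second = take_reports_cycle(B_LEVEL1, A_LEVEL0);
	return first || second;
}
```

In this model the shared-class variant flags a cycle while the per-tree variant
does not, even though the sequence of lock operations is the same in both.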

-chris

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release - btrfs possible locking deadlock
  2011-07-25 19:49                   ` Chris Mason
@ 2011-07-26  0:22                     ` Ed Tomlinson
  0 siblings, 0 replies; 84+ messages in thread
From: Ed Tomlinson @ 2011-07-26  0:22 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linux Kernel Mailing List, Linus Torvalds, linux-btrfs, Josef Bacik

On Monday 25 July 2011 15:49:37 Chris Mason wrote:
> Excerpts from Ed Tomlinson's message of 2011-07-22 19:21:00 -0400:
> > On Thursday 21 July 2011 22:59:53 Linus Torvalds wrote:
> > > So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> > > 
> > 
> > Hi,
> > 
> > Managed to get this with btrfs rsync(ing) from ext4 to a btrfs fs with three partitions using raid1.
> > 
> > [16018.211493] device fsid f7186eeb-60df-4b1a-890a-4a1eb42f81fe devid 1 transid 10 /dev/sdd4
> > [16018.230643] btrfs: use lzo compression
> > [16018.234619] btrfs: enabling disk space caching
> > [25949.414011] 
> > [25949.414011] =======================================================
> > [25949.416549] [ INFO: possible circular locking dependency detected ]
> > [25949.423187] 3.0.0-crc+ #348
> > [25949.423187] -------------------------------------------------------
> > [25949.423187] rsync/20237 is trying to acquire lock:
> > [25949.423187]  (btrfs-extent-01){+.+...}, at: [<ffffffffa047ce88>] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
> > [25949.423187] 
> > [25949.423187] but task is already holding lock:
> > [25949.423187]  (&(&eb->lock)->rlock){+.+...}, at: [<ffffffffa047cee2>] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
> > [25949.423187] 
> > [25949.423187] which lock already depends on the new lock.
> > 
> > Kernel is 3.0.0 without any extras.
> > 
> > Ideas?
> 
> Did this actually deadlock?  lockdep has issues with the btrfs
> clear_lock_blocking code, and I need to redo the annotations a bit.  The
> problem is that we have the same lock class representing unrelated locks from
> different trees.

It did not stop any processes that I could see, and the rsync did complete OK.  That's why I said possible.
Figured it might be something you needed to see and/or fix, though.

Thanks
Ed

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-22 20:23                       ` Kirill Smelkov
@ 2011-07-26 13:48                         ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-07-26 13:48 UTC (permalink / raw)
  To: Keith Packard
  Cc: Rafael J. Wysocki, Herbert Xu, Luke-Jr, intel-gfx, LKML,
	dri-devel, Pekka Enberg, Ray Lee, Andrew Morton, Linus Torvalds

On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> Keith,
> 
> first of all thanks for your prompt reply. Then...
> 
> On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > 
> > > And now after v3.0 is out, I've tested it again, and yes, like it was
> > > broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
> > > bad io access the system freezes completely:
> > 
> > I looked at this when I first saw it (a couple of weeks ago), and I
> > couldn't see any obvious reason this patch would cause this particular
> > problem. I didn't want to revert the patch at that point as I feared it
> > would cause other subtle problems. Given that you've got a work-around,
> > it seemed best to just push this off past 3.0.
> 
> What kind of a workaround are you talking about? Sorry, to me it all
> looked like "UMS is being ignored forever". Anyway, let's move on to try
> to solve the issue.
> 
> 
> > Given the failing address passed to ioread32, this seems like it's
> > probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21,
> > which is an offset in 32-bit units within the hardware status page. If
> > the status_page.page_addr value was zero, then the computed address
> > would end up being 0x84.
> > 
> > And, it looks like status_page.page_addr *will* end up being zero as a
> > result of the patch in question. The patch resets the entire ring
> > structure contents back to the initial values, which includes smashing
> > the status_page structure to zero, clearing the value of
> > status_page.page_addr set in i915_init_phys_hws.
> > 
> > Here's an untested patch which moves the initialization of
> > status_page.page_addr into intel_render_ring_init_dri. I note that
> > intel_init_render_ring_buffer *already* has the setting of the
> > status_page.page_addr value, and so I've removed the setting of
> > status_page.page_addr from i915_init_phys_hws.
> > 
> > I suspect we could remove the memset from intel_init_render_ring_buffer;
> > it seems entirely superfluous given the memset in i915_init_phys_hws.
> > 
> > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > From: Keith Packard <keithp@keithp.com>
> > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> >  intel_render_ring_init_dri
> > 
> > Physically-addressed hardware status pages are initialized early in
> > the driver load process by i915_init_phys_hws. For UMS environments,
> > the ring structure is not initialized until the X server starts. At
> > that point, the entire ring structure is re-initialized with all new
> > values. Any values set in the ring structure (including
> > ring->status_page.page_addr) will be lost when the ring is
> > re-initialized.
> > 
> > This patch moves the initialization of the status_page.page_addr value
> > to intel_render_ring_init_dri.
> > 
> > Signed-off-by: Keith Packard <keithp@keithp.com>
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index 1271282..8a3942c 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> >  static int i915_init_phys_hws(struct drm_device *dev)
> >  {
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> >  
> >  	/* Program Hardware Status Page */
> >  	dev_priv->status_page_dmah =
> > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> >  		DRM_ERROR("Can not allocate hardware status page\n");
> >  		return -ENOMEM;
> >  	}
> > -	ring->status_page.page_addr =
> > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> >  
> > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > +		  0, PAGE_SIZE);
> >  
> >  	i915_write_hws_pga(dev);
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index e961568..47b9b27 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> >  		ring->get_seqno = pc_render_get_seqno;
> >  	}
> >  
> > +	if (!I915_NEED_GFX_HWS(dev))
> > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > +
> >  	ring->dev = dev;
> >  	INIT_LIST_HEAD(&ring->active_list);
> >  	INIT_LIST_HEAD(&ring->request_list);
> 
> I can't tell whether this is correct, because intel gfx driver is
> unknown to me, but from the first glance your description sounds reasonable.
> 
> I'm out of office till ~ next week's tuesday, and on return I'll try
> to test it on the hardware in question.

Keith, thanks again for the patch. As promised I've tested it on the
hardware in question and yes, bad_access is gone and X seems to work,
so thank you, but...


I see there are more such bugs in intel_render_ring_init_dri(), which
the guilty patch introduced. For example, ring->irq_queue is left
uninitialized, and so is ring->irq_lock, etc.

I'm an X newbie, so if there is something stupid here X-wise, please
don't beat me too hard, but to me the gist of the problem is the
original patch, where Chris does

( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 03e3370..51fbc5e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>         return intel_init_ring_buffer(dev, ring);
>  }
>  
> +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> +{
> +       drm_i915_private_t *dev_priv = dev->dev_private;
> +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> +
> +       *ring = render_ring;
          ^^^^^^^^^^^^^^^^^^^
          here resets

> +       if (INTEL_INFO(dev)->gen >= 6) {
> +               ring->add_request = gen6_add_request;
> +               ring->irq_get = gen6_render_ring_get_irq;
> +               ring->irq_put = gen6_render_ring_put_irq;
> +       } else if (IS_GEN5(dev)) {
> +               ring->add_request = pc_render_add_request;
> +               ring->get_seqno = pc_render_get_seqno;
> +       }

and then the rest of `ring` is initialized with code seemingly copy-pasted
from intel_init_ring_buffer():

> +       ring->dev = dev;
> +       INIT_LIST_HEAD(&ring->active_list);
> +       INIT_LIST_HEAD(&ring->request_list);
> +       INIT_LIST_HEAD(&ring->gpu_write_list);
> +
> +       ring->size = size;
> +       ring->effective_size = ring->size;
> +       if (IS_I830(ring->dev))
> +               ring->effective_size -= 128;
> +
> +       ring->map.offset = start;
> +       ring->map.size = size;
> +       ring->map.type = 0;
> +       ring->map.flags = 0;
> +       ring->map.mtrr = 0;
...

where all 3 chunks are taken almost verbatim from intel_init_ring_buffer(),
and the ring->effective_size tweak even lost its original comment:

# original version from intel_init_ring_buffer():
        /* Workaround an erratum on the i830 which causes a hang if
         * the TAIL pointer points to within the last 2 cachelines
         * of the buffer.
         */
        ring->effective_size = ring->size;
        if (IS_I830(ring->dev))
                ring->effective_size -= 128;

...


The line marked "here resets" resets all the fields, so maybe it's not a good
idea to re-initialize them all by hand afterwards (missing some, as this thread
shows). Or at least, if it is really needed, could the initialization code be
shared between intel_render_ring_init_dri() and intel_init_ring_buffer()?
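
A minimal userspace sketch of the failure mode and of the shared-initializer
idea follows. Everything in it is hypothetical: struct toy_ring, ring_template
and the function names are stand-ins invented for illustration and are not the
i915 structures or API. It only models three points from this thread: a
whole-struct assignment wipes a pointer that was set earlier, a breadcrumb index
of 0x21 in 32-bit units corresponds to byte offset 0x84, and a single common
init helper keeps both paths from drifting apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Index in 32-bit units, matching the value quoted in the thread. */
#define BREADCRUMB_INDEX 0x21

/* Hypothetical stand-in for the ring structure. */
struct toy_ring {
	const char *name;
	uint32_t *status_page;
	unsigned int size;
	unsigned int effective_size;
};

/* Template with default values; status_page is implicitly NULL. */
static const struct toy_ring ring_template = { .name = "render" };

/* Byte offset of the breadcrumb relative to the status page base. */
size_t breadcrumb_offset(void)
{
	return BREADCRUMB_INDEX * sizeof(uint32_t);
}

/* One helper used by every init path, so no field is forgotten. */
static void toy_ring_init_common(struct toy_ring *ring, uint32_t *status_page,
				 unsigned int size, int is_i830)
{
	*ring = ring_template;
	ring->status_page = status_page;
	ring->size = size;
	/* Keep the TAIL pointer out of the last 2 cachelines on i830. */
	ring->effective_size = size;
	if (is_i830)
		ring->effective_size -= 128;
}

/* Models the bug: early setup is lost when the struct is reset later. */
int buggy_init_loses_status_page(void)
{
	static uint32_t page[1024];
	struct toy_ring ring = ring_template;

	ring.status_page = page;	/* set early, at "driver load" */
	ring = ring_template;		/* later re-init resets every field */
	ring.size = 4096;		/* hand-written re-init misses status_page */
	return ring.status_page == NULL;
}

/* Models the suggestion: both paths go through the common helper. */
int shared_init_keeps_status_page(void)
{
	static uint32_t page[1024];
	struct toy_ring ring;

	toy_ring_init_common(&ring, page, 4096, 1);
	return ring.status_page == page &&
	       ring.effective_size == 4096 - 128;
}
```

With a NULL status page base, the breadcrumb read lands at the offset returned
by breadcrumb_offset(), which is the 0x84 address discussed earlier in the
thread.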

From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to
re-do it properly...


Thanks again,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-25 15:50                     ` Arnaud Lacombe
@ 2011-07-27 15:22                       ` Yoshinori Sato
  2011-07-27 17:29                         ` Arnaud Lacombe
  2011-07-28  2:08                         ` Arnaud Lacombe
  0 siblings, 2 replies; 84+ messages in thread
From: Yoshinori Sato @ 2011-07-27 15:22 UTC (permalink / raw)
  To: Arnaud Lacombe; +Cc: Linus Torvalds, Linux Kernel Mailing List, Greg KH

At Mon, 25 Jul 2011 11:50:43 -0400,
Arnaud Lacombe wrote:
> 
> Hi,
> 
> On Sun, Jul 24, 2011 at 10:21 PM, Yoshinori Sato
> <ysato@users.sourceforge.jp> wrote:
> > At Sun, 24 Jul 2011 18:04:59 -0400,
> > Arnaud Lacombe wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Jul 21, 2011 at 10:59 PM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> >
> >> > So there it is. Gone are the 2.6.<bignum> days, and 3.0 is out.
> >> > [...]
> >> For the record, if anybody cares, arch/h8300 no longer configure since 2.6.38:
> >>
> >> % make ARCH=h8300 menuconfig
> >> make: h8300-elf-gcc: Command not found
> >>   HOSTCC  scripts/basic/fixdep
> >>   HOSTCC  scripts/kconfig/conf.o
> >>   HOSTCC  scripts/kconfig/lxdialog/checklist.o
> >>   HOSTCC  scripts/kconfig/lxdialog/inputbox.o
> >>   HOSTCC  scripts/kconfig/lxdialog/menubox.o
> >>   HOSTCC  scripts/kconfig/lxdialog/textbox.o
> >>   HOSTCC  scripts/kconfig/lxdialog/util.o
> >>   HOSTCC  scripts/kconfig/lxdialog/yesno.o
> >>   HOSTCC  scripts/kconfig/mconf.o
> >>   SHIPPED scripts/kconfig/zconf.tab.c
> >>   SHIPPED scripts/kconfig/lex.zconf.c
> >>   SHIPPED scripts/kconfig/zconf.hash.c
> >>   HOSTCC  scripts/kconfig/zconf.tab.o
> >>   HOSTLD  scripts/kconfig/mconf
> >> scripts/kconfig/mconf Kconfig
> >> arch/h8300/Kconfig:198: can't open file "drivers/serial/Kconfig"
> >> make[1]: *** [menuconfig] Error 1
> >> make: *** [menuconfig] Error 2
> >>
> >> Yes, I know I have no h8300-elf-gcc, but it does not change the fact
> >> that the arch tries to include a non-existent file. It's been broken
> >> by:
> >>
> >> commit ab4382d27412e7e3e7c936e8d50d8888dfac3df8
> >> Author: Greg Kroah-Hartman <gregkh@suse.de>
> >> Date:   Thu Jan 13 12:10:18 2011 -0800
> >>
> >>     tty: move drivers/serial/ to drivers/tty/serial/
> >>
> >>     The serial drivers are really just tty drivers, so move them to
> >>     drivers/tty/ to make things a bit neater overall.
> >>     This is part of the tty/serial driver movement proceedure as proposed by
> >>     Arnd Bergmann and approved by everyone involved a number of months ago.
> >>
> >>     Cc: Arnd Bergmann <arnd@arndb.de>
> >>     Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> >>     Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> >>     Cc: Rogier Wolff <R.E.Wolff@bitwizard.nl>
> >>     Cc: Michael H. Warfield <mhw@wittsend.com>
> >>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> >>
> >>  - Arnaud
> >
> > Please try this fix.
> > Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
> >
> > diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
> > index 091ed61..910e5ad 100644
> > --- a/arch/h8300/Kconfig
> > +++ b/arch/h8300/Kconfig
> > @@ -89,125 +89,7 @@ endmenu
> >
> >  source "net/Kconfig"
> >
> > -source "drivers/base/Kconfig"
> > -
> > -source "drivers/mtd/Kconfig"
> > -
> > -source "drivers/block/Kconfig"
> > -
> > -source "drivers/ide/Kconfig"
> > -
> > -source "arch/h8300/Kconfig.ide"
> > -
> > -source "drivers/net/Kconfig"
> > -
> > -#
> > -# input - input/joystick depends on it. As does USB.
> > -#
> > -source "drivers/input/Kconfig"
> > -
> > -menu "Character devices"
> > -
> > -config VT
> > -       bool "Virtual terminal"
> > -       ---help---
> > -         If you say Y here, you will get support for terminal devices with
> > -         display and keyboard devices. These are called "virtual" because you
> > -         can run several virtual terminals (also called virtual consoles) on
> > -         one physical terminal. This is rather useful, for example one
> > -         virtual terminal can collect system messages and warnings, another
> > -         one can be used for a text-mode user session, and a third could run
> > -         an X session, all in parallel. Switching between virtual terminals
> > -         is done with certain key combinations, usually Alt-<function key>.
> > -
> > -         The setterm command ("man setterm") can be used to change the
> > -         properties (such as colors or beeping) of a virtual terminal. The
> > -         man page console_codes(4) ("man console_codes") contains the special
> > -         character sequences that can be used to change those properties
> > -         directly. The fonts used on virtual terminals can be changed with
> > -         the setfont ("man setfont") command and the key bindings are defined
> > -         with the loadkeys ("man loadkeys") command.
> > -
> > -         You need at least one virtual terminal device in order to make use
> > -         of your keyboard and monitor. Therefore, only people configuring an
> > -         embedded system would want to say N here in order to save some
> > -         memory; the only way to log into such a system is then via a serial
> > -         or network connection.
> > -
> > -         If unsure, say Y, or else you won't be able to do much with your new
> > -         shiny Linux system :-)
> > -
> > -config VT_CONSOLE
> > -       bool "Support for console on virtual terminal"
> > -       depends on VT
> > -       ---help---
> > -         The system console is the device which receives all kernel messages
> > -         and warnings and which allows logins in single user mode. If you
> > -         answer Y here, a virtual terminal (the device used to interact with
> > -         a physical terminal) can be used as system console. This is the most
> > -         common mode of operations, so you should say Y here unless you want
> > -         the kernel messages be output only to a serial port (in which case
> > -         you should say Y to "Console on serial port", below).
> > -
> > -         If you do say Y here, by default the currently visible virtual
> > -         terminal (/dev/tty0) will be used as system console. You can change
> > -         that with a kernel command line option such as "console=tty3" which
> > -         would use the third virtual terminal as system console. (Try "man
> > -         bootparam" or see the documentation of your boot loader (lilo or
> > -         loadlin) about how to pass options to the kernel at boot time.)
> > -
> > -         If unsure, say Y.
> > -
> > -config HW_CONSOLE
> > -       bool
> > -       depends on VT && !S390 && !UM
> > -       default y
> > -
> > -comment "Unix98 PTY support"
> > -
> > -config UNIX98_PTYS
> > -       bool "Unix98 PTY support"
> > -       ---help---
> > -         A pseudo terminal (PTY) is a software device consisting of two
> > -         halves: a master and a slave. The slave device behaves identical to
> > -         a physical terminal; the master device is used by a process to
> > -         read data from and write data to the slave, thereby emulating a
> > -         terminal. Typical programs for the master side are telnet servers
> > -         and xterms.
> > -
> > -         Linux has traditionally used the BSD-like names /dev/ptyxx for
> > -         masters and /dev/ttyxx for slaves of pseudo terminals. This scheme
> > -         has a number of problems. The GNU C library glibc 2.1 and later,
> > -         however, supports the Unix98 naming standard: in order to acquire a
> > -         pseudo terminal, a process opens /dev/ptmx; the number of the pseudo
> > -         terminal is then made available to the process and the pseudo
> > -         terminal slave can be accessed as /dev/pts/<number>. What was
> > -         traditionally /dev/ttyp2 will then be /dev/pts/2, for example.
> > -
> > -         The entries in /dev/pts/ are created on the fly by a virtual
> > -         file system; therefore, if you say Y here you should say Y to
> > -         "/dev/pts file system for Unix98 PTYs" as well.
> > -
> > -         If you want to say Y here, you need to have the C library glibc 2.1
> > -         or later (equal to libc-6.1, check with "ls -l /lib/libc.so.*").
> > -         Read the instructions in <file:Documentation/Changes> pertaining to
> > -         pseudo terminals. It's safe to say N.
> > -
> > -source "drivers/char/pcmcia/Kconfig"
> > -
> > -source "drivers/serial/Kconfig"
> > -
> > -source "drivers/i2c/Kconfig"
> > -
> > -source "drivers/hwmon/Kconfig"
> > -
> > -source "drivers/usb/Kconfig"
> > -
> > -source "drivers/uwb/Kconfig"
> > -
> > -endmenu
> > -
> > -source "drivers/staging/Kconfig"
> > +source "drivers/Kconfig"
> >
> >  source "fs/Kconfig"
> >
> > diff --git a/arch/h8300/include/asm/types.h b/arch/h8300/include/asm/types.h
> > index bb2c91a..b9e79bc 100644
> > --- a/arch/h8300/include/asm/types.h
> > +++ b/arch/h8300/include/asm/types.h
> > @@ -1,29 +1 @@
> > -#ifndef _H8300_TYPES_H
> > -#define _H8300_TYPES_H
> > -
> > -#include <asm-generic/int-ll64.h>
> > -
> > -#if !defined(__ASSEMBLY__)
> > -
> > -/*
> > - * This file is never included by application software unless
> > - * explicitly requested (e.g., via linux/types.h) in which case the
> > - * application is Linux specific so (user-) name space pollution is
> > - * not a major issue.  However, for interoperability, libraries still
> > - * need to be careful to avoid a name clashes.
> > - */
> > -
> > -typedef unsigned short umode_t;
> > -
> > -/*
> > - * These aren't exported outside the kernel to avoid name space clashes
> > - */
> > -#ifdef __KERNEL__
> > -
> > -#define BITS_PER_LONG 32
> > -
> > -#endif /* __KERNEL__ */
> > -
> > -#endif /* __ASSEMBLY__ */
> > -
> > -#endif /* _H8300_TYPES_H */
> > +#include <asm-generic/types.h>
> > diff --git a/arch/h8300/include/asm/unistd.h b/arch/h8300/include/asm/unistd.h
> > index 2c3f8e6..7cdb4ea 100644
> > --- a/arch/h8300/include/asm/unistd.h
> > +++ b/arch/h8300/include/asm/unistd.h
> > @@ -325,11 +325,37 @@
> >  #define __NR_move_pages                317
> >  #define __NR_getcpu            318
> >  #define __NR_epoll_pwait       319
> > -#define __NR_setns             320
> > +#define __NR_utimensat         320
> > +#define __NR_signalfd          321
> > +#define __NR_timerfd_create    322
> > +#define __NR_eventfd           323
> > +#define __NR_fallocate         324
> > +#define __NR_timerfd_settime   325
> > +#define __NR_timerfd_gettime   326
> > +#define __NR_signalfd4         327
> > +#define __NR_eventfd2          328
> > +#define __NR_epoll_create1     329
> > +#define __NR_dup3              330
> > +#define __NR_pipe2             331
> > +#define __NR_inotify_init1     332
> > +#define __NR_preadv            333
> > +#define __NR_pwritev           334
> > +#define __NR_rt_tgsigqueueinfo 335
> > +#define __NR_perf_event_open   336
> > +#define __NR_recvmmsg          337
> > +#define __NR_fanotify_init     338
> > +#define __NR_fanotify_mark     339
> > +#define __NR_prlimit64         340
> > +#define __NR_name_to_handle_at 341
> > +#define __NR_open_by_handle_at  342
> > +#define __NR_clock_adjtime     343
> > +#define __NR_syncfs             344
> > +#define __NR_sendmmsg          345
> > +#define __NR_setns             346
> >
> >  #ifdef __KERNEL__
> >
> > -#define NR_syscalls 321
> > +#define NR_syscalls 347
> >
> >  #define __ARCH_WANT_IPC_PARSE_VERSION
> >  #define __ARCH_WANT_OLD_READDIR
> > diff --git a/arch/h8300/kernel/syscalls.S b/arch/h8300/kernel/syscalls.S
> > index f4b2e67..4cfe56c 100644
> > --- a/arch/h8300/kernel/syscalls.S
> > +++ b/arch/h8300/kernel/syscalls.S
> > @@ -333,8 +333,34 @@ SYMBOL_NAME_LABEL(sys_call_table)
> >        .long SYMBOL_NAME(sys_ni_syscall)       /* sys_move_pages */
> >        .long SYMBOL_NAME(sys_getcpu)
> >        .long SYMBOL_NAME(sys_ni_syscall)       /* sys_epoll_pwait */
> > -       .long SYMBOL_NAME(sys_setns)            /* 320 */
> > -
> > +       .long SYMBOL_NAME(sys_utimensat)                /* 320 */
> > +       .long SYMBOL_NAME(sys_signalfd)
> > +       .long SYMBOL_NAME(sys_timerfd_create)
> > +       .long SYMBOL_NAME(sys_eventfd)
> > +       .long SYMBOL_NAME(sys_fallocate)
> > +       .long SYMBOL_NAME(sys_timerfd_settime)  /* 325 */
> > +       .long SYMBOL_NAME(sys_timerfd_gettime)
> > +       .long SYMBOL_NAME(sys_signalfd4)
> > +       .long SYMBOL_NAME(sys_eventfd2)
> > +       .long SYMBOL_NAME(sys_epoll_create1)
> > +       .long SYMBOL_NAME(sys_dup3)                     /* 330 */
> > +       .long SYMBOL_NAME(sys_pipe2)
> > +       .long SYMBOL_NAME(sys_inotify_init1)
> > +       .long SYMBOL_NAME(sys_preadv)
> > +       .long SYMBOL_NAME(sys_pwritev)
> > +       .long SYMBOL_NAME(sys_rt_tgsigqueueinfo)        /* 335 */
> > +       .long SYMBOL_NAME(sys_perf_event_open)
> > +       .long SYMBOL_NAME(sys_recvmmsg)
> > +       .long SYMBOL_NAME(sys_fanotify_init)
> > +       .long SYMBOL_NAME(sys_fanotify_mark)
> > +       .long SYMBOL_NAME(sys_prlimit64)                /* 340 */
> > +       .long SYMBOL_NAME(sys_name_to_handle_at)
> > +       .long SYMBOL_NAME(sys_open_by_handle_at)
> > +       .long SYMBOL_NAME(sys_clock_adjtime)
> > +       .long SYMBOL_NAME(sys_syncfs)
> > +       .long SYMBOL_NAME(sys_sendmmsg)
> > +       .long SYMBOL_NAME(sys_setns)
> > +
> >        .macro  call_sp addr
> >        mov.l   #SYMBOL_NAME(\addr),er6
> >        bra     SYMBOL_NAME(syscall_trampoline):8
> >
> With this patch, it configures, at least, but build fails with:
> 
> In file included from /src/linux/linux/include/linux/mempolicy.h:70:0,
>                  from /src/linux/linux/init/main.c:49:
> /src/linux/linux/include/linux/pagemap.h: In function 'fault_in_pages_readable':
> /src/linux/linux/include/linux/pagemap.h:444:2: error: assignment of
> read-only variable '__gu_val'
> /src/linux/linux/include/linux/pagemap.h:450:5: error: assignment of
> read-only variable '__gu_val'
> make[2]: *** [init/main.o] Error 1
> make[1]: *** [init] Error 2
> make: *** [sub-make] Error 2

OK.
I am pushing the latest code here:
git.kernel.org/pub/scm/linux/kernel/git/ysato/h8300.git
Please try it.
I am using gcc v4.5.3.
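
For reference, "assignment of read-only variable '__gu_val'" is what gcc
reports when a macro temporary ends up const-qualified. A minimal sketch,
assuming (this is not confirmed anywhere in this thread) that the temporary
is declared with typeof(*(ptr)) and that the caller passes a pointer to
const data. sketch_get_user() and read_first_byte() are hypothetical names
for illustration only, not the real h8300 code:

```c
#include <errno.h>

/*
 * Hypothetical, simplified get_user-style macro (NOT the real arch code).
 *
 * Declaring the temporary as "__typeof__(*(ptr)) tmp;" makes tmp const
 * whenever ptr points to const data, so a later "tmp = ..." is rejected
 * with "assignment of read-only variable". Using an unqualified integer
 * temporary and casting only at the end avoids the problem.
 */
#define sketch_get_user(x, ptr)                                         \
({                                                                      \
	int sketch_err = 0;                                             \
	unsigned long sketch_val = 0;   /* never const-qualified */     \
	switch (sizeof(*(ptr))) {                                       \
	case 1:                                                         \
	case 2:                                                         \
	case 4:                                                         \
		sketch_val = (unsigned long)*(ptr);                     \
		break;                                                  \
	default:                                                        \
		sketch_err = -EFAULT;   /* unsupported access size */   \
		break;                                                  \
	}                                                               \
	(x) = (__typeof__(*(ptr)))sketch_val;                           \
	sketch_err;                                                     \
})

/* Same shape as the failing caller: the source pointer is const. */
static int read_first_byte(const char *src, char *out)
{
	char c;
	int err = sketch_get_user(c, src);

	if (!err)
		*out = c;
	return err;
}
```

The cast at the end drops the qualifier on the value, so the macro still
accepts const pointers without ever assigning to a const object.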
 
> Cross-toolchain is baremetal binutils and gcc from their respective trunks:
> 
> $ /src/h8300/obj/destdir/bin/h8300-elf-gcc -v
> Using built-in specs.
> COLLECT_GCC=/src/h8300/obj/destdir/bin/h8300-elf-gcc
> COLLECT_LTO_WRAPPER=/src/h8300/obj/destdir/libexec/gcc/h8300-elf/4.7.0/lto-wrapper
> Target: h8300-elf
> Configured with: ../gcc/configure --prefix=/src/h8300/obj/destdir
> --target=h8300-elf --enable-languages=c
> Thread model: single
> gcc version 4.7.0 20110609 (experimental) (GCC)
> 
>  - Arnaud
> 
> > --
> > Yoshinori Sato
> > <ysato@users.sourceforge.jp>
> >

-- 
Yoshinori Sato
<ysato@users.sourceforge.jp>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-27 15:22                       ` Yoshinori Sato
@ 2011-07-27 17:29                         ` Arnaud Lacombe
  2011-07-28  2:08                         ` Arnaud Lacombe
  1 sibling, 0 replies; 84+ messages in thread
From: Arnaud Lacombe @ 2011-07-27 17:29 UTC (permalink / raw)
  To: Yoshinori Sato; +Cc: Linus Torvalds, Linux Kernel Mailing List, Greg KH

Hi,

On Wed, Jul 27, 2011 at 11:22 AM, Yoshinori Sato
<ysato@users.sourceforge.jp> wrote:
> At Mon, 25 Jul 2011 11:50:43 -0400,
>> [...]
>> With this patch, it configures, at least, but build fails with:
>>
>> In file included from /src/linux/linux/include/linux/mempolicy.h:70:0,
>>                  from /src/linux/linux/init/main.c:49:
>> /src/linux/linux/include/linux/pagemap.h: In function 'fault_in_pages_readable':
>> /src/linux/linux/include/linux/pagemap.h:444:2: error: assignment of
>> read-only variable '__gu_val'
>> /src/linux/linux/include/linux/pagemap.h:450:5: error: assignment of
>> read-only variable '__gu_val'
>> make[2]: *** [init/main.o] Error 1
>> make[1]: *** [init] Error 2
>> make: *** [sub-make] Error 2
>
> OK.
> I am pushing the latest code here:
> git.kernel.org/pub/scm/linux/kernel/git/ysato/h8300.git
> Please try it.
> I am using gcc v4.5.3.
>
Hmm, official gcc seems to be quite unstable on h8300. gcc
4.5.4 (20110726) goes further in the build, but triggers an ICE while
building `fs/read_write.c'.

Reported to upstream as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49874.

 - Arnaud

>> Cross-toolchain is baremetal binutils and gcc from their respective trunks:
>>
>> $ /src/h8300/obj/destdir/bin/h8300-elf-gcc -v
>> Using built-in specs.
>> COLLECT_GCC=/src/h8300/obj/destdir/bin/h8300-elf-gcc
>> COLLECT_LTO_WRAPPER=/src/h8300/obj/destdir/libexec/gcc/h8300-elf/4.7.0/lto-wrapper
>> Target: h8300-elf
>> Configured with: ../gcc/configure --prefix=/src/h8300/obj/destdir
>> --target=h8300-elf --enable-languages=c
>> Thread model: single
>> gcc version 4.7.0 20110609 (experimental) (GCC)
>>
>>  - Arnaud
>>
>> > --
>> > Yoshinori Sato
>> > <ysato@users.sourceforge.jp>
>> >
>
> --
> Yoshinori Sato
> <ysato@users.sourceforge.jp>
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Linux 3.0 release
  2011-07-27 15:22                       ` Yoshinori Sato
  2011-07-27 17:29                         ` Arnaud Lacombe
@ 2011-07-28  2:08                         ` Arnaud Lacombe
  1 sibling, 0 replies; 84+ messages in thread
From: Arnaud Lacombe @ 2011-07-28  2:08 UTC (permalink / raw)
  To: Yoshinori Sato; +Cc: Linus Torvalds, Linux Kernel Mailing List, Greg KH

Hi,

On Wed, Jul 27, 2011 at 11:22 AM, Yoshinori Sato
<ysato@users.sourceforge.jp> wrote:
> At Mon, 25 Jul 2011 11:50:43 -0400,
>> [...]
>> With this patch, it configures, at least, but build fails with:
>>
>> In file included from /src/linux/linux/include/linux/mempolicy.h:70:0,
>>                  from /src/linux/linux/init/main.c:49:
>> /src/linux/linux/include/linux/pagemap.h: In function 'fault_in_pages_readable':
>> /src/linux/linux/include/linux/pagemap.h:444:2: error: assignment of
>> read-only variable '__gu_val'
>> /src/linux/linux/include/linux/pagemap.h:450:5: error: assignment of
>> read-only variable '__gu_val'
>> make[2]: *** [init/main.o] Error 1
>> make[1]: *** [init] Error 2
>> make: *** [sub-make] Error 2
>
> OK.
> I am pushing the latest code here:
> git.kernel.org/pub/scm/linux/kernel/git/ysato/h8300.git
> Please try it.
> I am using gcc v4.5.3.
>
Just an update: as stated in my previous mail, I got gcc 4.5.5 to
ICE on fs/dcache.c.

Besides that, I'm getting many instances of the following build error:

make[2]: *** [drivers/tty] Error 2
In file included from /src/linux/linux/include/linux/selection.h:11:0,
                 from /src/linux/linux/drivers/video/console/vgacon.c:45:
/src/linux/linux/include/linux/vt_buffer.h:18:21: fatal error:
asm/vga.h: No such file or directory

with the defconfig.

Moreover, I gave gcc 4.7.0 another try. The previous error is fixed,
but I'm getting 3 different ICEs during the full build with today's gcc
sources. Two of them are related to the one I got with gcc 4.5.5. All have
been reported upstream.

 - Arnaud

PS: I do not really need an h8300 kernel; all this is rather
to probe the state of the arch :) If there is to be more discussion
about this, maybe we should move to a dedicated thread.
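
A dangling `source "drivers/serial/Kconfig"' like the one that started this
sub-thread can be probed for without any cross-compiler. A rough sketch,
assuming it is run from the top of the kernel tree and that `source' lines
use plain quoted paths (variable expansion is not handled).
parse_source_line() and count_missing_sources() are hypothetical helper
names, not part of scripts/kconfig:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * If line has the form:  source "PATH"
 * copy PATH into path and return 1; otherwise return 0.
 */
static int parse_source_line(const char *line, char *path, size_t pathsz)
{
	const char *start, *end;
	size_t len;

	while (*line == ' ' || *line == '\t')
		line++;
	if (strncmp(line, "source", 6) != 0)
		return 0;
	start = strchr(line + 6, '"');
	if (!start)
		return 0;
	start++;
	end = strchr(start, '"');
	if (!end)
		return 0;
	len = (size_t)(end - start);
	if (len == 0 || len >= pathsz)
		return 0;
	memcpy(path, start, len);
	path[len] = '\0';
	return 1;
}

/*
 * Count the source lines in one Kconfig file whose target cannot be read.
 * Paths are resolved relative to the current directory (the tree root).
 * Returns -1 if the Kconfig file itself cannot be opened.
 */
static int count_missing_sources(const char *kconfig)
{
	char line[1024], path[1024];
	int lineno = 0, missing = 0;
	FILE *f = fopen(kconfig, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		lineno++;
		if (parse_source_line(line, path, sizeof(path)) &&
		    access(path, R_OK) != 0) {
			printf("%s:%d: can't open file \"%s\"\n",
			       kconfig, lineno, path);
			missing++;
		}
	}
	fclose(f);
	return missing;
}
```

Calling count_missing_sources("arch/h8300/Kconfig") from the tree root would
be the intended use. It does not follow nested `source' lines, so it only
catches first-level breakage of this kind.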

>> Cross-toolchain is baremetal binutils and gcc from their respective trunks:
>>
>> $ /src/h8300/obj/destdir/bin/h8300-elf-gcc -v
>> Using built-in specs.
>> COLLECT_GCC=/src/h8300/obj/destdir/bin/h8300-elf-gcc
>> COLLECT_LTO_WRAPPER=/src/h8300/obj/destdir/libexec/gcc/h8300-elf/4.7.0/lto-wrapper
>> Target: h8300-elf
>> Configured with: ../gcc/configure --prefix=/src/h8300/obj/destdir
>> --target=h8300-elf --enable-languages=c
>> Thread model: single
>> gcc version 4.7.0 20110609 (experimental) (GCC)
>>
>>  - Arnaud
>>
>> > --
>> > Yoshinori Sato
>> > <ysato@users.sourceforge.jp>
>> >
>
> --
> Yoshinori Sato
> <ysato@users.sourceforge.jp>
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-07-26 13:48                         ` Kirill Smelkov
@ 2011-08-09 12:08                           ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 12:08 UTC (permalink / raw)
  To: Keith Packard
  Cc: Pekka Enberg, Herbert Xu, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Andrew Morton, Linus Torvalds

On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > Keith,
> > 
> > first of all thanks for your prompt reply. Then...
> > 
> > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > 
> > > > And now after v3.0 is out, I've tested it again, and yes, like it was
> > > > broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
> > > > bad io access the system freezes completely:
> > > 
> > > I looked at this when I first saw it (a couple of weeks ago), and I
> > > couldn't see any obvious reason this patch would cause this particular
> > > problem. I didn't want to revert the patch at that point as I feared it
> > > would cause other subtle problems. Given that you've got a work-around,
> > > it seemed best to just push this off past 3.0.
> > 
> > What kind of a workaround are you talking about? Sorry, to me it all
> > looked like "UMS is being ignored forever". Anyway, let's move on to try
> > to solve the issue.
> > 
> > 
> > > Given the failing address passed to ioread32, this seems like it's
> > > probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21,
> > > which is an offset in 32-bit units within the hardware status page. If
> > > the status_page.page_addr value was zero, then the computed address
> > > would end up being 0x84.
> > > 
> > > And, it looks like status_page.page_addr *will* end up being zero as a
> > > result of the patch in question. The patch resets the entire ring
> > > structure contents back to the initial values, which includes smashing
> > > the status_page structure to zero, clearing the value of
> > > status_page.page_addr set in i915_init_phys_hws.
> > > 
> > > Here's an untested patch which moves the initialization of
> > > status_page.page_addr into intel_render_ring_init_dri. I note that
> > > intel_init_render_ring_buffer *already* has the setting of the
> > > status_page.page_addr value, and so I've removed the setting of
> > > status_page.page_addr from i915_init_phys_hws.
> > > 
> > > I suspect we could remove the memset from intel_init_render_ring_buffer;
> > > it seems entirely superfluous given the memset in i915_init_phys_hws.
> > > 
> > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > From: Keith Packard <keithp@keithp.com>
> > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > >  intel_render_ring_init_dri
> > > 
> > > Physically-addressed hardware status pages are initialized early in
> > > the driver load process by i915_init_phys_hws. For UMS environments,
> > > the ring structure is not initialized until the X server starts. At
> > > that point, the entire ring structure is re-initialized with all new
> > > values. Any values set in the ring structure (including
> > > ring->status_page.page_addr) will be lost when the ring is
> > > re-initialized.
> > > 
> > > This patch moves the initialization of the status_page.page_addr value
> > > to intel_render_ring_init_dri.
> > > 
> > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > index 1271282..8a3942c 100644
> > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > >  static int i915_init_phys_hws(struct drm_device *dev)
> > >  {
> > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > >  
> > >  	/* Program Hardware Status Page */
> > >  	dev_priv->status_page_dmah =
> > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > >  		return -ENOMEM;
> > >  	}
> > > -	ring->status_page.page_addr =
> > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > >  
> > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > +		  0, PAGE_SIZE);
> > >  
> > >  	i915_write_hws_pga(dev);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index e961568..47b9b27 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > >  		ring->get_seqno = pc_render_get_seqno;
> > >  	}
> > >  
> > > +	if (!I915_NEED_GFX_HWS(dev))
> > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > +
> > >  	ring->dev = dev;
> > >  	INIT_LIST_HEAD(&ring->active_list);
> > >  	INIT_LIST_HEAD(&ring->request_list);
> > 
> > I can't tell whether this is correct, because intel gfx driver is
> > unknown to me, but from the first glance your description sounds reasonable.
> > 
> > I'm out of office till ~ next week's tuesday, and on return I'll try
> > to test it on the hardware in question.
> 
> Keith, thanks again for the patch. As promised I've tested it on the
> hardware in question and yes, bad_access is gone and X seems to work,
> so thank you, but...
> 
> 
> I see there are more such bugs in introduced-in-guilty-patch
> intel_render_ring_init_dri(). For example ring->irq_queue is
> left uninitialized and also ring->irq_lock etc...
>
>
> I'm X newbie, so if here is something stupid X-wise, please don't
> beat me too hard, but to me the gist of the problem is the original
> patch, where Chris does
> 
> ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 03e3370..51fbc5e 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> >         return intel_init_ring_buffer(dev, ring);
> >  }
> >  
> > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > +{
> > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > +
> > +       *ring = render_ring;
>           ^^^^^^^^^^^^^^^^^^^
>           here resets
> 
> > +       if (INTEL_INFO(dev)->gen >= 6) {
> > +               ring->add_request = gen6_add_request;
> > +               ring->irq_get = gen6_render_ring_get_irq;
> > +               ring->irq_put = gen6_render_ring_put_irq;
> > +       } else if (IS_GEN5(dev)) {
> > +               ring->add_request = pc_render_add_request;
> > +               ring->get_seqno = pc_render_get_seqno;
> > +       }
> 
> and then the rest of the `ring` is initialized seemingly copy-pasted
> from intel_init_ring_buffer():
> 
> > +       ring->dev = dev;
> > +       INIT_LIST_HEAD(&ring->active_list);
> > +       INIT_LIST_HEAD(&ring->request_list);
> > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > +
> > +       ring->size = size;
> > +       ring->effective_size = ring->size;
> > +       if (IS_I830(ring->dev))
> > +               ring->effective_size -= 128;
> > +
> > +       ring->map.offset = start;
> > +       ring->map.size = size;
> > +       ring->map.type = 0;
> > +       ring->map.flags = 0;
> > +       ring->map.mtrr = 0;
> ...
> 
> where both 3 chunks go almost exactly from intel_init_ring_buffer(), and
> ring->effective_size tweak even stripped original comment:
> 
> # original version from intel_init_ring_buffer():
>         /* Workaround an erratum on the i830 which causes a hang if
>          * the TAIL pointer points to within the last 2 cachelines
>          * of the buffer.
>          */
>         ring->effective_size = ring->size;
>         if (IS_I830(ring->dev))
>                 ring->effective_size -= 128;
> 
> ...
> 
> 
> The line marked "here resets" resets all the fields, and maybe it's not a good
> idea to re-initialize them all afterwards (missing some as this thread show),
> or at least if it is really needed, share initialization code between
> intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> 
> From the outside it looks like the offending patch was done as a quick
> fix in a hurry (lots of copy-paste), and maybe it would be better to
> re-do it properly...

Silence... ?

I read this as UMS still being ignored, because e.g. that uninitialized
ring->irq_lock which I wrote about above is for sure used, e.g. in
gen6_render_ring_get_irq(), which is added to the ring vtable in
intel_render_ring_init_dri().

And is copy-pasting, instead of properly structuring things, ok?


Why not revert what caused trouble and introduced other subtle bugs, and
redo things properly in the first place?
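
To make the concern concrete, here is a minimal userspace sketch of the
pattern. It is not driver code: the struct layout and the names
ring_template, ring_init_buggy() and ring_init_common() are hypothetical
stand-ins, and irq_lock_ready merely stands in for a spin_lock_init()
call. It only shows how a whole-struct assignment wipes a pointer set
earlier, why a zero page_addr plus index 0x21 gives address 0x84, and how
one shared init helper avoids forgetting fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as I915_BREADCRUMB_INDEX quoted in this thread. */
#define BREADCRUMB_INDEX 0x21

/* Hypothetical, simplified stand-ins for the driver structures. */
struct status_page {
	uint32_t *page_addr;
};

struct ring {
	const char *name;
	struct status_page status_page;
	int irq_lock_ready;	/* stands in for spin_lock_init(&ring->irq_lock) */
	uint32_t size;
};

/* Template holding only the static fields, in the spirit of render_ring. */
static const struct ring ring_template = { .name = "render ring" };

/*
 * Address of the breadcrumb dword: page_addr plus 0x21 32-bit units.
 * With page_addr == NULL this is 0x21 * 4 = 0x84 on the usual platforms
 * where a null pointer converts to integer 0.
 */
static uintptr_t breadcrumb_address(const struct ring *r)
{
	return (uintptr_t)r->status_page.page_addr +
	       BREADCRUMB_INDEX * sizeof(uint32_t);
}

/* The problematic pattern: reset everything, then re-set only some fields. */
static void ring_init_buggy(struct ring *r, uint32_t size)
{
	*r = ring_template;	/* "here resets": page_addr becomes NULL */
	r->size = size;
}

/* One shared helper, so every init path sets the same fields. */
static void ring_init_common(struct ring *r, uint32_t *hws_vaddr, uint32_t size)
{
	*r = ring_template;
	r->status_page.page_addr = hws_vaddr;
	r->irq_lock_ready = 1;
	r->size = size;
}
```

With both init paths going through something like ring_init_common(), a
field added later would need to be initialized in one place only.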

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 12:08                           ` Kirill Smelkov
@ 2011-08-09 14:00                             ` Vasily Khoruzhick
  -1 siblings, 0 replies; 84+ messages in thread
From: Vasily Khoruzhick @ 2011-08-09 14:00 UTC (permalink / raw)
  To: intel-gfx
  Cc: Kirill Smelkov, Keith Packard, Rafael J. Wysocki, Herbert Xu,
	Luke-Jr, LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > Keith,
> > > 
> > > first of all thanks for your prompt reply. Then...
> > > 
> > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > > > 
> > > > > > And now after v3.0 is out, I've tested it again, and yes, like it
> > > > > > was broken on v3.0-rc5, it is (now even more) broken on v3.0 --
> > > > > > after first bad io access the system freezes completely:
> > > > > 
> > > > I looked at this when I first saw it (a couple of weeks ago), and I
> > > > couldn't see any obvious reason this patch would cause this
> > > > particular problem. I didn't want to revert the patch at that point
> > > > as I feared it would cause other subtle problems. Given that you've
> > > > got a work-around, it seemed best to just push this off past 3.0.
> > > 
> > > What kind of a workaround are you talking about? Sorry, to me it all
> > > looked like "UMS is being ignored forever". Anyway, let's move on to
> > > try to solve the issue.
> > > 
> > > > Given the failing address passed to ioread32, this seems like it's
> > > > probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is
> > > > 0x21, which is an offset in 32-bit units within the hardware status
> > > > page. If the status_page.page_addr value was zero, then the computed
> > > > address would end up being 0x84.
> > > > 
> > > > And, it looks like status_page.page_addr *will* end up being zero as
> > > > a result of the patch in question. The patch resets the entire ring
> > > > structure contents back to the initial values, which includes
> > > > smashing the status_page structure to zero, clearing the value of
> > > > status_page.page_addr set in i915_init_phys_hws.
> > > > 
> > > > Here's an untested patch which moves the initialization of
> > > > status_page.page_addr into intel_render_ring_init_dri. I note that
> > > > intel_init_render_ring_buffer *already* has the setting of the
> > > > status_page.page_addr value, and so I've removed the setting of
> > > > status_page.page_addr from i915_init_phys_hws.
> > > > 
> > > > I suspect we could remove the memset from
> > > > intel_init_render_ring_buffer; it seems entirely superfluous given
> > > > the memset in i915_init_phys_hws.
> > > > 
> > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > > > From: Keith Packard <keithp@keithp.com>
> > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > >  intel_render_ring_init_dri
> > > > 
> > > > Physically-addressed hardware status pages are initialized early in
> > > > the driver load process by i915_init_phys_hws. For UMS environments,
> > > > the ring structure is not initialized until the X server starts. At
> > > > that point, the entire ring structure is re-initialized with all new
> > > > values. Any values set in the ring structure (including
> > > > ring->status_page.page_addr) will be lost when the ring is
> > > > re-initialized.
> > > > 
> > > > This patch moves the initialization of the status_page.page_addr
> > > > value to intel_render_ring_init_dri.
> > > > 
> > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > ---
> > > > 
> > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > > index 1271282..8a3942c 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > >  {
> > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > >  
> > > > >  	/* Program Hardware Status Page */
> > > > >  	dev_priv->status_page_dmah =
> > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > >  		return -ENOMEM;
> > > > >  	}
> > > > > -	ring->status_page.page_addr =
> > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > >  
> > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > > > +		  0, PAGE_SIZE);
> > > > >  
> > > > >  	i915_write_hws_pga(dev);
> > > > >  
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index e961568..47b9b27 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > >  	}
> > > > >  
> > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > +
> > > > >  	ring->dev = dev;
> > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > 
> > > I can't tell whether this is correct, because intel gfx driver is
> > > unknown to me, but from the first glance your description sounds
> > > reasonable.
> > > 
> > > I'm out of office till ~ next week's tuesday, and on return I'll try
> > > to test it on the hardware in question.
> > 
> > Keith, thanks again for the patch. As promised I've tested it on the
> > hardware in question and yes, bad_access is gone and X seems to work,
> > so thank you, but...
> > 
> > 
> > I see there are more such bugs in introduced-in-guilty-patch
> > intel_render_ring_init_dri(). For example ring->irq_queue is
> > left uninitialized and also ring->irq_lock etc...
> > 
> > 
> > I'm X newbie, so if here is something stupid X-wise, please don't
> > beat me too hard, but to me the gist of the problem is the original
> > patch, where Chris does
> > 
> > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 03e3370..51fbc5e 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> > > >         return intel_init_ring_buffer(dev, ring);
> > > >  }
> > > >  
> > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > +{
> > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > +
> > > > +       *ring = render_ring;
> > >           ^^^^^^^^^^^^^^^^^^^
> > >           here resets
> > > 
> > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > +               ring->add_request = gen6_add_request;
> > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > +       } else if (IS_GEN5(dev)) {
> > > > +               ring->add_request = pc_render_add_request;
> > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > +       }
> > > 
> > > and then the rest of the `ring` is initialized seemingly copy-pasted
> > > from intel_init_ring_buffer():
> > > 
> > > > +       ring->dev = dev;
> > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > +
> > > > +       ring->size = size;
> > > > +       ring->effective_size = ring->size;
> > > > +       if (IS_I830(ring->dev))
> > > > +               ring->effective_size -= 128;
> > > > +
> > > > +       ring->map.offset = start;
> > > > +       ring->map.size = size;
> > > > +       ring->map.type = 0;
> > > > +       ring->map.flags = 0;
> > > > +       ring->map.mtrr = 0;
> > > ...
> > > 
> > > where both 3 chunks go almost exactly from intel_init_ring_buffer(), and
> > > ring->effective_size tweak even stripped original comment:
> > > 
> > > # original version from intel_init_ring_buffer():
> > >         /* Workaround an erratum on the i830 which causes a hang if
> > >          * the TAIL pointer points to within the last 2 cachelines
> > >          * of the buffer.
> > >          */
> > >         ring->effective_size = ring->size;
> > >         if (IS_I830(ring->dev))
> > >                 ring->effective_size -= 128;
> > > 
> > > ...
> > > 
> > > 
> > > The line marked "here resets" resets all the fields, and maybe it's not a
> > > good idea to re-initialize them all afterwards (missing some as this
> > > thread show), or at least if it is really needed, share initialization
> > > code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > > 
> > > From the outside it looks like the offending patch was done as a quick
> > > fix in a hurry (lots of copy-paste), and maybe it would be better to
> > > re-do it properly...
> 
> Silence... ?
> 
> I read UMS is still ignored, because e.g. that uninitialized
> ring->irq_lock which I've wrote about above is for sure used e.g. in
> gen6_render_ring_get_irq() added to ring vtable in
> intel_render_ring_init_dri().

I really doubt that UMS supports gen6 hardware.

Regards
Vasily

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
@ 2011-08-09 14:00                             ` Vasily Khoruzhick
  0 siblings, 0 replies; 84+ messages in thread
From: Vasily Khoruzhick @ 2011-08-09 14:00 UTC (permalink / raw)
  To: intel-gfx
  Cc: Pekka Enberg, Herbert Xu, Luke-Jr, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Andrew Morton, Linus Torvalds

On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > Keith,
> > > 
> > > first of all thanks for your prompt reply. Then...
> > > 
> > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > > > 
> > > > > > And now after v3.0 is out, I've tested it again, and yes, like it
> > > > > > was broken on v3.0-rc5, it is (now even more) broken on v3.0 --
> > > > > > after first bad io access the system freezes completely:
> > > > > 
> > > > I looked at this when I first saw it (a couple of weeks ago), and I
> > > > couldn't see any obvious reason this patch would cause this
> > > > particular problem. I didn't want to revert the patch at that point
> > > > as I feared it would cause other subtle problems. Given that you've
> > > > got a work-around, it seemed best to just push this off past 3.0.
> > > 
> > > What kind of a workaround are you talking about? Sorry, to me it all
> > > looked like "UMS is being ignored forever". Anyway, let's move on to
> > > try to solve the issue.
> > > 
> > > > Given the failing address passed to ioread32, this seems like it's
> > > > probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is
> > > > 0x21, which is an offset in 32-bit units within the hardware status
> > > > page. If the status_page.page_addr value was zero, then the computed
> > > > address would end up being 0x84.
> > > > 
> > > > And, it looks like status_page.page_addr *will* end up being zero as
> > > > a result of the patch in question. The patch resets the entire ring
> > > > structure contents back to the initial values, which includes
> > > > smashing the status_page structure to zero, clearing the value of
> > > > status_page.page_addr set in i915_init_phys_hws.
> > > > 
> > > > Here's an untested patch which moves the initialization of
> > > > status_page.page_addr into intel_render_ring_init_dri. I note that
> > > > intel_init_render_ring_buffer *already* has the setting of the
> > > > status_page.page_addr value, and so I've removed the setting of
> > > > status_page.page_addr from i915_init_phys_hws.
> > > > 
> > > > I suspect we could remove the memset from
> > > > intel_init_render_ring_buffer; it seems entirely superfluous given
> > > > the memset in i915_init_phys_hws.
> > > > 
> > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > > > From: Keith Packard <keithp@keithp.com>
> > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > >  intel_render_ring_init_dri
> > > > 
> > > > Physically-addressed hardware status pages are initialized early in
> > > > the driver load process by i915_init_phys_hws. For UMS environments,
> > > > the ring structure is not initialized until the X server starts. At
> > > > that point, the entire ring structure is re-initialized with all new
> > > > values. Any values set in the ring structure (including
> > > > ring->status_page.page_addr) will be lost when the ring is
> > > > re-initialized.
> > > > 
> > > > This patch moves the initialization of the status_page.page_addr
> > > > value to intel_render_ring_init_dri.
> > > > 
> > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > ---
> > > > 
> > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644
> > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device
> > > > *dev)
> > > > 
> > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > >  {
> > > >  
> > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > 
> > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > 
> > > >  	/* Program Hardware Status Page */
> > > >  	dev_priv->status_page_dmah =
> > > > 
> > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device
> > > > *dev)
> > > > 
> > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > >  		return -ENOMEM;
> > > >  	
> > > >  	}
> > > > 
> > > > -	ring->status_page.page_addr =
> > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > 
> > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > +	memset_io((void __force __iomem
> > > > *)dev_priv->status_page_dmah->vaddr, +		  0, PAGE_SIZE);
> > > > 
> > > >  	i915_write_hws_pga(dev);
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27
> > > > 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct
> > > > drm_device *dev, u64 start, u32 size)
> > > > 
> > > >  		ring->get_seqno = pc_render_get_seqno;
> > > >  	
> > > >  	}
> > > > 
> > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > +
> > > > 
> > > >  	ring->dev = dev;
> > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > 
> > > I can't tell whether this is correct, because intel gfx driver is
> > > unknown to me, but from the first glance your description sounds
> > > reasonable.
> > > 
> > > I'm out of office till ~ next week's tuesday, and on return I'll try
> > > to test it on the hardware in question.
> > 
> > Keith, thanks again for the patch. As promised I've tested it on the
> > hardware in question and yes, bad_access is gone and X seems to work,
> > so thank you, but...
> > 
> > 
> > I see there are more such bugs in introduced-in-guilty-patch
> > intel_render_ring_init_dri(). For example ring->irq_queue is
> > left uninitialized and also ring->irq_lock etc...
> > 
> > 
> > I'm X newbie, so if here is something stupid X-wise, please don't
> > beat me too hard, but to me the gist of the problem is the original
> > patch, where Chris does
> > 
> > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e
> > > 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct
> > > drm_device *dev)
> > > 
> > >         return intel_init_ring_buffer(dev, ring);
> > >  
> > >  }
> > > 
> > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
> > > size) +{
> > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > +
> > > +       *ring = render_ring;
> > > 
> >           ^^^^^^^^^^^^^^^^^^^
> >           here resets
> > > 
> > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > +               ring->add_request = gen6_add_request;
> > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > +       } else if (IS_GEN5(dev)) {
> > > +               ring->add_request = pc_render_add_request;
> > > +               ring->get_seqno = pc_render_get_seqno;
> > > +       }
> > 
> > and then the rest of the `ring` is initialized seemingly copy-pasted
> > 
> > from intel_init_ring_buffer():
> > > +       ring->dev = dev;
> > > +       INIT_LIST_HEAD(&ring->active_list);
> > > +       INIT_LIST_HEAD(&ring->request_list);
> > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > +
> > > +       ring->size = size;
> > > +       ring->effective_size = ring->size;
> > > +       if (IS_I830(ring->dev))
> > > +               ring->effective_size -= 128;
> > > +
> > > +       ring->map.offset = start;
> > > +       ring->map.size = size;
> > > +       ring->map.type = 0;
> > > +       ring->map.flags = 0;
> > > +       ring->map.mtrr = 0;
> > 
> > ...
> > 
> > where both 3 chunks go almost exactly from intel_init_ring_buffer(), and
> > ring->effective_size tweak even stripped original comment:
> > 
> > # original version from intel_init_ring_buffer():
> >         /* Workaround an erratum on the i830 which causes a hang if
> >         
> >          * the TAIL pointer points to within the last 2 cachelines
> >          * of the buffer.
> >          */
> >         
> >         ring->effective_size = ring->size;
> >         if (IS_I830(ring->dev))
> >         
> >                 ring->effective_size -= 128;
> > 
> > ...
> > 
> > 
> > The line marked "here resets" resets all the fields, and maybe it's not a
> > good idea to re-initialize them all afterwards (missing some as this
> > thread show), or at least if it is really needed, share initialization
> > code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > 
> > > From the outside it looks like the offending patch was done as a quick
> > > fix in a hurry (lots of copy-paste), and maybe it would be better to
> > > re-do it properly...
> 
> Silence... ?
> 
> I read UMS is still ignored, because e.g. that uninitialized
> ring->irq_lock which I've wrote about above is for sure used e.g. in
> gen6_render_ring_get_irq() added to ring vtable in
> intel_render_ring_init_dri().

I really doubt that UMS supports gen6 hardware.

Regards
Vasily

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 14:00                             ` Vasily Khoruzhick
@ 2011-08-09 14:47                               ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 14:47 UTC (permalink / raw)
  To: Vasily Khoruzhick
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > Keith,
> > > > 
> > > > first of all thanks for your prompt reply. Then...
> > > > 
> > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > > > > > And now after v3.0 is out, I've tested it again, and yes, like it
> > > > > > > was broken on v3.0-rc5, it is (now even more) broken on v3.0 --
> > > > > > > after first bad io access the system freezes completely:
> > > > > > 
> > > > > I looked at this when I first saw it (a couple of weeks ago), and I
> > > > > couldn't see any obvious reason this patch would cause this
> > > > > particular problem. I didn't want to revert the patch at that point
> > > > > as I feared it would cause other subtle problems. Given that you've
> > > > > got a work-around, it seemed best to just push this off past 3.0.
> > > > 
> > > > What kind of a workaround are you talking about? Sorry, to me it all
> > > > looked like "UMS is being ignored forever". Anyway, let's move on to
> > > > try to solve the issue.
> > > > 
> > > > > Given the failing address passed to ioread32, this seems like it's
> > > > > probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is
> > > > > 0x21, which is an offset in 32-bit units within the hardware status
> > > > > page. If the status_page.page_addr value was zero, then the computed
> > > > > address would end up being 0x84.
> > > > > 
> > > > > And, it looks like status_page.page_addr *will* end up being zero as
> > > > > a result of the patch in question. The patch resets the entire ring
> > > > > structure contents back to the initial values, which includes
> > > > > smashing the status_page structure to zero, clearing the value of
> > > > > status_page.page_addr set in i915_init_phys_hws.
> > > > > 
> > > > > Here's an untested patch which moves the initialization of
> > > > > status_page.page_addr into intel_render_ring_init_dri. I note that
> > > > > intel_init_render_ring_buffer *already* has the setting of the
> > > > > status_page.page_addr value, and so I've removed the setting of
> > > > > status_page.page_addr from i915_init_phys_hws.
> > > > > 
> > > > > I suspect we could remove the memset from
> > > > > intel_init_render_ring_buffer; it seems entirely superfluous given
> > > > > the memset in i915_init_phys_hws.
> > > > > 
> > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00
> > > > > 2001 From: Keith Packard <keithp@keithp.com>
> > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > > 
> > > > >  intel_render_ring_init_dri
> > > > > 
> > > > > Physically-addressed hardware status pages are initialized early in
> > > > > the driver load process by i915_init_phys_hws. For UMS environments,
> > > > > the ring structure is not initialized until the X server starts. At
> > > > > that point, the entire ring structure is re-initialized with all new
> > > > > values. Any values set in the ring structure (including
> > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > re-initialized.
> > > > > 
> > > > > This patch moves the initialization of the status_page.page_addr
> > > > > value to intel_render_ring_init_dri.
> > > > > 
> > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > ---
> > > > > 
> > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > > > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device
> > > > > *dev)
> > > > > 
> > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > >  {
> > > > >  
> > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > 
> > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > 
> > > > >  	/* Program Hardware Status Page */
> > > > >  	dev_priv->status_page_dmah =
> > > > > 
> > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device
> > > > > *dev)
> > > > > 
> > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > >  		return -ENOMEM;
> > > > >  	
> > > > >  	}
> > > > > 
> > > > > -	ring->status_page.page_addr =
> > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > > 
> > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > +	memset_io((void __force __iomem
> > > > > *)dev_priv->status_page_dmah->vaddr, +		  0, PAGE_SIZE);
> > > > > 
> > > > >  	i915_write_hws_pga(dev);
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27
> > > > > 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct
> > > > > drm_device *dev, u64 start, u32 size)
> > > > > 
> > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > >  	
> > > > >  	}
> > > > > 
> > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > +
> > > > > 
> > > > >  	ring->dev = dev;
> > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > 
> > > > I can't tell whether this is correct, because intel gfx driver is
> > > > unknown to me, but from the first glance your description sounds
> > > > reasonable.
> > > > 
> > > > I'm out of office till ~ next week's tuesday, and on return I'll try
> > > > to test it on the hardware in question.
> > > 
> > > Keith, thanks again for the patch. As promised I've tested it on the
> > > hardware in question and yes, bad_access is gone and X seems to work,
> > > so thank you, but...
> > > 
> > > 
> > > I see there are more such bugs in introduced-in-guilty-patch
> > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > left uninitialized and also ring->irq_lock etc...
> > > 
> > > 
> > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > beat me too hard, but to me the gist of the problem is the original
> > > patch, where Chris does
> > > 
> > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e
> > > > 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct
> > > > drm_device *dev)
> > > > 
> > > >         return intel_init_ring_buffer(dev, ring);
> > > >  
> > > >  }
> > > > 
> > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
> > > > size) +{
> > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > +
> > > > +       *ring = render_ring;
> > > > 
> > >           ^^^^^^^^^^^^^^^^^^^
> > >           here resets
> > > > 
> > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > +               ring->add_request = gen6_add_request;
> > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > +       } else if (IS_GEN5(dev)) {
> > > > +               ring->add_request = pc_render_add_request;
> > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > +       }
> > > 
> > > and then the rest of the `ring` is initialized seemingly copy-pasted
> > > 
> > > from intel_init_ring_buffer():
> > > > +       ring->dev = dev;
> > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > +
> > > > +       ring->size = size;
> > > > +       ring->effective_size = ring->size;
> > > > +       if (IS_I830(ring->dev))
> > > > +               ring->effective_size -= 128;
> > > > +
> > > > +       ring->map.offset = start;
> > > > +       ring->map.size = size;
> > > > +       ring->map.type = 0;
> > > > +       ring->map.flags = 0;
> > > > +       ring->map.mtrr = 0;
> > > 
> > > ...
> > > 
> > > where both 3 chunks go almost exactly from intel_init_ring_buffer(), and
> > > ring->effective_size tweak even stripped original comment:
> > > 
> > > # original version from intel_init_ring_buffer():
> > >         /* Workaround an erratum on the i830 which causes a hang if
> > >         
> > >          * the TAIL pointer points to within the last 2 cachelines
> > >          * of the buffer.
> > >          */
> > >         
> > >         ring->effective_size = ring->size;
> > >         if (IS_I830(ring->dev))
> > >         
> > >                 ring->effective_size -= 128;
> > > 
> > > ...
> > > 
> > > 
> > > The line marked "here resets" resets all the fields, and maybe it's not a
> > > good idea to re-initialize them all afterwards (missing some as this
> > > thread show), or at least if it is really needed, share initialization
> > > code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > > 
> > > > From the outside it looks like the offending patch was done as a quick
> > > > fix in a hurry (lots of copy-paste), and maybe it would be better to
> > > > re-do it properly...
> > 
> > Silence... ?
> > 
> > I read UMS is still ignored, because e.g. that uninitialized
> > ring->irq_lock which I've wrote about above is for sure used e.g. in
> > gen6_render_ring_get_irq() added to ring vtable in
> > intel_render_ring_init_dri().
> 
> I really doubt that UMS supports gen6 hardware.

Then why is it there in intel_render_ring_init_dri():

    int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
    {
    	drm_i915_private_t *dev_priv = dev->dev_private;
    	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
    
    	*ring = render_ring;
    	if (INTEL_INFO(dev)->gen >= 6) {
    		ring->add_request = gen6_add_request;
    		ring->irq_get = gen6_render_ring_get_irq;
    		ring->irq_put = gen6_render_ring_put_irq;
    	} else if (IS_GEN5(dev)) {
    		ring->add_request = pc_render_add_request;
    		ring->get_seqno = pc_render_get_seqno;
    	}


?


Added by the same guilty commit e8616b6c I'm talking about.
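
To make the failure mode discussed in this thread concrete, here is a
minimal standalone sketch. It is not driver code: struct ring,
ring_template, BREADCRUMB_INDEX and both helper functions are hypothetical
stand-ins for intel_ring_buffer, render_ring, I915_BREADCRUMB_INDEX and the
READ_BREADCRUMB path. It only illustrates, under those assumptions, how a
whole-struct assignment discards a pointer stored earlier, and why the
resulting access lands at 0x84 (0x21 entries of 4 bytes each).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct status_page {
	uint32_t *page_addr;
};

struct ring {
	const char *name;
	struct status_page status_page;
};

/* Template with only the static fields filled in; the rest is zero. */
static const struct ring ring_template = {
	.name = "render ring",
};

/* Index in 32-bit units, matching the 0x21 value quoted in this thread. */
#define BREADCRUMB_INDEX 0x21

/* Address that a read of the breadcrumb slot would touch. */
static uintptr_t breadcrumb_addr(const struct ring *ring)
{
	return (uintptr_t)ring->status_page.page_addr +
	       BREADCRUMB_INDEX * sizeof(uint32_t);
}

/*
 * Early setup stores the page address; a later whole-struct assignment
 * from the template (the analogue of "*ring = render_ring") clears it.
 */
static uintptr_t addr_after_reset(uint32_t *hws_page)
{
	struct ring ring = ring_template;

	ring.status_page.page_addr = hws_page;	/* early initialization */
	ring = ring_template;			/* later reset wipes it */
	return breadcrumb_addr(&ring);
}
```

In this sketch addr_after_reset() returns 0x84 whatever page is passed in,
because page_addr is NULL again after the reset. Any other field that is
set before the reset and not restored after it would be lost the same way.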

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 14:47                               ` Kirill Smelkov
@ 2011-08-09 15:09                                 ` Vasily Khoruzhick
  -1 siblings, 0 replies; 84+ messages in thread
From: Vasily Khoruzhick @ 2011-08-09 15:09 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > Keith,
> > > > > 
> > > > > first of all thanks for your prompt reply. Then...
> > > > > 
> > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov
> > > > > > > <kirr@mns.spb.ru> wrote:
> > > > > > > > And now after v3.0 is out, I've tested it again, and yes, like
> > > > > > > > it was broken on v3.0-rc5, it is (now even more) broken on
> > > > > > > > v3.0 -- after first bad io access the system freezes completely:
> > > > > > > 
> > > > > > I looked at this when I first saw it (a couple of weeks ago), and
> > > > > > I couldn't see any obvious reason this patch would cause this
> > > > > > particular problem. I didn't want to revert the patch at that
> > > > > > point as I feared it would cause other subtle problems. Given
> > > > > > that you've got a work-around, it seemed best to just push this
> > > > > > off past 3.0.
> > > > > 
> > > > > What kind of a workaround are you talking about? Sorry, to me it
> > > > > all looked like "UMS is being ignored forever". Anyway, let's move
> > > > > on to try to solve the issue.
> > > > > 
> > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > units within the hardware status page. If the
> > > > > > status_page.page_addr value was zero, then the computed address
> > > > > > would end up being 0x84.
> > > > > > 
> > > > > > And, it looks like status_page.page_addr *will* end up being zero
> > > > > > as a result of the patch in question. The patch resets the
> > > > > > entire ring structure contents back to the initial values, which
> > > > > > includes smashing the status_page structure to zero, clearing
> > > > > > the value of status_page.page_addr set in i915_init_phys_hws.
> > > > > > 
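To make the arithmetic above concrete, here is a rough user-space sketch, not
driver code. Only the 0x21 index comes from Keith's description; the struct and
function names are made up for illustration, and the real driver reads the page
through its own accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index quoted in the thread, counted in 32-bit units. */
#define BREADCRUMB_INDEX_SKETCH 0x21

/* The offset math below only holds if each slot is 4 bytes wide. */
static_assert(sizeof(uint32_t) == 4, "status page slots are 32-bit");

/* Hypothetical stand-in for the ring's status page bookkeeping. */
struct status_page_sketch {
	uint32_t *page_addr;
};

/* Byte offset of the breadcrumb slot from the start of the page. */
static size_t breadcrumb_byte_offset(void)
{
	return BREADCRUMB_INDEX_SKETCH * sizeof(uint32_t);
}

/* Address a breadcrumb read would touch, as a plain integer. */
static uintptr_t breadcrumb_address(const struct status_page_sketch *sp)
{
	return (uintptr_t)sp->page_addr + breadcrumb_byte_offset();
}
```

With a valid page_addr the result is the page base plus 0x84. Once the whole
structure has been reset and page_addr is NULL, the same computation gives
0x84 on common platforms, which matches the failing ioread32 address.
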
> > > > > > Here's an untested patch which moves the initialization of
> > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > that intel_init_render_ring_buffer *already* has the setting of
> > > > > > the status_page.page_addr value, and so I've removed the setting
> > > > > > of status_page.page_addr from i915_init_phys_hws.
> > > > > > 
> > > > > > I suspect we could remove the memset from
> > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > given the memset in i915_init_phys_hws.
> > > > > > 
> > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > > > > From: Keith Packard <keithp@keithp.com>
> > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > > >  intel_render_ring_init_dri
> > > > > > 
> > > > > > Physically-addressed hardware status pages are initialized early
> > > > > > in the driver load process by i915_init_phys_hws. For UMS
> > > > > > environments, the ring structure is not initialized until the X
> > > > > > server starts. At that point, the entire ring structure is
> > > > > > re-initialized with all new values. Any values set in the ring
> > > > > > structure (including
> > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > re-initialized.
> > > > > > 
> > > > > > This patch moves the initialization of the status_page.page_addr
> > > > > > value to intel_render_ring_init_dri.
> > > > > > 
> > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > ---
> > > > > > 
> > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > index 1271282..8a3942c 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > >  {
> > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > >  
> > > > > >  	/* Program Hardware Status Page */
> > > > > >  	dev_priv->status_page_dmah =
> > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > >  		return -ENOMEM;
> > > > > >  	}
> > > > > > -	ring->status_page.page_addr =
> > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > > >  
> > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > > > > +		  0, PAGE_SIZE);
> > > > > >  
> > > > > >  	i915_write_hws_pga(dev);
> > > > > >  
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > index e961568..47b9b27 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > >  	}
> > > > > >  
> > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > > +
> > > > > >  	ring->dev = dev;
> > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > 
> > > > > I can't tell whether this is correct, because intel gfx driver is
> > > > > unknown to me, but from the first glance your description sounds
> > > > > reasonable.
> > > > > 
> > > > > I'm out of office till ~ next week's tuesday, and on return I'll
> > > > > try to test it on the hardware in question.
> > > > 
> > > > Keith, thanks again for the patch. As promised I've tested it on the
> > > > hardware in question and yes, bad_access is gone and X seems to work,
> > > > so thank you, but...
> > > > 
> > > > 
> > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > left uninitialized and also ring->irq_lock etc...
> > > > 
> > > > 
> > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > beat me too hard, but to me the gist of the problem is the original
> > > > patch, where Chris does
> > > > 
> > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index 03e3370..51fbc5e 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> > > > >         return intel_init_ring_buffer(dev, ring);
> > > > >  }
> > > > > 
> > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > +{
> > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > +
> > > > > +       *ring = render_ring;
> > > >           ^^^^^^^^^^^^^^^^^^^
> > > >           here resets
> > > > 
> > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > +               ring->add_request = gen6_add_request;
> > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > +       } else if (IS_GEN5(dev)) {
> > > > > +               ring->add_request = pc_render_add_request;
> > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > +       }
> > > > 
> > > > and then the rest of the `ring` is initialized seemingly copy-pasted
> > > > from intel_init_ring_buffer():
> > > > 
> > > > > +       ring->dev = dev;
> > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > +
> > > > > +       ring->size = size;
> > > > > +       ring->effective_size = ring->size;
> > > > > +       if (IS_I830(ring->dev))
> > > > > +               ring->effective_size -= 128;
> > > > > +
> > > > > +       ring->map.offset = start;
> > > > > +       ring->map.size = size;
> > > > > +       ring->map.type = 0;
> > > > > +       ring->map.flags = 0;
> > > > > +       ring->map.mtrr = 0;
> > > > 
> > > > ...
> > > > 
> > > > where all 3 chunks come almost verbatim from intel_init_ring_buffer(),
> > > > and the ring->effective_size tweak even lost the original comment:
> > > > 
> > > > # original version from intel_init_ring_buffer():
> > > >         /* Workaround an erratum on the i830 which causes a hang if
> > > >          * the TAIL pointer points to within the last 2 cachelines
> > > >          * of the buffer.
> > > >          */
> > > >         ring->effective_size = ring->size;
> > > >         if (IS_I830(ring->dev))
> > > >                 ring->effective_size -= 128;
> > > > 
> > > > ...
> > > > 
> > > > 
> > > > The line marked "here resets" resets all the fields, and maybe it's
> > > > not a good idea to re-initialize them all afterwards (missing some,
> > > > as this thread shows), or at least if it is really needed, share
> > > > initialization code between intel_render_ring_init_dri() and
> > > > intel_init_ring_buffer() ?
> > > > 
> > > > From the outside it looks like the offending patch was done as a quick
> > > > fix in a hurry (lots of copy-paste), and maybe it would be better to
> > > > re-do it properly...
> > > 
> > > Silence... ?
> > > 
> > > I take it UMS is still being ignored, because e.g. that uninitialized
> > > ring->irq_lock which I wrote about above is for sure used e.g. in
> > > gen6_render_ring_get_irq(), which is added to the ring vtable in
> > > intel_render_ring_init_dri().
> > 
> > I really doubt that UMS supports gen6 hardware.
> 
> Then why is it there in intel_render_ring_init_dri():
> 
>     int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
>     {
>     	drm_i915_private_t *dev_priv = dev->dev_private;
>     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> 
>     	*ring = render_ring;
>     	if (INTEL_INFO(dev)->gen >= 6) {

This branch is taken only when the hardware generation is 6 or newer.

>     		ring->add_request = gen6_add_request;
>     		ring->irq_get = gen6_render_ring_get_irq;
>     		ring->irq_put = gen6_render_ring_put_irq;
>     	} else if (IS_GEN5(dev)) {
>     		ring->add_request = pc_render_add_request;
>     		ring->get_seqno = pc_render_get_seqno;
>     	}
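
A minimal sketch of what I mean, with everything reduced to made-up names
(ring_sketch, legacy_irq_get and irq_lock_ready are not from the driver, they
only mirror the shape of the quoted function). It shows that the gen6 hook is
installed only for gen >= 6, and also that if that path were ever taken after
the template copy, the hook would see state that was never set up:

```c
struct ring_sketch;
typedef int (*irq_get_fn)(struct ring_sketch *ring);

/* Hypothetical, heavily reduced stand-in for struct intel_ring_buffer. */
struct ring_sketch {
	irq_get_fn irq_get;   /* per-generation hook (a "vtable" entry) */
	int irq_lock_ready;   /* stands in for an initialized irq_lock */
};

/* Hook for older hardware; needs no lock in this sketch. */
static int legacy_irq_get(struct ring_sketch *ring)
{
	(void)ring;
	return 0;
}

/* Hook for gen >= 6; reports an error if the lock was never set up. */
static int gen6_irq_get(struct ring_sketch *ring)
{
	return ring->irq_lock_ready ? 0 : -1;
}

/* Template copied over the ring, in the spirit of "*ring = render_ring". */
static const struct ring_sketch ring_template = {
	.irq_get = legacy_irq_get,
};

static void ring_init_sketch(struct ring_sketch *ring, int gen)
{
	*ring = ring_template;  /* resets every field, irq_lock_ready included */
	if (gen >= 6)           /* only newer hardware gets the gen6 hook */
		ring->irq_get = gen6_irq_get;
}
```

On gen < 6 the gen6 hook is never reachable through ring->irq_get, so whether
the missing lock setup matters depends on whether UMS is ever used on gen6.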

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 15:09                                 ` Vasily Khoruzhick
@ 2011-08-09 15:34                                   ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 15:34 UTC (permalink / raw)
  To: Vasily Khoruzhick
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
> On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> > On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > > Keith,
> > > > > > 
> > > > > > first of all thanks for your prompt reply. Then...
> > > > > > 
> > > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov
> > > > > > > <kirr@mns.spb.ru> wrote:
> > > > > > > > And now after v3.0 is out, I've tested it again, and yes, like
> > > > > > > > it was broken on v3.0-rc5, it is (now even more) broken on
> > > > > > > > v3.0 -- after first bad io access the system freezes completely:
> > > > > > > 
> > > > > > > I looked at this when I first saw it (a couple of weeks ago), and
> > > > > > > I couldn't see any obvious reason this patch would cause this
> > > > > > > particular problem. I didn't want to revert the patch at that
> > > > > > > point as I feared it would cause other subtle problems. Given
> > > > > > > that you've got a work-around, it seemed best to just push this
> > > > > > > off past 3.0.
> > > > > > 
> > > > > > What kind of a workaround are you talking about? Sorry, to me it
> > > > > > all looked like "UMS is being ignored forever". Anyway, let's move
> > > > > > on to try to solve the issue.
> > > > > > 
> > > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > > units within the hardware status page. If the
> > > > > > > status_page.page_addr value was zero, then the computed address
> > > > > > > would end up being 0x84.
> > > > > > > 
> > > > > > > And, it looks like status_page.page_addr *will* end up being zero
> > > > > > > as a result of the patch in question. The patch resets the
> > > > > > > entire ring structure contents back to the initial values, which
> > > > > > > includes smashing the status_page structure to zero, clearing
> > > > > > > the value of status_page.page_addr set in i915_init_phys_hws.
> > > > > > > 
> > > > > > > Here's an untested patch which moves the initialization of
> > > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > > that intel_init_render_ring_buffer *already* has the setting of
> > > > > > > the status_page.page_addr value, and so I've removed the setting
> > > > > > > of status_page.page_addr from i915_init_phys_hws.
> > > > > > > 
> > > > > > > I suspect we could remove the memset from
> > > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > > given the memset in i915_init_phys_hws.
> > > > > > > 
> > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > > > > > From: Keith Packard <keithp@keithp.com>
> > > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > > > >  intel_render_ring_init_dri
> > > > > > > 
> > > > > > > Physically-addressed hardware status pages are initialized early
> > > > > > > in the driver load process by i915_init_phys_hws. For UMS
> > > > > > > environments, the ring structure is not initialized until the X
> > > > > > > server starts. At that point, the entire ring structure is
> > > > > > > re-initialized with all new values. Any values set in the ring
> > > > > > > structure (including
> > > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > > re-initialized.
> > > > > > > 
> > > > > > > This patch moves the initialization of the status_page.page_addr
> > > > > > > value to intel_render_ring_init_dri.
> > > > > > > 
> > > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > > ---
> > > > > > > 
> > > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > index 1271282..8a3942c 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > >  {
> > > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > > >  
> > > > > > >  	/* Program Hardware Status Page */
> > > > > > >  	dev_priv->status_page_dmah =
> > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > > >  		return -ENOMEM;
> > > > > > >  	}
> > > > > > > -	ring->status_page.page_addr =
> > > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > > > >  
> > > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > > > > > +		  0, PAGE_SIZE);
> > > > > > >  
> > > > > > >  	i915_write_hws_pga(dev);
> > > > > > >  
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > index e961568..47b9b27 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > > > +
> > > > > > >  	ring->dev = dev;
> > > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > > 
> > > > > > I can't tell whether this is correct, because intel gfx driver is
> > > > > > unknown to me, but from the first glance your description sounds
> > > > > > reasonable.
> > > > > > 
> > > > > > I'm out of office till ~ next week's tuesday, and on return I'll
> > > > > > try to test it on the hardware in question.
> > > > > 
> > > > > Keith, thanks again for the patch. As promised I've tested it on the
> > > > > hardware in question and yes, bad_access is gone and X seems to work,
> > > > > so thank you, but...
> > > > > 
> > > > > 
> > > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > > left uninitialized and also ring->irq_lock etc...
> > > > > 
> > > > > 
> > > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > > beat me too hard, but to me the gist of the problem is the original
> > > > > patch, where Chris does
> > > > > 
> > > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > index 03e3370..51fbc5e 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> > > > > >         return intel_init_ring_buffer(dev, ring);
> > > > > >  }
> > > > > > 
> > > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > +{
> > > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > > +
> > > > > > +       *ring = render_ring;
> > > > >           ^^^^^^^^^^^^^^^^^^^
> > > > >           here resets
> > > > > 
> > > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > > +               ring->add_request = gen6_add_request;
> > > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > > +       } else if (IS_GEN5(dev)) {
> > > > > > +               ring->add_request = pc_render_add_request;
> > > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > > +       }
> > > > > 
> > > > > and then the rest of the `ring` is initialized seemingly copy-pasted
> > > > > from intel_init_ring_buffer():
> > > > > 
> > > > > > +       ring->dev = dev;
> > > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > > +
> > > > > > +       ring->size = size;
> > > > > > +       ring->effective_size = ring->size;
> > > > > > +       if (IS_I830(ring->dev))
> > > > > > +               ring->effective_size -= 128;
> > > > > > +
> > > > > > +       ring->map.offset = start;
> > > > > > +       ring->map.size = size;
> > > > > > +       ring->map.type = 0;
> > > > > > +       ring->map.flags = 0;
> > > > > > +       ring->map.mtrr = 0;
> > > > > 
> > > > > ...
> > > > > 
> > > > > > where all 3 chunks come almost verbatim from intel_init_ring_buffer(),
> > > > > > and the ring->effective_size tweak even dropped the original comment:
> > > > > > 
> > > > > > # original version from intel_init_ring_buffer():
> > > > > >         /* Workaround an erratum on the i830 which causes a hang if
> > > > > >          * the TAIL pointer points to within the last 2 cachelines
> > > > > >          * of the buffer.
> > > > > >          */
> > > > > >         ring->effective_size = ring->size;
> > > > > >         if (IS_I830(ring->dev))
> > > > > >                 ring->effective_size -= 128;
> > > > > 
> > > > > ...
> > > > > 
> > > > > 
> > > > > > The line marked "here resets" resets all the fields, and maybe it's
> > > > > > not a good idea to re-initialize them all afterwards (missing some,
> > > > > > as this thread shows), or at least, if it is really needed, to share
> > > > > > initialization code between intel_render_ring_init_dri() and
> > > > > > intel_init_ring_buffer()?
> > > > > > 
> > > > > > From the outside it looks like the offending patch was done as a quick
> > > > > > fix in a hurry (lots of copy-paste), and maybe it would be better to
> > > > > > re-do it properly...
> > > > 
> > > > Silence... ?
> > > > 
> > > > > I take it UMS is still being ignored, because e.g. the uninitialized
> > > > > ring->irq_lock which I wrote about above is for sure used in
> > > > > gen6_render_ring_get_irq(), which is added to the ring vtable in
> > > > > intel_render_ring_init_dri().
> > > 
> > > I really doubt that UMS supports gen6 hardware.
> > 
> > > Then why is it there in intel_render_ring_init_dri():
> > > 
> > >     int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > >     {
> >     	drm_i915_private_t *dev_priv = dev->dev_private;
> >     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > 
> >     	*ring = render_ring;
> >     	if (INTEL_INFO(dev)->gen >= 6) {
> 
> This branch executes only when hw generation is 6 or newer.

and adds gen6_render_ring_get_irq() to the vtable, which uses
ring->irq_lock, which is left uninitialized.

I don't understand what you were trying to say. How does it matter if
some branch executes only for such-and-such hardware, when this branch
contains bugs? Could you please clarify?
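
To make the concern concrete, here is a minimal userspace sketch of the
pattern under discussion. Everything in it is hypothetical (the fake_*
names, the template, the boolean "lock"); it is not the i915 code, only a
model of "reset the struct from a template, then restore some fields by
hand" versus "reset, then call one shared init helper".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the real i915 types. */
struct fake_lock {
	bool initialized;
};

struct fake_ring {
	const char *name;
	void *status_page_addr;		/* set once, early, at "driver load" */
	struct fake_lock irq_lock;	/* must be initialized before first use */
	int (*irq_get)(struct fake_ring *ring);
};

/* Template with only the static fields filled in; everything else is zero. */
static const struct fake_ring ring_template = {
	.name = "render ring",
};

/* Returns 0 on success, -1 if the lock was never initialized. */
int fake_gen6_irq_get(struct fake_ring *ring)
{
	return ring->irq_lock.initialized ? 0 : -1;
}

/* Setup that every init path needs; sharing it avoids forgetting a field. */
void fake_ring_init_common(struct fake_ring *ring, void *status_page)
{
	ring->status_page_addr = status_page;
	ring->irq_lock.initialized = true;
}

/* Copy-paste style init: resets the struct, then restores only some fields. */
void fake_ring_init_dri_buggy(struct fake_ring *ring)
{
	*ring = ring_template;		/* wipes status_page_addr and irq_lock */
	ring->irq_get = fake_gen6_irq_get;
}

/* Same reset, but followed by the shared setup. */
void fake_ring_init_dri_shared(struct fake_ring *ring, void *status_page)
{
	*ring = ring_template;
	ring->irq_get = fake_gen6_irq_get;
	fake_ring_init_common(ring, status_page);
}
```

In this model, calling fake_ring_init_dri_buggy() after the early setup
leaves status_page_addr NULL and makes irq_get() fail, whichever hardware
branch installed the callback; the shared variant keeps both valid.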


> >     		ring->add_request = gen6_add_request;
> >     		ring->irq_get = gen6_render_ring_get_irq;
> >     		ring->irq_put = gen6_render_ring_put_irq;
> >     	} else if (IS_GEN5(dev)) {
> >     		ring->add_request = pc_render_add_request;
> >     		ring->get_seqno = pc_render_get_seqno;
> >     	}

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 15:34                                   ` Kirill Smelkov
@ 2011-08-09 16:02                                     ` Vasily Khoruzhick
  -1 siblings, 0 replies; 84+ messages in thread
From: Vasily Khoruzhick @ 2011-08-09 16:02 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
> On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
> > On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> > > On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > > > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > > > Keith,
> > > > > > > 
> > > > > > > first of all thanks for your prompt reply. Then...
> > > > > > > 
> > > > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > > > > > > > > And now after v3.0 is out, I've tested it again, and yes,
> > > > > > > > > > like it was broken on v3.0-rc5, it is (now even more)
> > > > > > > > > > broken on v3.0 -- after first
> > > > > > > > > > bad io access the system freezes completely:
> > > > > > > > I looked at this when I first saw it (a couple of weeks ago),
> > > > > > > > and I couldn't see any obvious reason this patch would cause
> > > > > > > > this particular problem. I didn't want to revert the patch
> > > > > > > > at that point as I feared it would cause other subtle
> > > > > > > > problems. Given that you've got a work-around, it seemed
> > > > > > > > best to just push this off past 3.0.
> > > > > > > 
> > > > > > > What kind of a workaround are you talking about? Sorry, to me
> > > > > > > it all looked like "UMS is being ignored forever". Anyway,
> > > > > > > let's move on to try to solve the issue.
> > > > > > > 
> > > > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > > > units within the hardware status page. If the
> > > > > > > > status_page.page_addr value was zero, then the computed
> > > > > > > > address would end up being 0x84.
> > > > > > > > 
> > > > > > > > And, it looks like status_page.page_addr *will* end up being
> > > > > > > > zero as a result of the patch in question. The patch resets
> > > > > > > > the entire ring structure contents back to the initial
> > > > > > > > values, which includes smashing the status_page structure to
> > > > > > > > zero, clearing the value of status_page.page_addr set in
> > > > > > > > i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > Here's an untested patch which moves the initialization of
> > > > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > > > that intel_init_render_ring_buffer *already* has the setting
> > > > > > > > of the status_page.page_addr value, and so I've removed the
> > > > > > > > setting of status_page.page_addr from i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > I suspect we could remove the memset from
> > > > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > > > given the memset in i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001
> > > > > > > > > From: Keith Packard <keithp@keithp.com>
> > > > > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
> > > > > > > > >  intel_render_ring_init_dri
> > > > > > > > 
> > > > > > > > Physically-addressed hardware status pages are initialized
> > > > > > > > early in the driver load process by i915_init_phys_hws. For
> > > > > > > > UMS environments, the ring structure is not initialized
> > > > > > > > until the X server starts. At that point, the entire ring
> > > > > > > > structure is re-initialized with all new values. Any values
> > > > > > > > set in the ring structure (including
> > > > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > > > re-initialized.
> > > > > > > > 
> > > > > > > > This patch moves the initialization of the
> > > > > > > > status_page.page_addr value to intel_render_ring_init_dri.
> > > > > > > > 
> > > > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > > > ---
> > > > > > > > 
> > > > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > index 1271282..8a3942c 100644
> > > > > > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > > > > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > > >  {
> > > > > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > > > > > 
> > > > > > > > >  	/* Program Hardware Status Page */
> > > > > > > > >  	dev_priv->status_page_dmah =
> > > > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > > > > >  		return -ENOMEM;
> > > > > > > > >  	}
> > > > > > > > > -	ring->status_page.page_addr =
> > > > > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > > > > > > 
> > > > > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > > > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > > > > > > > +		  0, PAGE_SIZE);
> > > > > > > > > 
> > > > > > > > >  	i915_write_hws_pga(dev);
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > index e961568..47b9b27 100644
> > > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > > > > >  	}
> > > > > > > > > 
> > > > > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > > > > > +
> > > > > > > > >  	ring->dev = dev;
> > > > > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > > > 
> > > > > > > I can't tell whether this is correct, because intel gfx driver
> > > > > > > is unknown to me, but from the first glance your description
> > > > > > > sounds reasonable.
> > > > > > > 
> > > > > > > I'm out of office till ~ next week's tuesday, and on return
> > > > > > > I'll try to test it on the hardware in question.
> > > > > > 
> > > > > > Keith, thanks again for the patch. As promised I've tested it on
> > > > > > the hardware in question and yes, bad_access is gone and X seems
> > > > > > to work, so thank you, but...
> > > > > > 
> > > > > > 
> > > > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > > > left uninitialized and also ring->irq_lock etc...
> > > > > > 
> > > > > > 
> > > > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > > > beat me too hard, but to me the gist of the problem is the
> > > > > > original patch, where Chris does
> > > > > > 
> > > > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > index 03e3370..51fbc5e 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> > > > > > > >         return intel_init_ring_buffer(dev, ring);
> > > > > > > >  }
> > > > > > > > 
> > > > > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > > > +{
> > > > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > > > +
> > > > > > > +       *ring = render_ring;
> > > > > > > 
> > > > > >           ^^^^^^^^^^^^^^^^^^^
> > > > > >           here resets
> > > > > > > 
> > > > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > > > +               ring->add_request = gen6_add_request;
> > > > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > > > +       } else if (IS_GEN5(dev)) {
> > > > > > > +               ring->add_request = pc_render_add_request;
> > > > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > > > +       }
> > > > > > 
> > > > > > > and then the rest of the `ring` is initialized, seemingly copy-pasted
> > > > > > > from intel_init_ring_buffer():
> > > > > > > +       ring->dev = dev;
> > > > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > > > +
> > > > > > > +       ring->size = size;
> > > > > > > +       ring->effective_size = ring->size;
> > > > > > > +       if (IS_I830(ring->dev))
> > > > > > > +               ring->effective_size -= 128;
> > > > > > > +
> > > > > > > +       ring->map.offset = start;
> > > > > > > +       ring->map.size = size;
> > > > > > > +       ring->map.type = 0;
> > > > > > > +       ring->map.flags = 0;
> > > > > > > +       ring->map.mtrr = 0;
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > > where all 3 chunks come almost verbatim from
> > > > > > > intel_init_ring_buffer(), and the ring->effective_size tweak even
> > > > > > > dropped the original comment:
> > > > > > > 
> > > > > > > # original version from intel_init_ring_buffer():
> > > > > > >         /* Workaround an erratum on the i830 which causes a hang if
> > > > > > >          * the TAIL pointer points to within the last 2 cachelines
> > > > > > >          * of the buffer.
> > > > > > >          */
> > > > > > >         ring->effective_size = ring->size;
> > > > > > >         if (IS_I830(ring->dev))
> > > > > > >                 ring->effective_size -= 128;
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > 
> > > > > > > The line marked "here resets" resets all the fields, and maybe
> > > > > > > it's not a good idea to re-initialize them all afterwards
> > > > > > > (missing some, as this thread shows), or at least, if it is really
> > > > > > > needed, to share initialization code between
> > > > > > > intel_render_ring_init_dri() and intel_init_ring_buffer()?
> > > > > > > 
> > > > > > > From the outside it looks like the offending patch was done as a
> > > > > > > quick fix in a hurry (lots of copy-paste), and maybe it would be
> > > > > > > better to re-do it properly...
> > > > > 
> > > > > Silence... ?
> > > > > 
> > > > > > I take it UMS is still being ignored, because e.g. the uninitialized
> > > > > > ring->irq_lock which I wrote about above is for sure used in
> > > > > > gen6_render_ring_get_irq(), which is added to the ring vtable in
> > > > > > intel_render_ring_init_dri().
> > > > 
> > > > I really doubt that UMS supports gen6 hardware.
> > > 
> > > > Then why is it there in intel_render_ring_init_dri():
> > > > 
> > > >     int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > >     {
> > > >     	drm_i915_private_t *dev_priv = dev->dev_private;
> > > >     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > 
> > > >     	*ring = render_ring;
> > > >     	if (INTEL_INFO(dev)->gen >= 6) {
> > 
> > This branch executes only when hw generation is 6 or newer.
> 
> and adds gen6_render_ring_get_irq() to the vtable, which uses
> ring->irq_lock, which is left uninitialized.
> 
> I don't understand what you were trying to say. How does it matter if
> some branch executes only for such-and-such hardware, when this branch
> contains bugs? Could you please clarify?

What I mean is that the xf86-video-intel versions with gen6 support do not
support UMS, so you can't even hit this "bug".

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
@ 2011-08-09 16:02                                     ` Vasily Khoruzhick
  0 siblings, 0 replies; 84+ messages in thread
From: Vasily Khoruzhick @ 2011-08-09 16:02 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Pekka Enberg, Herbert Xu, Luke-Jr, intel-gfx, LKML, dri-devel,
	Rafael J. Wysocki, Ray Lee, Andrew Morton, Linus Torvalds

On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
> On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
> > On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> > > On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > > > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > > > Keith,
> > > > > > > 
> > > > > > > first of all thanks for your prompt reply. Then...
> > > > > > > 
> > > > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > > > > > > > > > And now after v3.0 is out, I've tested it again, and yes,
> > > > > > > > > > like it was broken on v3.0-rc5, it is (now even more)
> > > > > > > > > > broken on v3.0 -- after first
> > > > > > > > > > bad io access the system freezes completely:
> > > > > > > > I looked at this when I first saw it (a couple of weeks ago),
> > > > > > > > and I couldn't see any obvious reason this patch would cause
> > > > > > > > this particular problem. I didn't want to revert the patch
> > > > > > > > at that point as I feared it would cause other subtle
> > > > > > > > problems. Given that you've got a work-around, it seemed
> > > > > > > > best to just push this off past 3.0.
> > > > > > > 
> > > > > > > What kind of a workaround are you talking about? Sorry, to me
> > > > > > > it all looked like "UMS is being ignored forever". Anyway,
> > > > > > > let's move on to try to solve the issue.
> > > > > > > 
> > > > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > > > units within the hardware status page. If the
> > > > > > > > status_page.page_addr value was zero, then the computed
> > > > > > > > address would end up being 0x84.
> > > > > > > > 
> > > > > > > > And, it looks like status_page.page_addr *will* end up being
> > > > > > > > zero as a result of the patch in question. The patch resets
> > > > > > > > the entire ring structure contents back to the initial
> > > > > > > > values, which includes smashing the status_page structure to
> > > > > > > > zero, clearing the value of status_page.page_addr set in
> > > > > > > > i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > Here's an untested patch which moves the initialization of
> > > > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > > > that intel_init_render_ring_buffer *already* has the setting
> > > > > > > > of the status_page.page_addr value, and so I've removed the
> > > > > > > > setting of status_page.page_addr from i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > I suspect we could remove the memset from
> > > > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > > > given the memset in i915_init_phys_hws.
> > > > > > > > 
> > > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17
> > > > > > > > 00:00:00 2001 From: Keith Packard <keithp@keithp.com>
> > > > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page
> > > > > > > > address in
> > > > > > > > 
> > > > > > > >  intel_render_ring_init_dri
> > > > > > > > 
> > > > > > > > Physically-addressed hardware status pages are initialized
> > > > > > > > early in the driver load process by i915_init_phys_hws. For
> > > > > > > > UMS environments, the ring structure is not initialized
> > > > > > > > until the X server starts. At that point, the entire ring
> > > > > > > > structure is re-initialized with all new values. Any values
> > > > > > > > set in the ring structure (including
> > > > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > > > re-initialized.
> > > > > > > > 
> > > > > > > > This patch moves the initialization of the
> > > > > > > > status_page.page_addr value to intel_render_ring_init_dri.
> > > > > > > > 
> > > > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > > > ---
> > > > > > > > 
> > > > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c
> > > > > > > > 100644 --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct
> > > > > > > > drm_device *dev)
> > > > > > > > 
> > > > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > >  {
> > > > > > > >  
> > > > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > 
> > > > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > > > > 
> > > > > > > >  	/* Program Hardware Status Page */
> > > > > > > >  	dev_priv->status_page_dmah =
> > > > > > > > 
> > > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct
> > > > > > > > drm_device *dev)
> > > > > > > > 
> > > > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > > > >  		return -ENOMEM;
> > > > > > > >  	
> > > > > > > >  	}
> > > > > > > > 
> > > > > > > > -	ring->status_page.page_addr =
> > > > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah-
>vaddr;
> > > > > > > > 
> > > > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > > > +	memset_io((void __force __iomem
> > > > > > > > *)dev_priv->status_page_dmah->vaddr, +		  0, PAGE_SIZE);
> > > > > > > > 
> > > > > > > >  	i915_write_hws_pga(dev);
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index
> > > > > > > > e961568..47b9b27 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct
> > > > > > > > drm_device *dev, u64 start, u32 size)
> > > > > > > > 
> > > > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > > > >  	
> > > > > > > >  	}
> > > > > > > > 
> > > > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > > > +		ring->status_page.page_addr =
> > > > > > > > dev_priv->status_page_dmah->vaddr; +
> > > > > > > > 
> > > > > > > >  	ring->dev = dev;
> > > > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > > > 
> > > > > > > I can't tell whether this is correct, because intel gfx driver
> > > > > > > is unknown to me, but from the first glance your description
> > > > > > > sounds reasonable.
> > > > > > > 
> > > > > > > I'm out of office till ~ next week's tuesday, and on return
> > > > > > > I'll try to test it on the hardware in question.
> > > > > > 
> > > > > > Keith, thanks again for the patch. As promised I've tested it on
> > > > > > the hardware in question and yes, bad_access is gone and X seems
> > > > > > to work, so thank you, but...
> > > > > > 
> > > > > > 
> > > > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > > > left uninitialized and also ring->irq_lock etc...
> > > > > > 
> > > > > > 
> > > > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > > > beat me too hard, but to me the gist of the problem is the
> > > > > > original patch, where Chris does
> > > > > > 
> > > > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index
> > > > > > > 03e3370..51fbc5e 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct
> > > > > > > drm_device *dev)
> > > > > > > 
> > > > > > >         return intel_init_ring_buffer(dev, ring);
> > > > > > >  
> > > > > > >  }
> > > > > > > 
> > > > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64
> > > > > > > start, u32 size) +{
> > > > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > > > +
> > > > > > > +       *ring = render_ring;
> > > > > > > 
> > > > > >           ^^^^^^^^^^^^^^^^^^^
> > > > > >           here resets
> > > > > > > 
> > > > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > > > +               ring->add_request = gen6_add_request;
> > > > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > > > +       } else if (IS_GEN5(dev)) {
> > > > > > > +               ring->add_request = pc_render_add_request;
> > > > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > > > +       }
> > > > > > 
> > > > > > and then the rest of the `ring` is initialized seemingly
> > > > > > copy-pasted
> > > > > > 
> > > > > > from intel_init_ring_buffer():
> > > > > > > +       ring->dev = dev;
> > > > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > > > +
> > > > > > > +       ring->size = size;
> > > > > > > +       ring->effective_size = ring->size;
> > > > > > > +       if (IS_I830(ring->dev))
> > > > > > > +               ring->effective_size -= 128;
> > > > > > > +
> > > > > > > +       ring->map.offset = start;
> > > > > > > +       ring->map.size = size;
> > > > > > > +       ring->map.type = 0;
> > > > > > > +       ring->map.flags = 0;
> > > > > > > +       ring->map.mtrr = 0;
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > where both 3 chunks go almost exactly from
> > > > > > intel_init_ring_buffer(), and ring->effective_size tweak even
> > > > > > stripped original comment:
> > > > > > 
> > > > > > # original version from intel_init_ring_buffer():
> > > > > >         /* Workaround an erratum on the i830 which causes a hang
> > > > > >         if
> > > > > >         
> > > > > >          * the TAIL pointer points to within the last 2
> > > > > >          cachelines * of the buffer.
> > > > > >          */
> > > > > >         
> > > > > >         ring->effective_size = ring->size;
> > > > > >         if (IS_I830(ring->dev))
> > > > > >         
> > > > > >                 ring->effective_size -= 128;
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > 
> > > > > > The line marked "here resets" resets all the fields, and maybe
> > > > > > it's not a good idea to re-initialize them all afterwards
> > > > > > (missing some as this thread show), or at least if it is really
> > > > > > needed, share initialization code between
> > > > > > intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > > > > > 
> > > > > > > From the outside it looks like the offending patch was done as a
> > > > > > > quick fix in a hurry (lots of copy-paste), and maybe it would be better
> > > > > > > to re-do it properly...
> > > > > 
> > > > > Silence... ?
> > > > > 
> > > > > I read UMS is still ignored, because e.g. that uninitialized
> > > > > ring->irq_lock which I've wrote about above is for sure used e.g.
> > > > > in gen6_render_ring_get_irq() added to ring vtable in
> > > > > intel_render_ring_init_dri().
> > > > 
> > > > I really doubt that UMS supports gen6 hardware.
> > > 
> > > Then why it is there in intel_render_ring_init_dri():
> > >     int intel_render_ring_init_dri(struct drm_device *dev, u64 start,
> > >     u32
> > > 
> > > size) {
> > > 
> > >     	drm_i915_private_t *dev_priv = dev->dev_private;
> > >     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > >     	
> > >     	*ring = render_ring;
> > >     	if (INTEL_INFO(dev)->gen >= 6) {
> > 
> > This branch executes only when hw generation is 6 or newer.
> 
> and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock
> which is left uninitialized.
> 
> I don't understand what you were trying to say. How does it matter if
> some branch executes only for such-and-such hardware, when this branch
> contains bugs? Could you please clarify?

I want to say that xf86-video-intel with gen6 support does not support UMS. So 
you can't even hit this "bug".

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 16:02                                     ` Vasily Khoruzhick
@ 2011-08-09 16:32                                       ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 16:32 UTC (permalink / raw)
  To: Vasily Khoruzhick
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tue, Aug 09, 2011 at 07:02:59PM +0300, Vasily Khoruzhick wrote:
> On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
> > On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
> > > On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> > > > On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > > > > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > > > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > > > > Keith,
> > > > > > > > 
> > > > > > > > first of all thanks for your prompt reply. Then...
> > > > > > > > 
> > > > > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov
> > > > > > > > > <kirr@mns.spb.ru>
> > > > > 
> > > > > wrote:
> > > > > > > > > > And now after v3.0 is out, I've tested it again, and yes,
> > > > > > > > > > like it was broken on v3.0-rc5, it is (now even more)
> > > > > > > > > > broken on v3.0 -- after first
> > > > > > > > > 
> > > > > > > > > > bad io access the system freezes completely:
> > > > > > > > > I looked at this when I first saw it (a couple of weeks ago),
> > > > > > > > > and I couldn't see any obvious reason this patch would cause
> > > > > > > > > this particular problem. I didn't want to revert the patch
> > > > > > > > > at that point as I feared it would cause other subtle
> > > > > > > > > problems. Given that you've got a work-around, it seemed
> > > > > > > > > best to just push this off past 3.0.
> > > > > > > > 
> > > > > > > > What kind of a workaround are you talking about? Sorry, to me
> > > > > > > > it all looked like "UMS is being ignored forever". Anyway,
> > > > > > > > let's move on to try to solve the issue.
> > > > > > > > 
> > > > > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > > > > units within the hardware status page. If the
> > > > > > > > > status_page.page_addr value was zero, then the computed
> > > > > > > > > address would end up being 0x84.
> > > > > > > > > 
> > > > > > > > > And, it looks like status_page.page_addr *will* end up being
> > > > > > > > > zero as a result of the patch in question. The patch resets
> > > > > > > > > the entire ring structure contents back to the initial
> > > > > > > > > values, which includes smashing the status_page structure to
> > > > > > > > > zero, clearing the value of status_page.page_addr set in
> > > > > > > > > i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > Here's an untested patch which moves the initialization of
> > > > > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > > > > that intel_init_render_ring_buffer *already* has the setting
> > > > > > > > > of the status_page.page_addr value, and so I've removed the
> > > > > > > > > setting of status_page.page_addr from i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > I suspect we could remove the memset from
> > > > > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > > > > given the memset in i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17
> > > > > > > > > 00:00:00 2001 From: Keith Packard <keithp@keithp.com>
> > > > > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page
> > > > > > > > > address in
> > > > > > > > > 
> > > > > > > > >  intel_render_ring_init_dri
> > > > > > > > > 
> > > > > > > > > Physically-addressed hardware status pages are initialized
> > > > > > > > > early in the driver load process by i915_init_phys_hws. For
> > > > > > > > > UMS environments, the ring structure is not initialized
> > > > > > > > > until the X server starts. At that point, the entire ring
> > > > > > > > > structure is re-initialized with all new values. Any values
> > > > > > > > > set in the ring structure (including
> > > > > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > > > > re-initialized.
> > > > > > > > > 
> > > > > > > > > This patch moves the initialization of the
> > > > > > > > > status_page.page_addr value to intel_render_ring_init_dri.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > > > > ---
> > > > > > > > > 
> > > > > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c
> > > > > > > > > 100644 --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct
> > > > > > > > > drm_device *dev)
> > > > > > > > > 
> > > > > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > > >  {
> > > > > > > > >  
> > > > > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > > 
> > > > > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > > > > > 
> > > > > > > > >  	/* Program Hardware Status Page */
> > > > > > > > >  	dev_priv->status_page_dmah =
> > > > > > > > > 
> > > > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct
> > > > > > > > > drm_device *dev)
> > > > > > > > > 
> > > > > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > > > > >  		return -ENOMEM;
> > > > > > > > >  	
> > > > > > > > >  	}
> > > > > > > > > 
> > > > > > > > > -	ring->status_page.page_addr =
> > > > > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah-
> >vaddr;
> > > > > > > > > 
> > > > > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > > > > +	memset_io((void __force __iomem
> > > > > > > > > *)dev_priv->status_page_dmah->vaddr, +		  0, PAGE_SIZE);
> > > > > > > > > 
> > > > > > > > >  	i915_write_hws_pga(dev);
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index
> > > > > > > > > e961568..47b9b27 100644
> > > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct
> > > > > > > > > drm_device *dev, u64 start, u32 size)
> > > > > > > > > 
> > > > > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > > > > >  	
> > > > > > > > >  	}
> > > > > > > > > 
> > > > > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > > > > +		ring->status_page.page_addr =
> > > > > > > > > dev_priv->status_page_dmah->vaddr; +
> > > > > > > > > 
> > > > > > > > >  	ring->dev = dev;
> > > > > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > > > > 
> > > > > > > > I can't tell whether this is correct, because intel gfx driver
> > > > > > > > is unknown to me, but from the first glance your description
> > > > > > > > sounds reasonable.
> > > > > > > > 
> > > > > > > > I'm out of office till ~ next week's tuesday, and on return
> > > > > > > > I'll try to test it on the hardware in question.
> > > > > > > 
> > > > > > > Keith, thanks again for the patch. As promised I've tested it on
> > > > > > > the hardware in question and yes, bad_access is gone and X seems
> > > > > > > to work, so thank you, but...
> > > > > > > 
> > > > > > > 
> > > > > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > > > > left uninitialized and also ring->irq_lock etc...
> > > > > > > 
> > > > > > > 
> > > > > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > > > > beat me too hard, but to me the gist of the problem is the
> > > > > > > original patch, where Chris does
> > > > > > > 
> > > > > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index
> > > > > > > > 03e3370..51fbc5e 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct
> > > > > > > > drm_device *dev)
> > > > > > > > 
> > > > > > > >         return intel_init_ring_buffer(dev, ring);
> > > > > > > >  
> > > > > > > >  }
> > > > > > > > 
> > > > > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64
> > > > > > > > start, u32 size) +{
> > > > > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > > > > +
> > > > > > > > +       *ring = render_ring;
> > > > > > > > 
> > > > > > >           ^^^^^^^^^^^^^^^^^^^
> > > > > > >           here resets
> > > > > > > > 
> > > > > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > > > > +               ring->add_request = gen6_add_request;
> > > > > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > > > > +       } else if (IS_GEN5(dev)) {
> > > > > > > > +               ring->add_request = pc_render_add_request;
> > > > > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > > > > +       }
> > > > > > > 
> > > > > > > and then the rest of the `ring` is initialized seemingly
> > > > > > > copy-pasted
> > > > > > > 
> > > > > > > from intel_init_ring_buffer():
> > > > > > > > +       ring->dev = dev;
> > > > > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > > > > +
> > > > > > > > +       ring->size = size;
> > > > > > > > +       ring->effective_size = ring->size;
> > > > > > > > +       if (IS_I830(ring->dev))
> > > > > > > > +               ring->effective_size -= 128;
> > > > > > > > +
> > > > > > > > +       ring->map.offset = start;
> > > > > > > > +       ring->map.size = size;
> > > > > > > > +       ring->map.type = 0;
> > > > > > > > +       ring->map.flags = 0;
> > > > > > > > +       ring->map.mtrr = 0;
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > where both 3 chunks go almost exactly from
> > > > > > > intel_init_ring_buffer(), and ring->effective_size tweak even
> > > > > > > stripped original comment:
> > > > > > > 
> > > > > > > # original version from intel_init_ring_buffer():
> > > > > > >         /* Workaround an erratum on the i830 which causes a hang
> > > > > > >         if
> > > > > > >         
> > > > > > >          * the TAIL pointer points to within the last 2
> > > > > > >          cachelines * of the buffer.
> > > > > > >          */
> > > > > > >         
> > > > > > >         ring->effective_size = ring->size;
> > > > > > >         if (IS_I830(ring->dev))
> > > > > > >         
> > > > > > >                 ring->effective_size -= 128;
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > 
> > > > > > > The line marked "here resets" resets all the fields, and maybe
> > > > > > > it's not a good idea to re-initialize them all afterwards
> > > > > > > (missing some as this thread show), or at least if it is really
> > > > > > > needed, share initialization code between
> > > > > > > intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > > > > > > 
> > > > > > > > From the outside it looks like the offending patch was done as a
> > > > > > > > quick fix in a hurry (lots of copy-paste), and maybe it would be better
> > > > > > > > to re-do it properly...
> > > > > > 
> > > > > > Silence... ?
> > > > > > 
> > > > > > I read UMS is still ignored, because e.g. that uninitialized
> > > > > > ring->irq_lock which I've wrote about above is for sure used e.g.
> > > > > > in gen6_render_ring_get_irq() added to ring vtable in
> > > > > > intel_render_ring_init_dri().
> > > > > 
> > > > > I really doubt that UMS supports gen6 hardware.
> > > > 
> > > > Then why it is there in intel_render_ring_init_dri():
> > > >     int intel_render_ring_init_dri(struct drm_device *dev, u64 start,
> > > >     u32
> > > > 
> > > > size) {
> > > > 
> > > >     	drm_i915_private_t *dev_priv = dev->dev_private;
> > > >     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > >     	
> > > >     	*ring = render_ring;
> > > >     	if (INTEL_INFO(dev)->gen >= 6) {
> > > 
> > > This branch executes only when hw generation is 6 or newer.
> > 
> > and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock
> > which is left uninitialized.
> > 
> > I don't understand what you were trying to say. How does it matter if
> > some branch executes only for such-and-such hardware, when this branch
> > contains bugs? Could you please clarify?
> 
> I want to say that xf86-video-intel with gen6 support does not support UMS. So 
> you can't even hit this "bug".


OK, but then there is dead code in the kernel, right? Or not dead
at all, because some non-X userspace could potentially trigger the bug.

Why was it added in the first place?


To me, intel_render_ring_init_dri() looks like it was copy-pasted from
several places in a hurry. And I have already been bitten by one bug
introduced in it, without a single response for 3 kernel cycles, though
I asked for help several times and provided detailed info.

Finally Keith analyzed and plugged the NULL-pointer dereference (thanks),
but as I said, it seems there are more bugs introduced in e8616b6c.
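
To illustrate the failure Keith described, here is a minimal userspace
sketch. It is not driver code: struct ring_model, ring_template,
breadcrumb_addr() and the other names are hypothetical stand-ins, and only
the 0x21 index is taken from the analysis quoted above. It shows how a
whole-struct assignment discards a pointer set earlier, and why the
resulting read lands at 0x84 (0x21 * 4), assuming the usual case where a
null pointer converts to integer 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct status_page {
	uint32_t *page_addr;	/* CPU address of the hardware status page */
};

struct ring_model {
	const char *name;
	struct status_page status_page;
};

/* Plays the role of the static render_ring initializer: page_addr is NULL. */
const struct ring_model ring_template = { .name = "render ring" };

#define BREADCRUMB_INDEX 0x21	/* offset in 32-bit units, as quoted above */

/* Address that a breadcrumb read would touch for a given ring. */
uintptr_t breadcrumb_addr(const struct ring_model *ring)
{
	return (uintptr_t)ring->status_page.page_addr
		+ BREADCRUMB_INDEX * sizeof(uint32_t);
}

/* Models "*ring = render_ring": every field, page_addr included, is overwritten. */
void reset_from_template(struct ring_model *ring)
{
	*ring = ring_template;
}

/* Ties the pieces together and returns the address read after the reset. */
uintptr_t demo_lost_status_page(void)
{
	static uint32_t page[1024];
	struct ring_model ring = ring_template;

	ring.status_page.page_addr = page;	/* early init sets the pointer */
	assert(breadcrumb_addr(&ring) == (uintptr_t)&page[BREADCRUMB_INDEX]);

	reset_from_template(&ring);		/* later re-init wipes it again */
	return breadcrumb_addr(&ring);		/* page_addr is NULL at this point */
}
```

Calling demo_lost_status_page() models the sequence "status page set up at
driver load, ring reset when X starts": the value it returns is the address
a breadcrumb read would use after the reset.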

The patch title says "drm/i915: Initialise ring vfuncs for old DRI
paths", and one could ask why it couldn't be done without bugs and
regressions. Are we waiting for someone else to hit the remaining bugs
instead of fixing them in the first place?

Quite frankly, I don't understand the intel-gfx developers' attitude: why is
it me, just a random user, who is nitpicking here? Why is there no
interest/will to analyze the now obviously buggy/duplicated code and fix it?
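
As a rough sketch of what sharing the initialization could look like
(userspace C, not a patch): every name below is hypothetical, and the two
bool flags merely stand in for the list heads and the irq_queue / irq_lock
setup. The point is only that both entry points call one common helper, so
a field cannot be initialized in one path and forgotten in the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a ring; field names only loosely follow the driver. */
struct ring_model {
	uint64_t start;
	uint32_t size;
	uint32_t effective_size;
	bool lists_ready;	/* stands in for the INIT_LIST_HEAD() calls */
	bool irq_ready;		/* stands in for irq_queue / irq_lock setup */
};

/* Initialization needed by every entry point, written exactly once. */
void ring_init_common(struct ring_model *ring, uint32_t size, bool is_i830)
{
	assert(size > 128);
	ring->size = size;
	/* i830 erratum from the quoted comment: avoid the last 2 cachelines. */
	ring->effective_size = size;
	if (is_i830)
		ring->effective_size -= 128;
	ring->lists_ready = true;
	ring->irq_ready = true;
}

/* Path where the kernel sets up the ring itself. */
void ring_init_kernel(struct ring_model *ring, uint32_t size, bool is_i830)
{
	*ring = (struct ring_model){ 0 };
	ring_init_common(ring, size, is_i830);
}

/* Legacy path where userspace supplies start and size. */
void ring_init_legacy(struct ring_model *ring, uint64_t start, uint32_t size,
		      bool is_i830)
{
	*ring = (struct ring_model){ 0 };	/* full reset, like "*ring = render_ring" */
	ring_init_common(ring, size, is_i830);
	ring->start = start;
}
```

With this shape, anything added to ring_init_common() later is picked up by
the legacy path automatically.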


If support for UMS/old-dri/whatever is dropped, could you please say so,
clean the legacy code out of the driver and move on? That would at
least be fair to people, so that they don't keep hoping their old setups
will continue to work.


Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
@ 2011-08-09 16:32                                       ` Kirill Smelkov
  0 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 16:32 UTC (permalink / raw)
  To: Vasily Khoruzhick
  Cc: intel-gfx, Keith Packard, Rafael J. Wysocki, Herbert Xu, Luke-Jr,
	LKML, dri-devel, Pekka Enberg, Ray Lee, Andrew Morton,
	Linus Torvalds

On Tue, Aug 09, 2011 at 07:02:59PM +0300, Vasily Khoruzhick wrote:
> On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
> > On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
> > > On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
> > > > On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
> > > > > On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
> > > > > > On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
> > > > > > > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
> > > > > > > > Keith,
> > > > > > > > 
> > > > > > > > first of all thanks for your prompt reply. Then...
> > > > > > > > 
> > > > > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
> > > > > > > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov
> > > > > > > > > <kirr@mns.spb.ru>
> > > > > 
> > > > > wrote:
> > > > > > > > > > And now after v3.0 is out, I've tested it again, and yes,
> > > > > > > > > > like it was broken on v3.0-rc5, it is (now even more)
> > > > > > > > > > broken on v3.0 -- after first
> > > > > > > > > 
> > > > > > > > > > bad io access the system freezes completely:
> > > > > > > > > I looked at this when I first saw it (a couple of weeks ago),
> > > > > > > > > and I couldn't see any obvious reason this patch would cause
> > > > > > > > > this particular problem. I didn't want to revert the patch
> > > > > > > > > at that point as I feared it would cause other subtle
> > > > > > > > > problems. Given that you've got a work-around, it seemed
> > > > > > > > > best to just push this off past 3.0.
> > > > > > > > 
> > > > > > > > What kind of a workaround are you talking about? Sorry, to me
> > > > > > > > it all looked like "UMS is being ignored forever". Anyway,
> > > > > > > > let's move on to try to solve the issue.
> > > > > > > > 
> > > > > > > > > Given the failing address passed to ioread32, this seems like
> > > > > > > > > it's probably the call to READ_BREADCRUMB --
> > > > > > > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit
> > > > > > > > > units within the hardware status page. If the
> > > > > > > > > status_page.page_addr value was zero, then the computed
> > > > > > > > > address would end up being 0x84.
> > > > > > > > > 
> > > > > > > > > And, it looks like status_page.page_addr *will* end up being
> > > > > > > > > zero as a result of the patch in question. The patch resets
> > > > > > > > > the entire ring structure contents back to the initial
> > > > > > > > > values, which includes smashing the status_page structure to
> > > > > > > > > zero, clearing the value of status_page.page_addr set in
> > > > > > > > > i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > Here's an untested patch which moves the initialization of
> > > > > > > > > status_page.page_addr into intel_render_ring_init_dri. I note
> > > > > > > > > that intel_init_render_ring_buffer *already* has the setting
> > > > > > > > > of the status_page.page_addr value, and so I've removed the
> > > > > > > > > setting of status_page.page_addr from i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > I suspect we could remove the memset from
> > > > > > > > > intel_init_render_ring_buffer; it seems entirely superfluous
> > > > > > > > > given the memset in i915_init_phys_hws.
> > > > > > > > > 
> > > > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17
> > > > > > > > > 00:00:00 2001 From: Keith Packard <keithp@keithp.com>
> > > > > > > > > Date: Fri, 22 Jul 2011 10:44:39 -0700
> > > > > > > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page
> > > > > > > > > address in
> > > > > > > > > 
> > > > > > > > >  intel_render_ring_init_dri
> > > > > > > > > 
> > > > > > > > > Physically-addressed hardware status pages are initialized
> > > > > > > > > early in the driver load process by i915_init_phys_hws. For
> > > > > > > > > UMS environments, the ring structure is not initialized
> > > > > > > > > until the X server starts. At that point, the entire ring
> > > > > > > > > structure is re-initialized with all new values. Any values
> > > > > > > > > set in the ring structure (including
> > > > > > > > > ring->status_page.page_addr) will be lost when the ring is
> > > > > > > > > re-initialized.
> > > > > > > > > 
> > > > > > > > > This patch moves the initialization of the
> > > > > > > > > status_page.page_addr value to intel_render_ring_init_dri.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Keith Packard <keithp@keithp.com>
> > > > > > > > > ---
> > > > > > > > > 
> > > > > > > > >  drivers/gpu/drm/i915/i915_dma.c         |    6 ++----
> > > > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +++
> > > > > > > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > index 1271282..8a3942c 100644
> > > > > > > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > > > > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
> > > > > > > > >  static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > > >  {
> > > > > > > > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > > -	struct intel_ring_buffer *ring = LP_RING(dev_priv);
> > > > > > > > >  
> > > > > > > > >  	/* Program Hardware Status Page */
> > > > > > > > >  	dev_priv->status_page_dmah =
> > > > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
> > > > > > > > >  		DRM_ERROR("Can not allocate hardware status page\n");
> > > > > > > > >  		return -ENOMEM;
> > > > > > > > >  	}
> > > > > > > > > -	ring->status_page.page_addr =
> > > > > > > > > -		(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
> > > > > > > > >  
> > > > > > > > > -	memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
> > > > > > > > > +	memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
> > > > > > > > > +		  0, PAGE_SIZE);
> > > > > > > > >  
> > > > > > > > >  	i915_write_hws_pga(dev);
> > > > > > > > >  
> > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > index e961568..47b9b27 100644
> > > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > > > >  		ring->get_seqno = pc_render_get_seqno;
> > > > > > > > >  	}
> > > > > > > > >  
> > > > > > > > > +	if (!I915_NEED_GFX_HWS(dev))
> > > > > > > > > +		ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> > > > > > > > > +
> > > > > > > > >  	ring->dev = dev;
> > > > > > > > >  	INIT_LIST_HEAD(&ring->active_list);
> > > > > > > > >  	INIT_LIST_HEAD(&ring->request_list);
> > > > > > > > 
> > > > > > > > I can't tell whether this is correct, because intel gfx driver
> > > > > > > > is unknown to me, but from the first glance your description
> > > > > > > > sounds reasonable.
> > > > > > > > 
> > > > > > > > I'm out of office till ~ next week's tuesday, and on return
> > > > > > > > I'll try to test it on the hardware in question.
> > > > > > > 
> > > > > > > Keith, thanks again for the patch. As promised I've tested it on
> > > > > > > the hardware in question and yes, bad_access is gone and X seems
> > > > > > > to work, so thank you, but...
> > > > > > > 
> > > > > > > 
> > > > > > > I see there are more such bugs in introduced-in-guilty-patch
> > > > > > > intel_render_ring_init_dri(). For example ring->irq_queue is
> > > > > > > left uninitialized and also ring->irq_lock etc...
> > > > > > > 
> > > > > > > 
> > > > > > > I'm X newbie, so if here is something stupid X-wise, please don't
> > > > > > > beat me too hard, but to me the gist of the problem is the
> > > > > > > original patch, where Chris does
> > > > > > > 
> > > > > > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > index 03e3370..51fbc5e 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > > > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> > > > > > > >         return intel_init_ring_buffer(dev, ring);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > > > > > > +{
> > > > > > > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > > > > > > +       struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > > > > > > +
> > > > > > > > +       *ring = render_ring;
> > > > > > > > 
> > > > > > >           ^^^^^^^^^^^^^^^^^^^
> > > > > > >           here resets
> > > > > > > > 
> > > > > > > > +       if (INTEL_INFO(dev)->gen >= 6) {
> > > > > > > > +               ring->add_request = gen6_add_request;
> > > > > > > > +               ring->irq_get = gen6_render_ring_get_irq;
> > > > > > > > +               ring->irq_put = gen6_render_ring_put_irq;
> > > > > > > > +       } else if (IS_GEN5(dev)) {
> > > > > > > > +               ring->add_request = pc_render_add_request;
> > > > > > > > +               ring->get_seqno = pc_render_get_seqno;
> > > > > > > > +       }
> > > > > > > 
> > > > > > > and then the rest of the `ring` is initialized, seemingly copy-pasted
> > > > > > > from intel_init_ring_buffer():
> > > > > > > > +       ring->dev = dev;
> > > > > > > > +       INIT_LIST_HEAD(&ring->active_list);
> > > > > > > > +       INIT_LIST_HEAD(&ring->request_list);
> > > > > > > > +       INIT_LIST_HEAD(&ring->gpu_write_list);
> > > > > > > > +
> > > > > > > > +       ring->size = size;
> > > > > > > > +       ring->effective_size = ring->size;
> > > > > > > > +       if (IS_I830(ring->dev))
> > > > > > > > +               ring->effective_size -= 128;
> > > > > > > > +
> > > > > > > > +       ring->map.offset = start;
> > > > > > > > +       ring->map.size = size;
> > > > > > > > +       ring->map.type = 0;
> > > > > > > > +       ring->map.flags = 0;
> > > > > > > > +       ring->map.mtrr = 0;
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > where both 3 chunks go almost exactly from
> > > > > > > intel_init_ring_buffer(), and ring->effective_size tweak even
> > > > > > > stripped original comment:
> > > > > > > 
> > > > > > > # original version from intel_init_ring_buffer():
> > > > > > >         /* Workaround an erratum on the i830 which causes a hang if
> > > > > > >          * the TAIL pointer points to within the last 2 cachelines
> > > > > > >          * of the buffer.
> > > > > > >          */
> > > > > > >         ring->effective_size = ring->size;
> > > > > > >         if (IS_I830(ring->dev))
> > > > > > >                 ring->effective_size -= 128;
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > 
> > > > > > > The line marked "here resets" resets all the fields, and maybe
> > > > > > > it's not a good idea to re-initialize them all afterwards
> > > > > > > (missing some as this thread show), or at least if it is really
> > > > > > > needed, share initialization code between
> > > > > > > intel_render_ring_init_dri() and intel_init_ring_buffer() ?
> > > > > > > 
> > > > > > > From the outside it looks like the offending patch was done as a
> > > > > > > quick fix in a hurry (lots of copy-paste), and maybe it would be better
> > > > > > > to re-do it properly...
> > > > > > 
> > > > > > Silence... ?
> > > > > > 
> > > > > > I read UMS is still ignored, because e.g. that uninitialized
> > > > > > ring->irq_lock which I've wrote about above is for sure used e.g.
> > > > > > in gen6_render_ring_get_irq() added to ring vtable in
> > > > > > intel_render_ring_init_dri().
> > > > > 
> > > > > I really doubt that UMS supports gen6 hardware.
> > > > 
> > > > Then why it is there in intel_render_ring_init_dri():
> > > >     int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
> > > >     {
> > > >     	drm_i915_private_t *dev_priv = dev->dev_private;
> > > >     	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
> > > >     	
> > > >     	*ring = render_ring;
> > > >     	if (INTEL_INFO(dev)->gen >= 6) {
> > > 
> > > This branch executes only when hw generation is 6 or newer.
> > 
> > and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock
> > which is left uninitialized.
> > 
> > I don't understand what you were trying to say. How does it matter if
> > some branch executes only for such-and-such hardware, when this branch
> > contains bugs? Could you please clarify?
> 
> I want to say that xf86-video-intel with gen6 support does not support UMS. So 
> you can't even hit this "bug".


OK, but then there is dead code in the kernel, right? Or not dead at
all, because some non-X userspace could potentially trigger the bug.

Why was it added in the first place?


To me, intel_render_ring_init_dri() looks like it was copy-pasted from
several places in a hurry. And I was already bitten by one bug
introduced in it, without a single response for 3 kernel cycles, though
I asked for help several times and provided detailed info.

Finally Keith analyzed and plugged the NULL-pointer dereference (thanks),
but as I said, it seems there are more bugs introduced in e8616b6c.
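
To make the failure mode concrete, here is a minimal standalone sketch of
what the struct copy does to a field that was set earlier. It is only an
illustration under simplifying assumptions: struct ring_buffer,
ring_template, fake_hws_page and the three functions are hypothetical
stand-ins, not the real i915 definitions, and the real fix is additionally
conditional on !I915_NEED_GFX_HWS(dev).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the i915 structures. */
struct status_page {
	void *page_addr;
};

struct ring_buffer {
	const char *name;
	struct status_page status_page;
	unsigned int size;
};

/* Template holding only the static fields, playing the role of render_ring. */
static const struct ring_buffer ring_template = {
	.name = "render ring",
};

/* Stands in for dev_priv->status_page_dmah->vaddr. */
static char fake_hws_page[4096];

/* Early driver load: remember the status page address in the ring. */
void early_init_phys_hws(struct ring_buffer *ring)
{
	ring->status_page.page_addr = fake_hws_page;
}

/* Late (UMS) init without the fix: the struct copy discards page_addr. */
void late_init_without_fix(struct ring_buffer *ring, unsigned int size)
{
	*ring = ring_template;	/* overwrites every field, set earlier or not */
	ring->size = size;
}

/* Late (UMS) init with the address set again after the copy. */
void late_init_with_fix(struct ring_buffer *ring, unsigned int size)
{
	*ring = ring_template;
	ring->status_page.page_addr = fake_hws_page;
	ring->size = size;
	assert(ring->status_page.page_addr != NULL);
}
```

Calling early_init_phys_hws() and then late_init_without_fix() leaves
page_addr NULL, so any later access through it faults; the same applies to
every other field that the template does not set and that is not assigned
again after the copy.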

The patch title says "drm/i915: Initialise ring vfuncs for old DRI
paths", and one could ask why it couldn't be done without bugs and
regressions. Are we waiting for someone else to hit the remaining bugs
instead of fixing them in the first place?
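
One possible shape for sharing the initialization between the two entry
points is sketched below. This is an assumption about how it could be
structured, not a patch against the driver: ring_init_common(),
ring_init_kms(), ring_init_dri(), struct list_node and the irq_lock_ready
flag (which stands in for spin_lock_init(&ring->irq_lock)) are all
hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real i915 definitions. */
struct list_node {
	struct list_node *prev, *next;
};

struct ring_buffer {
	struct list_node active_list;
	struct list_node request_list;
	bool irq_lock_ready;	/* stands in for spin_lock_init(&ring->irq_lock) */
	unsigned int size;
	unsigned int effective_size;
};

void list_node_init(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

/* Everything both entry points need lives in one place. */
void ring_init_common(struct ring_buffer *ring, unsigned int size, bool is_i830)
{
	list_node_init(&ring->active_list);
	list_node_init(&ring->request_list);
	ring->irq_lock_ready = true;

	/*
	 * i830 erratum workaround: keep TAIL out of the last two
	 * cachelines of the buffer.
	 */
	ring->size = size;
	ring->effective_size = size;
	if (is_i830)
		ring->effective_size -= 128;
}

/* KMS-style entry point. */
void ring_init_kms(struct ring_buffer *ring, unsigned int size, bool is_i830)
{
	ring_init_common(ring, size, is_i830);
	/* allocation and mapping of the ring object would follow here */
}

/* Old DRI/UMS-style entry point: userspace supplied the ring location. */
void ring_init_dri(struct ring_buffer *ring, unsigned int size, bool is_i830)
{
	ring_init_common(ring, size, is_i830);
	/* setting up the map from the supplied start/size would follow here */
}
```

With this layout a field added to the common helper is initialized on both
paths, so the UMS path cannot silently miss it.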

Quite frankly, I don't understand intel-gfx developers attitude: why is
it me, just random user who is nitpicking here? Why there is no
interest/will to analyze now obviously buggy/duplicate code and fix it?


If support for UMS/old DRI/whatever has been dropped, could you please
say so, remove the legacy code from the driver and move on? That would
at least be fair to people, who then would not keep hoping that their
old setups will continue to work.


Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 16:32                                       ` Kirill Smelkov
@ 2011-08-09 16:56                                         ` Ray Lee
  -1 siblings, 0 replies; 84+ messages in thread
From: Ray Lee @ 2011-08-09 16:56 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Vasily Khoruzhick, intel-gfx, Keith Packard, Rafael J. Wysocki,
	Herbert Xu, Luke-Jr, LKML, dri-devel, Pekka Enberg,
	Andrew Morton, Linus Torvalds

On Tue, Aug 9, 2011 at 9:32 AM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> Quite frankly, I don't understand intel-gfx developers attitude: why is
> it me, just random user who is nitpicking here? Why there is no
> interest/will to analyze now obviously buggy/duplicate code and fix it?

Because they don't have an infinite amount of manpower. Actual bugs
hitting actual users take precedence over 'cleanups' which always have
a chance of causing regressions, as you're well aware. Code churn for
the sake of abstract prettiness is discouraged, as it has a potential
cost for little potential gain.

If you like, submit a patch. You may now be more up-to-date on those
particular code paths than most of the intel-gfx developers.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 16:56                                         ` Ray Lee
@ 2011-08-09 17:40                                           ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-09 17:40 UTC (permalink / raw)
  To: Ray Lee
  Cc: Vasily Khoruzhick, intel-gfx, Keith Packard, Rafael J. Wysocki,
	Herbert Xu, Luke-Jr, LKML, dri-devel, Pekka Enberg,
	Andrew Morton, Linus Torvalds

On Tue, Aug 09, 2011 at 09:56:01AM -0700, Ray Lee wrote:
> On Tue, Aug 9, 2011 at 9:32 AM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> > Quite frankly, I don't understand intel-gfx developers attitude: why is
> > it me, just random user who is nitpicking here? Why there is no
> > interest/will to analyze now obviously buggy/duplicate code and fix it?
> 
> Because they don't have an infinite amount of manpower. Actual bugs
> hitting actual users take precedence over 'cleanups' which always have
> a chance of causing regressions, as you're well aware. Code churn for
> the sake of abstract prettiness is discouraged, as it has a potential
> cost for little potential gain.
> 
> If you like, submit a patch. You may now be more up-to-date on those
> particular code paths than most of the intel-gfx developers.

Ray, I'd agree with you if the topic was about cleanups.

But here I was talking about a copy-pasted commit which introduced
regressions and bugs. If the user's dilemma is now either to clean it up
himself after the developers, or to accept that something is broken
because the developers lack manpower and so plug things in a hurry,
increasing entropy, then I'd like to recall a good rule, for myself too:
do not break things in the first place.

I'm not talking about cleanup here. I'm talking about the original commit
which introduced the problems: there is no need to clean it up; it would
be better to revert it and redo it properly, to avoid subsequent code
churn from lots of fixes.


Sorry, I won't submit a patch. If there is a need to find/fix/cleanup
obvious things after company's developers, I have better things to do,
and a todo item to re-evaluate hardware for my next project.


Thanks,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 17:40                                           ` Kirill Smelkov
@ 2011-08-09 17:43                                             ` Ray Lee
  -1 siblings, 0 replies; 84+ messages in thread
From: Ray Lee @ 2011-08-09 17:43 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Vasily Khoruzhick, intel-gfx, Keith Packard, Rafael J. Wysocki,
	Herbert Xu, Luke-Jr, LKML, dri-devel, Pekka Enberg,
	Andrew Morton, Linus Torvalds

On Tue, Aug 9, 2011 at 10:40 AM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
>> If you like, submit a patch. You may now be more up-to-date on those
>> particular code paths than most of the intel-gfx developers.
>
> Ray, I'd agree with you if the topic was about cleanups.

At this point it is about cleanups unless Keith's patch upthread does
not work for you. Does it or not?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 17:43                                             ` Ray Lee
@ 2011-08-10  8:36                                               ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-10  8:36 UTC (permalink / raw)
  To: Ray Lee
  Cc: Vasily Khoruzhick, intel-gfx, Keith Packard, Rafael J. Wysocki,
	Herbert Xu, Luke-Jr, LKML, dri-devel, Pekka Enberg,
	Andrew Morton, Linus Torvalds

On Tue, Aug 09, 2011 at 10:43:08AM -0700, Ray Lee wrote:
> On Tue, Aug 9, 2011 at 10:40 AM, Kirill Smelkov <kirr@mns.spb.ru> wrote:
> >> If you like, submit a patch. You may now be more up-to-date on those
> >> particular code paths than most of the intel-gfx developers.
> >
> > Ray, I'd agree with you if the topic was about cleanups.
> 
> At this point it is about cleanups unless Keith's patch upthread does
> not work for you. Does it or not?

I already wrote two weeks ago that it does, but if this needs to be
restated one more time, here it is: Keith's patch fixes the problem in
the sense that X now starts and seemingly works (thanks), but several
issues remain, imho. I've got the message: if it's OK for intel-gfx to
leave them as is, it's OK for me.


Peace,
Kirill

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-09 17:40                                           ` Kirill Smelkov
@ 2011-08-10  9:41                                             ` Alan Cox
  -1 siblings, 0 replies; 84+ messages in thread
From: Alan Cox @ 2011-08-10  9:41 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Ray Lee, Rafael J. Wysocki, Herbert Xu, Luke-Jr, intel-gfx, LKML,
	dri-devel, Vasily Khoruzhick, Andrew Morton, Linus Torvalds,
	Pekka Enberg

> Sorry, I won't submit a patch. If there is a need to find/fix/cleanup
> obvious things after company's developers, I have better things to do,
> and a todo item to re-evaluate hardware for my next project.

You seem confused. If you have a support contract of some form with a
Linux supplier or Intel please contact your support. This mailing list
isn't for support services.

Alan

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Intel-gfx] Major 2.6.38 / 2.6.39 / 3.0 regression ignored?
  2011-08-10  9:41                                             ` Alan Cox
@ 2011-08-10 11:37                                               ` Kirill Smelkov
  -1 siblings, 0 replies; 84+ messages in thread
From: Kirill Smelkov @ 2011-08-10 11:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ray Lee, Rafael J. Wysocki, Herbert Xu, Luke-Jr, intel-gfx, LKML,
	dri-devel, Vasily Khoruzhick, Andrew Morton, Linus Torvalds,
	Pekka Enberg

On Wed, Aug 10, 2011 at 10:41:44AM +0100, Alan Cox wrote:
> > Sorry, I won't submit a patch. If there is a need to find/fix/cleanup
> > obvious things after company's developers, I have better things to do,
> > and a todo item to re-evaluate hardware for my next project.
> 
> You seem confused. If you have a support contract of some form with a
> Linux supplier or Intel please contact your support. This mailing list
> isn't for support services.

Thanks for clarifying.

^ permalink raw reply	[flat|nested] 84+ messages in thread
