* [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
@ 2006-07-03 11:17 ` Mikael Pettersson
  0 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-07-03 11:17 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, sparclinux

In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
frequent lock-up like behaviour on my Ultra5 (ATI Mach64).

I finally managed to trace the cause to this change in 2.6.16-git6:

diff --git a/arch/sparc64/mm/generic.c b/arch/sparc64/mm/generic.c
index 580b63d..8cb0620 100644
--- a/arch/sparc64/mm/generic.c
+++ b/arch/sparc64/mm/generic.c
@@ -144,7 +140,6 @@ int io_remap_pfn_range(struct vm_area_st
 	vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
 	vma->vm_pgoff = phys_base >> PAGE_SHIFT;
 
-	prot = __pgprot(pg_iobits);
 	offset -= from;
 	dir = pgd_offset(mm, from);
 	flush_cache_range(vma, beg, end);

Reverting this patch fixes my X11 problems.

Adding a debug printk and a dump_stack there, plus a hack to
sys_mmap() to store away its parameters, shows:

io_remap_pfn_range: prot 0x8000000000000788, pg_iobits 0x8000000000000f8a, mmap() in 1961(X), prot 0x3, flags 0x1
Call Trace:
 [00000000004eb9a0] proc_bus_pci_mmap+0x38/0x54
 [000000000047122c] do_mmap_pgoff+0x474/0x674
 [000000000041f02c] sys_mmap+0x178/0x1b0
 [00000000004069d4] linux_sparc_syscall32+0x34/0x40
 [0000000000072b78] 0x72b78

I.e., X did a simple PROT_READ|PROT_WRITE MAP_SHARED mmap() of
something PCI-related, presumably the ATI card. The protection
bits passed into io_remap_pfn_range() are 0x80...0788, while
pg_iobits are 0x80...0f8a. Current kernels obey the prot bits,
which, if I read things correctly, means that _PAGE_W_4U and
_PAGE_MODIFIED_4U don't get set any more.
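
A minimal sketch of that comparison, written as user-space C rather than kernel code. The two bit values are assumptions based on the sun4u PTE layout and should be checked against include/asm-sparc64/pgtable.h for the kernel in question; missing_bits(), lacks_write_enable() and the PAGE_* names are hypothetical, introduced only for this illustration:

```c
#include <stdint.h>

/* Assumed sun4u PTE bit values (verify against the kernel headers). */
#define PAGE_W_4U        0x0000000000000002ULL  /* hardware-writable */
#define PAGE_MODIFIED_4U 0x0000000000000800ULL  /* software dirty    */

/* Bits that are set in `want` but not in `have`. */
static uint64_t missing_bits(uint64_t have, uint64_t want)
{
	return want & ~have;
}

/* True if `prot` lacks either of the two bits named in the report. */
static int lacks_write_enable(uint64_t prot)
{
	uint64_t need = PAGE_W_4U | PAGE_MODIFIED_4U;

	return (prot & need) != need;
}
```

With the values from the printk above, missing_bits(0x8000000000000788, 0x8000000000000f8a) evaluates to 0x802, which under the assumed definitions is exactly the hardware-writable bit plus the software-dirty bit.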

I guess something else in the kernel should have set those
bits before they got to io_remap_pfn_range()?

/Mikael

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-07-04  7:41   ` Rene Rebe
  -1 siblings, 0 replies; 38+ messages in thread
From: Rene Rebe @ 2006-07-04  7:41 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: davem, linux-kernel, sparclinux

Hi,

On Monday 03 July 2006 13:17, Mikael Pettersson wrote:
> In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
> frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
> I finally managed to trace the cause to this change in 2.6.16-git6:

I can confirm this behaviour, on a U5 with ATi onboard, but for me it
happens also on the Creator 3D of a U30, likewise.

I'll try to test if this changeset makes a difference for me as well
as soon as possible.

-- 
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
            http://exactcode.de | http://t2-project.org | http://rebe.name
            +49 (0)30 / 255 897 45

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-04  7:41   ` Rene Rebe
@ 2006-07-04  9:32     ` Rene Rebe
  -1 siblings, 0 replies; 38+ messages in thread
From: Rene Rebe @ 2006-07-04  9:32 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: davem, linux-kernel, sparclinux

Hi,

On Tuesday 04 July 2006 09:41, Rene Rebe wrote:

> On Monday 03 July 2006 13:17, Mikael Pettersson wrote:
> > In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
> > frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
> > I finally managed to trace the cause to this change in 2.6.16-git6:
> 
> I can confirm this behaviour, on a U5 with ATi onboard, but for me it
> happens also on the Creator 3D of a U30, likewise.
> 
> I'll try to test if this changeset makes a difference for me as well
> as soon as possible.

I can confirm that backing out this changeset fixes X on the ATi@U5 as
well as the Creator3D@U30: it no longer stalls and hangs every few
seconds for many more seconds or minutes.

Yours,

-- 
René Rebe - ExactCODE - Berlin (Europe / Germany)
            http://exactcode.de | http://t2-project.org | http://rene.rebe.name
            +49 (0)30 / 255 897 45

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-07-04 10:03 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-07-04 10:03 UTC (permalink / raw)
  To: mikpe, rene; +Cc: davem, linux-kernel, sparclinux

On Tue, 4 Jul 2006 11:32:31 +0200, Rene Rebe wrote:
> On Tuesday 04 July 2006 09:41, Rene Rebe wrote:
> 
> > On Monday 03 July 2006 13:17, Mikael Pettersson wrote:
> > > In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
> > > frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
> > > I finally managed to trace the cause to this change in 2.6.16-git6:
> > 
> > I can confirm this behaviour, on a U5 with ATi onboard, but for me it
> > happens also on the Creator 3D of a U30, likewise.
> > 
> > I'll try to test if this changeset makes a difference for me as well
> > as soon as possible.
> 
> I can confirm that backing out this changeset fixes X on the ATi@U5 as
> well as the Creator3D@U30: it no longer stalls and hangs every few
> seconds for many more seconds or minutes.

Thanks.

/Mikael

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
  (?)
  (?)
@ 2006-07-05 21:07 ` Ludovic Courtès
  -1 siblings, 0 replies; 38+ messages in thread
From: Ludovic Courtès @ 2006-07-05 21:07 UTC (permalink / raw)
  To: sparclinux

Hi,

(Stripped `linux-kernel@'.)

2 days, 9 hours, 28 minutes, 40 seconds ago, Mikael Pettersson wrote:
> In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
> frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
> 
> I finally managed to trace the cause to this change in 2.6.16-git6:

Can you be more specific as to which version of Xorg you are using?

Several people including me have reported being unable to run Xorg 7.0
on ATI-based sparc64 machines such as U5s [0].  Symptoms include kernel
hangs as well as transient [1,2] or permanent [3] filesystem corruptions.

I just recompiled a kernel with your patch and it doesn't make any
difference in that respect (besides, re-adding the
`prot = __pgprot(...)' line looks questionable since the input value of
PROT becomes unused as a result).

Xorg 6.9, OTOH, was apparently working fine for most people.

Thanks,
Ludovic.

[0] http://lists.debian.org/debian-sparc/2006/04/msg00096.html
[1] http://lists.debian.org/debian-sparc/2006/05/msg00008.html
[2] http://lists.debian.org/debian-sparc/2006/06/msg00112.html
[3] http://lists.debian.org/debian-sparc/2006/06/msg00141.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
                   ` (2 preceding siblings ...)
  (?)
@ 2006-07-06  1:46 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-07-06  1:46 UTC (permalink / raw)
  To: sparclinux

On Wed, 5 Jul 2006 23:07:31 +0200, <ludo@chbouib.org> wrote:
>2 days, 9 hours, 28 minutes, 40 seconds ago, Mikael Pettersson wrote:
>> In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
>> frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
>> 
>> I finally managed to trace the cause to this change in 2.6.16-git6:
>
>Can you be more specific as to which version of Xorg you are using?
>
>Several people including me have reported being unable to run Xorg 7.0
>on ATI-based sparc64 machines such as U5s [0].  Symptoms include kernel
>hangs as well as transient [1,2] or permanent [3] filesystem corruptions.
>
>I just recompiled a kernel with your patch and it doesn't make any
>difference in that respect (besides, re-adding the
>`prot = __pgprot(...)' line looks questionable since the input value of
>PROT becomes unused as a result).
>
>Xorg 6.9, OTOH, was apparently working fine for most people.

Sorry, I can't help you with your Xorg 7.0 problems.
The user-space on my Ultra5 is Aurora Linux 2.0, which
has xorg-6.8.1-13sparc.

Concerning the kernel change, reverting the patch simply
makes the kernel do what it did prior to 2.6.16-git6,
including ignoring the prot parameter. So it should be safe,
although perhaps not optimal.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-07-06  3:40   ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-06  3:40 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: Mikael Pettersson <mikpe@it.uu.se>
Date: Mon, 3 Jul 2006 13:17:44 +0200 (MEST)

> I.e., X did a simple PROT_READ|PROT_WRITE MAP_SHARED mmap() of
> something PCI-related, presumably the ATI card. The protection
> bits passed into io_remap_pfn_range() are 0x80...0788, while
> pg_iobits are 0x80...0f8a. Current kernels obey the prot bits,
> which, if I read things correctly, means that _PAGE_W_4U and
> _PAGE_MODIFIED_4U don't get set any more.
> 
> I guess something else in the kernel should have set those
> bits before they got to io_remap_pfn_range()?

The problem is with X, it should not be doing a MAP_SHARED
mmap() of the framebuffer device.  It should be using
MAP_PRIVATE instead.

The kernel is trying to provide copy-on-write semantics for
the mapping, which doesn't make any sense for device registers.
That's why the kernel isn't setting the writable or modified
bits in the protection bitmask.

Please fix the X server :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-07-06  9:37 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-07-06  9:37 UTC (permalink / raw)
  To: davem, mikpe; +Cc: linux-kernel, sparclinux

On Wed, 05 Jul 2006 20:40:36 -0700 (PDT), David Miller wrote:
>> I.e., X did a simple PROT_READ|PROT_WRITE MAP_SHARED mmap() of
>> something PCI-related, presumably the ATI card. The protection
>> bits passed into io_remap_pfn_range() are 0x80...0788, while
>> pg_iobits are 0x80...0f8a. Current kernels obey the prot bits,
>> which, if I read things correctly, means that _PAGE_W_4U and
>> _PAGE_MODIFIED_4U don't get set any more.
>> 
>> I guess something else in the kernel should have set those
>> bits before they got to io_remap_pfn_range()?
>
>The problem is with X, it should not be doing a MAP_SHARED
>mmap() of the framebuffer device.  It should be using
>MAP_PRIVATE instead.
>
>The kernel is trying to provide copy-on-write semantics for
>the mapping, which doesn't make any sense for device registers.
>That's why the kernel isn't setting the writable or modified
>bits in the protection bitmask.

Now I'm confused. That COW behaviour would be consistent with
MAP_PRIVATE, not MAP_SHARED which is what X did use.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-06  9:37 ` Mikael Pettersson
@ 2006-07-07  7:05   ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-07  7:05 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: Mikael Pettersson <mikpe@it.uu.se>
Date: Thu, 6 Jul 2006 11:37:35 +0200 (MEST)

> On Wed, 05 Jul 2006 20:40:36 -0700 (PDT), David Miller wrote:
> >> I.e., X did a simple PROT_READ|PROT_WRITE MAP_SHARED mmap() of
> >> something PCI-related, presumably the ATI card. The protection
> >> bits passed into io_remap_pfn_range() are 0x80...0788, while
> >> pg_iobits are 0x80...0f8a. Current kernels obey the prot bits,
> >> which, if I read things correctly, means that _PAGE_W_4U and
> >> _PAGE_MODIFIED_4U don't get set any more.
> >> 
> >> I guess something else in the kernel should have set those
> >> bits before they got to io_remap_pfn_range()?
> >
> >The problem is with X, it should not be doing a MAP_SHARED
> >mmap() of the framebuffer device.  It should be using
> >MAP_PRIVATE instead.
> >
> >The kernel is trying to provide copy-on-write semantics for
> >the mapping, which doesn't make any sense for device registers.
> >That's why the kernel isn't setting the writable or modified
> >bits in the protection bitmask.
> 
> Now I'm confused. That COW behaviour would be consistent with
> MAP_PRIVATE, not MAP_SHARED which is what X did use.

Yes, I'm totally wrong here, MAP_SHARED is correct.

I'll have to figure out how the writeable bits get lost
in the call chain.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
                   ` (4 preceding siblings ...)
  (?)
@ 2006-07-07  8:40 ` René Rebe
  -1 siblings, 0 replies; 38+ messages in thread
From: René Rebe @ 2006-07-07  8:40 UTC (permalink / raw)
  To: sparclinux

Hi,

On Jul 5, 2006, at 11:07 PM, Ludovic Courtès wrote:

> 2 days, 9 hours, 28 minutes, 40 seconds ago, Mikael Pettersson wrote:
>> In 2.6.17 sparc64 kernels, X11 runs _extremely_ slowly with
>> frequent lock-up like behaviour on my Ultra5 (ATI Mach64).
>>
>> I finally managed to trace the cause to this change in 2.6.16-git6:
>
> Can you be more specific as to which version of Xorg you are using?
>
> Several people including me have reported being unable to run Xorg 7.0
> on ATI-based sparc64 machines such as U5s [0].  Symptoms include kernel
> hangs as well as transient [1,2] or permanent [3] filesystem corruptions.
>
> I just recompiled a kernel with your patch and it doesn't make any
> difference in that respect (besides, re-adding the
> `prot = __pgprot(...)' line looks questionable since the input value of
> PROT becomes unused as a result).
>
> Xorg 6.9, OTOH, was apparently working fine for most people.

Xorg 7.0 works for me on said U5 and U30 just fine ("distribution" is
T2 - http://www.t2-project.org).

When I'm back in the office (abroad right now) I can patch X to mmap
as advised by David.

Yours,

-- 
Rene Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
            http://exactcode.de | http://t2-project.org | http://rebe.name
            +49 (0)30 / 255 897 45



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
                   ` (5 preceding siblings ...)
  (?)
@ 2006-07-07  9:32 ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-07  9:32 UTC (permalink / raw)
  To: sparclinux

From: René Rebe <rene@exactcode.de>
Date: Fri, 7 Jul 2006 10:40:49 +0200

> When I'm back in the office (abroad right now) I can patch X to mmap
> as advised by David.

Don't bother, my suggestion was bogus.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-07  7:05   ` David Miller
@ 2006-07-28  1:13     ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-28  1:13 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: David Miller <davem@davemloft.net>
Date: Fri, 07 Jul 2006 00:05:24 -0700 (PDT)

> I'll have to figure out how the writeable bits get lost
> in the call chain.

Actually, I dug further, and things seem correct.

Initially we only set the SW-writable bit, and this is the right thing
to do for a MAP_SHARED writable mapping.

If the process actually tries to write to the mapping, the page fault
path will set the two bits that actually enable writes, namely the
HW-writable bit and the SW-dirty bit.

This occurs when pte_mkdirty() is called on the PTE during the
execution of mm/memory.c:handle_pte_fault(), right here:

	if (write_access) {
		if (!pte_write(entry))
			return do_wp_page(mm, vma, address,
					pte, pmd, ptl, entry);
		entry = pte_mkdirty(entry);
	}

pte_write() will return true, since the SW-writable bit is set.  So we
should not invoke do_wp_page(), and we'll just set the dirty bit
on the existing PTE.
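
The expected transition can be modelled in user space. This is a sketch of the logic described above, not the kernel's implementation: the bit values are assumed from the sun4u PTE layout, and pte_model_t and the model_* helpers are hypothetical stand-ins for pte_t, pte_write() and pte_mkdirty().

```c
#include <stdint.h>

/* Assumed sun4u PTE bit values (verify against the kernel headers). */
#define PAGE_W_4U        0x002ULL	/* hardware-writable */
#define PAGE_WRITE_4U    0x100ULL	/* software-writable */
#define PAGE_MODIFIED_4U 0x800ULL	/* software dirty    */

typedef struct { uint64_t val; } pte_model_t;

/* Model of pte_write(): tests only the software-writable bit. */
static int model_pte_write(pte_model_t pte)
{
	return (pte.val & PAGE_WRITE_4U) != 0;
}

/* Model of pte_mkdirty(): sets software dirty and hardware writable. */
static pte_model_t model_pte_mkdirty(pte_model_t pte)
{
	pte.val |= PAGE_MODIFIED_4U | PAGE_W_4U;
	return pte;
}

/*
 * Model of the write_access branch quoted above.  Returns 1 if the
 * do_wp_page() path would be taken, 0 if the PTE was simply marked
 * dirty in place.
 */
static int model_write_fault(pte_model_t *pte)
{
	if (!model_pte_write(*pte))
		return 1;
	*pte = model_pte_mkdirty(*pte);
	return 0;
}
```

Starting from the protection value reported at the top of the thread, 0x8000000000000788, the model takes the in-place path and ends at 0x8000000000000f8a, the same bits as pg_iobits.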

For some reason that isn't happening properly, or something keeps
clearing the HW-writable bit on us.  Another possibility is that
one of these operations sets the cacheable bits, or clears the
side-effect bit, either of which would cause corruption or other
problems when accessing the ATI card through such a mapping.

I wonder why.... I'll try to run some experiments on my system to try
and get to the bottom of this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-28  1:13     ` David Miller
@ 2006-07-28  3:38       ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-28  3:38 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: David Miller <davem@davemloft.net>
Date: Thu, 27 Jul 2006 18:13:56 -0700 (PDT)

> If the process actually tries to write to the mapping, the page fault
> path will set the two bits that actually enable writes, namely the
> HW-writable bit and the SW-dirty bit.
> 
> This occurs when pte_mkdirty() is called on the PTE during the
> execution of mm/memory.c:handle_pte_fault(), right here:
> 
> 	if (write_access) {
> 		if (!pte_write(entry))
> 			return do_wp_page(mm, vma, address,
> 					pte, pmd, ptl, entry);
> 		entry = pte_mkdirty(entry);
> 	}
> 
> pte_write() will return true, since the SW-writable bit is set.  So we
> should not invoke do_wp_page(), and we'll just set the dirty bit
> on the existing PTE.

I just confirmed that this is working properly with a debugging
patch included below.

Mikael, can you put this debugging patch into a kernel that exhibits
the problem and post all the "FAULT: " debugging messages that appear
in your kernel log when the problem happens?

Thanks a lot.

diff --git a/mm/memory.c b/mm/memory.c
index 109e986..b129ae4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2270,6 +2270,12 @@ static inline int handle_pte_fault(struc
 	spinlock_t *ptl;
 
 	old_entry = entry = *pte;
+#if 1
+	if (pte_val(old_entry) & _PAGE_E_4U) {
+		printk("FAULT: write(%d) old_entry[%016lx]\n",
+		       write_access, pte_val(old_entry));
+	}
+#endif
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
 			if (!vma->vm_ops || !vma->vm_ops->nopage)
@@ -2311,6 +2317,12 @@ static inline int handle_pte_fault(struc
 			flush_tlb_page(vma, address);
 	}
 unlock:
+#if 1
+	if (pte_val(old_entry) & _PAGE_E_4U) {
+		printk("FAULT: After, entry[%016lx]\n",
+		       pte_val(entry));
+	}
+#endif
 	pte_unmap_unlock(pte, ptl);
 	return VM_FAULT_MINOR;
 }

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
@ 2006-07-28  3:38       ` David Miller
  0 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-28  3:38 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: David Miller <davem@davemloft.net>
Date: Thu, 27 Jul 2006 18:13:56 -0700 (PDT)

> If the process actually tries to write to the mapping, the page fault
> path will set the two bits that actually enable writes, namely the
> HW-writable bit and the SW-dirty bit.
> 
> This occurs when pte_mkdirty() is called on the PTE during the
> execution of mm/memory.c:handle_pte_fault(), right here:
> 
> 	if (write_access) {
> 		if (!pte_write(entry))
> 			return do_wp_page(mm, vma, address,
> 					pte, pmd, ptl, entry);
> 		entry = pte_mkdirty(entry);
> 	}
> 
> pte_write() will return true, since the SW-writable bit is set.  So we
> don't should not invoke do_wp_page(), and we'll just set the dirty bit
> on the existing PTE.

I just confirmed that this is working properly with a debugging
patch included below.

Mikael, can you put this debugging patch into a kernel that exhibits
the problem and post all the "FAULT: " debugging messages that appear
in your kernel log when the problem happens?

Thanks a lot.

diff --git a/mm/memory.c b/mm/memory.c
index 109e986..b129ae4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2270,6 +2270,12 @@ static inline int handle_pte_fault(struc
 	spinlock_t *ptl;
 
 	old_entry = entry = *pte;
+#if 1
+	if (pte_val(old_entry) & _PAGE_E_4U) {
+		printk("FAULT: write(%d) old_entry[%016lx]\n",
+		       write_access, pte_val(old_entry));
+	}
+#endif
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
 			if (!vma->vm_ops || !vma->vm_ops->nopage)
@@ -2311,6 +2317,12 @@ static inline int handle_pte_fault(struc
 			flush_tlb_page(vma, address);
 	}
 unlock:
+#if 1
+	if (pte_val(old_entry) & _PAGE_E_4U) {
+		printk("FAULT: After, entry[%016lx]\n",
+		       pte_val(entry));
+	}
+#endif
 	pte_unmap_unlock(pte, ptl);
 	return VM_FAULT_MINOR;
 }

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-07-28 10:35 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-07-28 10:35 UTC (permalink / raw)
  To: davem, mikpe; +Cc: linux-kernel, sparclinux

On Thu, 27 Jul 2006 20:38:59 -0700 (PDT), David Miller wrote:
>I just confirmed that this is working properly with a debugging
>patch included below.
>
>Mikael, can you put this debugging patch into a kernel that exhibits
>the problem and post all the "FAULT: " debugging messages that appear
>in your kernel log when the problem happens?
>
>Thanks a lot.
>
>diff --git a/mm/memory.c b/mm/memory.c
>index 109e986..b129ae4 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -2270,6 +2270,12 @@ static inline int handle_pte_fault(struc
> 	spinlock_t *ptl;
> 
> 	old_entry = entry = *pte;
>+#if 1
>+	if (pte_val(old_entry) & _PAGE_E_4U) {
>+		printk("FAULT: write(%d) old_entry[%016lx]\n",
>+		       write_access, pte_val(old_entry));
>+	}
>+#endif
> 	if (!pte_present(entry)) {
> 		if (pte_none(entry)) {
> 			if (!vma->vm_ops || !vma->vm_ops->nopage)
>@@ -2311,6 +2317,12 @@ static inline int handle_pte_fault(struc
> 			flush_tlb_page(vma, address);
> 	}
> unlock:
>+#if 1
>+	if (pte_val(old_entry) & _PAGE_E_4U) {
>+		printk("FAULT: After, entry[%016lx]\n",
>+		       pte_val(entry));
>+	}
>+#endif
> 	pte_unmap_unlock(pte, ptl);
> 	return VM_FAULT_MINOR;
> }

Sure. Here's what 2.6.18-rc2 (vanilla) prints when I start X:

FAULT: write(1) old_entry[800001ffe2000788]
FAULT: After, entry[800001ffe2000f8a]
FAULT: write(1) old_entry[800001ffe2000788]
FAULT: After, entry[800001ffe2000f8a]
FAULT: write(1) old_entry[800001ffe2000788]
FAULT: After, entry[800001ffe2000f8a]
FAULT: write(1) old_entry[800001ffe2000788]
FAULT: After, entry[800001ffe2000f8a]
FAULT: write(1) old_entry[800001ffe2000788]
FAULT: After, entry[800001ffe2000f8a]
FAULT: write(1) old_entry[800001ffe13fe788]
FAULT: After, entry[800001ffe13fef8a]
FAULT: write(1) old_entry[e00001ffe1978788]
FAULT: After, entry[e00001ffe1978f8a]
FAULT: write(1) old_entry[e00001ffe1970788]
FAULT: After, entry[e00001ffe1970f8a]
FAULT: write(1) old_entry[e00001ffe1970f8a]
FAULT: After, entry[e00001ffe1970f8a]
FAULT: write(1) old_entry[e00001ffe1970f8a]
FAULT: After, entry[e00001ffe1970f8a]

The last two lines then repeat semi-infinitely, and they
were generated at an extremely high rate.
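
As an aside for readers of the log above, the PTE values can be compared bit by bit. The following is only a sketch, not code from the thread: the helper name `pte_changed_bits` and the constant names are made up for illustration, and reading 0x002 as _PAGE_W_4U and 0x800 as _PAGE_MODIFIED_4U is an assumption based on the bit names in the original report, not something checked against this kernel's headers.

```c
#include <stdint.h>

/* Hypothetical helper: which bits differ between two raw PTE values? */
static uint64_t pte_changed_bits(uint64_t old_entry, uint64_t new_entry)
{
	return old_entry ^ new_entry;
}

/* Values copied from the "FAULT:" lines in the log. */
static const uint64_t first_fault_old = 0x800001ffe2000788ULL;
static const uint64_t first_fault_new = 0x800001ffe2000f8aULL;
static const uint64_t looping_old     = 0xe00001ffe1970f8aULL;
static const uint64_t looping_new     = 0xe00001ffe1970f8aULL;
```

For the first pair the difference is 0x802, so the fault handler did change the PTE. For the repeating pair the difference is 0: the handler found nothing left to change, yet the same write kept faulting.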

/Mikael

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-28 10:35 ` Mikael Pettersson
@ 2006-07-28 11:13   ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-07-28 11:13 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: Mikael Pettersson <mikpe@it.uu.se>
Date: Fri, 28 Jul 2006 12:35:24 +0200 (MEST)

> FAULT: write(1) old_entry[e00001ffe1970f8a]
> FAULT: After, entry[e00001ffe1970f8a]
> 
> The last two lines then repeat semi-infinitely, and they
> were generated at an extremely high rate.

Thanks, this should help me find the bug.

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-28 10:35 ` Mikael Pettersson
@ 2006-08-01  5:42   ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-08-01  5:42 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: Mikael Pettersson <mikpe@it.uu.se>
Date: Fri, 28 Jul 2006 12:35:24 +0200 (MEST)

> FAULT: write(1) old_entry[e00001ffe1970f8a]
> FAULT: After, entry[e00001ffe1970f8a]
> FAULT: write(1) old_entry[e00001ffe1970f8a]
> FAULT: After, entry[e00001ffe1970f8a]
> 
> The last two lines then repeat semi-infinitely, and they
> were generated at an extremely high rate.

It looks like the TSB is never updated.

Do you have CONFIG_HUGETLB_PAGE disabled by chance?
I bet that's part of what helps trigger this bug.

Meanwhile I think I know what's wrong, I'll let you
know when I have a fix to test out.

Thanks a lot.

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-08-01 11:30 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-08-01 11:30 UTC (permalink / raw)
  To: davem, mikpe; +Cc: linux-kernel, sparclinux

David Miller writes:
 > From: Mikael Pettersson <mikpe@it.uu.se>
 > Date: Fri, 28 Jul 2006 12:35:24 +0200 (MEST)
 > 
 > > FAULT: write(1) old_entry[e00001ffe1970f8a]
 > > FAULT: After, entry[e00001ffe1970f8a]
 > > FAULT: write(1) old_entry[e00001ffe1970f8a]
 > > FAULT: After, entry[e00001ffe1970f8a]
 > > 
 > > The last two lines then repeat semi-infinitely, and they
 > > were generated at an extremely high rate.
 > 
 > It looks like the TSB is never updated.
 > 
 > Do you have CONFIG_HUGETLB_PAGE disabled by chance?
 > I bet that's part of what helps trigger this bug.

I'm away from my U5 right now and won't be able to check for
certain until later this week, but I'm pretty sure CONFIG_HUGETLBFS
and CONFIG_HUGETLB_PAGE are disabled in its kernel.

 > Meanwhile I think I know what's wrong, I'll let you
 > know when I have a fix to test out.

Thanks. I'm looking forward to it.

/Mikael

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-08-06 21:09 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-08-06 21:09 UTC (permalink / raw)
  To: davem, mikpe; +Cc: linux-kernel, sparclinux

On Tue, 1 Aug 2006 13:30:08 +0200 (MEST), Mikael Pettersson wrote:
>David Miller writes:
> > From: Mikael Pettersson <mikpe@it.uu.se>
> > Date: Fri, 28 Jul 2006 12:35:24 +0200 (MEST)
> > 
> > > FAULT: write(1) old_entry[e00001ffe1970f8a]
> > > FAULT: After, entry[e00001ffe1970f8a]
> > > FAULT: write(1) old_entry[e00001ffe1970f8a]
> > > FAULT: After, entry[e00001ffe1970f8a]
> > > 
> > > The last two lines then repeat semi-infinitely, and they
> > > were generated at an extremely high rate.
> > 
> > It looks like the TSB is never updated.
> > 
> > Do you have CONFIG_HUGETLB_PAGE disabled by chance?
> > I bet that's part of what helps trigger this bug.
>
>I'm away from my U5 right now and won't be able to check for
>certain until later this week, but I'm pretty sure CONFIG_HUGETLBFS
>and CONFIG_HUGETLB_PAGE are disabled in its kernel.

Correction, it turns out that all my 2.6 sparc64 kernels have
had CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE enabled.

/Mikael

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-08-06 21:09 ` Mikael Pettersson
@ 2006-08-06 23:37   ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-08-06 23:37 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux

From: Mikael Pettersson <mikpe@it.uu.se>
Date: Sun, 6 Aug 2006 23:09:38 +0200 (MEST)

> Correction, it turns out that all my 2.6 sparc64 kernels have
> had CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE enabled.

Ok, thanks for the info.

I'm still a bit stumped about this bug and trying to figure out what's
wrong.  I'll let you know when I have a patch to test.


* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-08-01  5:42   ` David Miller
@ 2006-08-28  7:39     ` David Miller
  -1 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2006-08-28  7:39 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel, sparclinux


Ok, I finally figured this one out and reproduced it on my
ultra5.

If, for example, we have a 4MB PTE and we do a write,
we'll only update one of the 8KB sub-PTEs of that
mapping.

This is fine until that new mapping gets displaced from
the TLB and someone does a read that ends up hitting one
of the sub-PTEs that didn't get its write-enable bit
set yet.

At this point we have a problem, because if a write is
made to the original address, the kernel says "the writable
bit is set, nothing to do".  So it won't flush the TLB,
and therefore it won't kick out the TLB mapping brought
in by the read.

So we just get wedged here until something displaces that
TLB entry.  This is why X acts sluggish and since it can
loop like this for quite a while the X server and the
hardware can get plenty confused.

The end result is that we have to make sure any PTE updates
propagate to all sub-PTEs of a large mapping during any
change.  That's really expensive and we'd have to add some
complex code to the set_pte_at() code path just to handle
this.

So the easiest way to fix this, without having to disable
largepage PTE mappings of I/O devices, is the patch below.
I will push this to Linus for 2.6.18 and -stable so that
2.6.17 gets it too.
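
The sequence described above can be reproduced in a small user-space toy model. This is a sketch under loud assumptions, not kernel code: every name here (`toy_mmu`, `toy_touch`, `toy_write_faults`, the `TOY_*` bits) is hypothetical, the TLB holds a single large translation, and the mapping has 4 sub-PTEs instead of the 512 that a 4MB mapping of 8KB pages would have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUB_PTES 4     /* toy value; 4MB / 8KB would be 512 */
#define TOY_SW_WRITE 0x1u  /* software "writes are allowed" bit */
#define TOY_HW_WRITE 0x2u  /* hardware write-enable bit */

struct toy_mmu {
	unsigned int pte[TOY_SUB_PTES]; /* sub-PTEs of one large mapping */
	unsigned int tlb;               /* the one cached large translation */
	bool tlb_valid;
};

/* predirty=true mimics setting the write-enable bit at map time. */
static void toy_init(struct toy_mmu *m, bool predirty)
{
	for (size_t i = 0; i < TOY_SUB_PTES; i++)
		m->pte[i] = TOY_SW_WRITE | (predirty ? TOY_HW_WRITE : 0u);
	m->tlb_valid = false;
}

/* Something else displaced our translation from the TLB. */
static void toy_evict(struct toy_mmu *m)
{
	m->tlb_valid = false;
}

/* Any access (e.g. a read): on a TLB miss, load the sub-PTE it hits. */
static void toy_touch(struct toy_mmu *m, size_t idx)
{
	assert(idx < TOY_SUB_PTES);
	if (!m->tlb_valid) {
		m->tlb = m->pte[idx];
		m->tlb_valid = true;
	}
}

/* One write attempt: true if it completed, false if it faulted. */
static bool toy_write_once(struct toy_mmu *m, size_t idx)
{
	unsigned int old;

	toy_touch(m, idx);             /* fill the TLB if it is empty */
	if (m->tlb & TOY_HW_WRITE)
		return true;

	/* Fault path: update only the faulting sub-PTE ... */
	old = m->pte[idx];
	m->pte[idx] = old | TOY_HW_WRITE;
	/* ... and flush the TLB only if that PTE actually changed. */
	if (m->pte[idx] != old)
		m->tlb_valid = false;
	return false;
}

/* Retry a write until it succeeds or `limit` faults have been taken. */
static int toy_write_faults(struct toy_mmu *m, size_t idx, int limit)
{
	int faults = 0;

	while (faults < limit && !toy_write_once(m, idx))
		faults++;
	return faults;
}
```

In this model, a first write to sub-page 0 takes one fault and then succeeds. After `toy_evict()` and a read via `toy_touch()` on sub-page 1, a further write to sub-page 0 faults until the retry limit is reached, because the handler sees an unchanged PTE and never flushes. Initialising with `predirty` set, which loosely corresponds to dirtying writable PTEs when the mapping is created, avoids the faults altogether.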

commit 6ad7d29d2edd8c3d632e71454f619f5c0c6c2703
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Mon Aug 28 00:33:03 2006 -0700

    [SPARC64]: Fix X server hangs due to large pages.
    
    This problem was introduced by changeset
    14778d9072e53d2171f66ffd9657daff41acfaed
    
    Unlike the hugetlb code paths, the normal fault code is not set up to
    propagate PTE changes for large page sizes correctly like the ones we
    make for I/O mappings in io_remap_pfn_range().
    
    It is absolutely necessary to update all sub-ptes of a largepage
    mapping on a fault.  Adding special handling for this would add
    considerable complexity to tlb_batch_add().  So let's just side-step
    the issue and forcefully dirty any writable PTEs created by
    io_remap_pfn_range().
    
    The only other real option would be to disable the large PTE code of
    io_remap_pfn_range() and we really don't want to do that.
    
    Much thanks to Mikael Pettersson for tracking down this problem and
    testing debug patches.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/sparc64/mm/generic.c b/arch/sparc64/mm/generic.c
index 8cb0620..af9d81d 100644
--- a/arch/sparc64/mm/generic.c
+++ b/arch/sparc64/mm/generic.c
@@ -69,6 +69,8 @@ static inline void io_remap_pte_range(st
 		} else
 			offset += PAGE_SIZE;
 
+		if (pte_write(entry))
+			entry = pte_mkdirty(entry);
 		do {
 			BUG_ON(!pte_none(*pte));
 			set_pte_at(mm, address, pte, entry);

* Re: [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64
  2006-07-03 11:17 ` Mikael Pettersson
@ 2006-08-28 21:18 ` Mikael Pettersson
  -1 siblings, 0 replies; 38+ messages in thread
From: Mikael Pettersson @ 2006-08-28 21:18 UTC (permalink / raw)
  To: davem, mikpe; +Cc: linux-kernel, sparclinux

On Mon, 28 Aug 2006 00:39:08 -0700 (PDT), David Miller wrote:
>Ok, I finally figured this one out and reproduced it on my
>ultra5.
>
>If, for example, we have a 4MB PTE and we do a write,
>we'll only update one of the 8KB sub-PTEs of that
>mapping.
>
>This is fine until that new mapping gets displaced from
>the TLB and someone does a read that ends up hitting one
>of the sub-PTEs that didn't get its write-enable bit
>set yet.
>
>At this point we have a problem, because if a write is
>made to the original address, the kernel says "the writable
>bit is set, nothing to do".  So it won't flush the TLB,
>and therefore it won't kick out the TLB mapping brought
>in by the read.
>
>So we just get wedged here until something displaces that
>TLB entry.  This is why X acts sluggish and since it can
>loop like this for quite a while the X server and the
>hardware can get plenty confused.
>
>The end result is that we have to make sure any PTE updates
>propagate to all sub-PTEs of a large mapping during any
>change.  That's really expensive and we'd have to add some
>complex code to the set_pte_at() code path just to handle
>this.
>
>So the easiest way to fix this, without having to disable
>largepage PTE mappings of I/O devices, is the patch below.
>I will push this to Linus for 2.6.18 and -stable so that
>2.6.17 gets it too.
>
>commit 6ad7d29d2edd8c3d632e71454f619f5c0c6c2703
>Author: David S. Miller <davem@sunset.davemloft.net>
>Date:   Mon Aug 28 00:33:03 2006 -0700
>
>    [SPARC64]: Fix X server hangs due to large pages.
>    
>    This problem was introduced by changeset
>    14778d9072e53d2171f66ffd9657daff41acfaed
>    
>    Unlike the hugetlb code paths, the normal fault code is not set up to
>    propagate PTE changes for large page sizes correctly like the ones we
>    make for I/O mappings in io_remap_pfn_range().
>    
>    It is absolutely necessary to update all sub-ptes of a largepage
>    mapping on a fault.  Adding special handling for this would add
>    considerable complexity to tlb_batch_add().  So let's just side-step
>    the issue and forcefully dirty any writable PTEs created by
>    io_remap_pfn_range().
>    
>    The only other real option would be to disable the large PTE code of
>    io_remap_pfn_range() and we really don't want to do that.
>    
>    Much thanks to Mikael Pettersson for tracking down this problem and
>    testing debug patches.
>    
>    Signed-off-by: David S. Miller <davem@davemloft.net>
>
>diff --git a/arch/sparc64/mm/generic.c b/arch/sparc64/mm/generic.c
>index 8cb0620..af9d81d 100644
>--- a/arch/sparc64/mm/generic.c
>+++ b/arch/sparc64/mm/generic.c
>@@ -69,6 +69,8 @@ static inline void io_remap_pte_range(st
> 		} else
> 			offset += PAGE_SIZE;
> 
>+		if (pte_write(entry))
>+			entry = pte_mkdirty(entry);
> 		do {
> 			BUG_ON(!pte_none(*pte));
> 			set_pte_at(mm, address, pte, entry);
> 

Thanks. X works fine on my U5 now with 2.6.18-rc5 + this patch.

/Mikael

Thread overview: 38+ messages
2006-08-01 11:30 [BUG sparc64] 2.6.16-git6 broke X11 on Ultra5 with ATI Mach64 Mikael Pettersson
2006-08-01 11:30 ` Mikael Pettersson
  -- strict thread matches above, loose matches on Subject: below --
2006-08-28 21:18 Mikael Pettersson
2006-08-28 21:18 ` Mikael Pettersson
2006-08-06 21:09 Mikael Pettersson
2006-08-06 21:09 ` Mikael Pettersson
2006-08-06 23:37 ` David Miller
2006-08-06 23:37   ` David Miller
2006-07-28 10:35 Mikael Pettersson
2006-07-28 10:35 ` Mikael Pettersson
2006-07-28 11:13 ` David Miller
2006-07-28 11:13   ` David Miller
2006-08-01  5:42 ` David Miller
2006-08-01  5:42   ` David Miller
2006-08-28  7:39   ` David Miller
2006-08-28  7:39     ` David Miller
2006-07-06  9:37 Mikael Pettersson
2006-07-06  9:37 ` Mikael Pettersson
2006-07-07  7:05 ` David Miller
2006-07-07  7:05   ` David Miller
2006-07-28  1:13   ` David Miller
2006-07-28  1:13     ` David Miller
2006-07-28  3:38     ` David Miller
2006-07-28  3:38       ` David Miller
2006-07-04 10:03 Mikael Pettersson
2006-07-04 10:03 ` Mikael Pettersson
2006-07-03 11:17 Mikael Pettersson
2006-07-03 11:17 ` Mikael Pettersson
2006-07-04  7:41 ` Rene Rebe
2006-07-04  7:41   ` Rene Rebe
2006-07-04  9:32   ` Rene Rebe
2006-07-04  9:32     ` Rene Rebe
2006-07-05 21:07 ` Ludovic Courtès
2006-07-06  1:46 ` Mikael Pettersson
2006-07-06  3:40 ` David Miller
2006-07-06  3:40   ` David Miller
2006-07-07  8:40 ` René Rebe
2006-07-07  9:32 ` David Miller