linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* bisected boot regression post 2.6.25-rc3.. please revert
@ 2008-03-01 18:56 Arjan van de Ven
  2008-03-03  7:46 ` Ingo Molnar
  2008-03-03 17:15 ` Nish Aravamudan
  0 siblings, 2 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-01 18:56 UTC (permalink / raw)
  To: mingo, torvalds, hans.rosenfeld; +Cc: linux-kernel

Hi Linus, Ingo, Hans,

Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
( x86: fix pmd_bad and pud_bad to support huge pages )
since it prevents the kernel to finish booting on my (Penryn based)
laptop. The boot stops right after freeing the init memory.
Took a while to bisect (due to it touching page*.h, which forces a
full recompile), but it definitely is caused by this commit...

Please revert.



[arjan@localhost linux.trees.git]$ git-bisect good
cded932b75ab0a5f9181ee3da34a0a488d1a14fd is first bad commit
commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
Author: Hans Rosenfeld <hans.rosenfeld@amd.com>
Date:   Mon Feb 18 18:10:47 2008 +0100

    x86: fix pmd_bad and pud_bad to support huge pages
    
    I recently stumbled upon a problem in the support for huge pages. If a
    program using huge pages does not explicitly unmap them, they remain
    mapped (and therefore, are lost) after the program exits.
    
    I observed that the free huge page count in /proc/meminfo decreased when
    running my program, and it did not increase after the program exited.
    After running the program a few times, no more huge pages could be
    allocated.
    
    The reason for this seems to be that the x86 pmd_bad and pud_bad
    consider pmd/pud entries having the PSE bit set invalid. I think there
    is nothing wrong with this bit being set, it just indicates that the
    lowest level of translation has been reached. This bit has to be (and
    is) checked after the basic validity of the entry has been checked, like
    in this fragment from follow_page() in mm/memory.c:
    
      if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
              goto no_page_table;
    
      if (pmd_huge(*pmd)) {
              BUG_ON(flags & FOLL_GET);
              page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
              goto out;
      }
    
    Note that this code currently doesn't work as intended if the pmd refers
    to a huge page, the pmd_huge() check can not be reached if the page is
    huge.
    
    Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
    allow for the PSE bit being set fixes this. For similar reasons,
    allowing the NX bit being set is necessary, too. I have seen huge pages
    having the NX bit set in their pmd entry, which would cause the same
    problem.
    
    Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

:040000 040000 1051a11e76304c0fa9229af65cbbd288a8125651
fd91df38defe41df09dde3c0955fb32a4476467a M	include


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
@ 2008-03-03  7:46 ` Ingo Molnar
  2008-03-03  9:13   ` Ingo Molnar
  2008-03-03 17:15 ` Nish Aravamudan
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03  7:46 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin


* Arjan van de Ven <arjan@linux.intel.com> wrote:

> Hi Linus, Ingo, Hans,
> 
> Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd ( x86: 
> fix pmd_bad and pud_bad to support huge pages ) since it prevents the 
> kernel to finish booting on my (Penryn based) laptop. The boot stops 
> right after freeing the init memory. Took a while to bisect (due to it 
> touching page*.h, which forces a full recompile), but it definitely is 
> caused by this commit...

hm, lets figure out why this patch breaks your box, ok? We obviously 
have to revert it if we cannot figure it out, but lets at least try - 
because the patch itself fixes a real regression and it's not obviously 
wrong either. I think there might be some bug hiding somewhere that we 
really want to fix instead of this revert.

Could you try to hack up a debug patch perhaps? Uninline the fucntion(s) 
then add two versions (one is that breaks on your box and one is that 
works on your box) of this same pmd_bad()/pud_bad() functions and do 
something like this (pseudocode):

pmd_bad()
{
	if (pmd_bad_working(x) != pmd_bad_broken(x))
		panic_timeout++;

	return pmd_bad_working(x);
}

i.e. we actually use the working function so your box should boot up 
just fine - but we instrument things with the broken function as well 
and detect the cases where the two values differ. It is an anomaly if 
either function ever returns true instead of false.

if after bootup you have a non-zero panic_timeout then there is a 
material difference somewhere. In that case try to stick a dump_stack() 
into the above case as well - and if the box still boots, send us the 
stackdumps. (if the box doesnt boot then perhaps the printout itself 
hangs - in that case try a save_stack_trace() hack and print it out 
later)

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03  7:46 ` Ingo Molnar
@ 2008-03-03  9:13   ` Ingo Molnar
  2008-03-03 16:41     ` Arjan van de Ven
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03  9:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin


* Ingo Molnar <mingo@elte.hu> wrote:

> Could you try to hack up a debug patch perhaps? Uninline the 
> fucntion(s) then add two versions (one is that breaks on your box and 
> one is that works on your box) of this same pmd_bad()/pud_bad() 
> functions and do something like this (pseudocode):

i.e. something like the (tested) patch below. It is not clear whether 
your testcase is on 32-bit or 64-bit - so i went for the more likely 
32-bit case first.

	Ingo

------------------>
Subject: x86: patches/x86-debug-bad-page.patch
From: Ingo Molnar <mingo@elte.hu>
Date: Mon Mar 03 09:53:17 CET 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/pgtable_32.c     |    7 +++++++
 include/asm-x86/pgtable_32.h |    6 +++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/mm/pgtable_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pgtable_32.c
+++ linux-x86.q/arch/x86/mm/pgtable_32.c
@@ -381,3 +381,10 @@ void __pmd_free_tlb(struct mmu_gather *t
 }
 
 #endif
+
+int pmd_bad(pmd_t pmd)
+{
+	WARN_ON_ONCE(pmd_bad_v1(pmd) != pmd_bad_v2(pmd));
+
+	return pmd_bad_v1(pmd);
+}
Index: linux-x86.q/include/asm-x86/pgtable_32.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/pgtable_32.h
+++ linux-x86.q/include/asm-x86/pgtable_32.h
@@ -92,7 +92,11 @@ extern unsigned long pg0[];
 /* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
 #define pmd_none(x)	(!(unsigned long)pmd_val(x))
 #define pmd_present(x)	(pmd_val(x) & _PAGE_PRESENT)
-#define	pmd_bad(x)	((pmd_val(x) \
+
+extern int pmd_bad(pmd_t pmd);
+
+#define pmd_bad_v1(x)	((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
+#define	pmd_bad_v2(x)	((pmd_val(x) \
 			  & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)) \
 			 != _KERNPG_TABLE)
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03  9:13   ` Ingo Molnar
@ 2008-03-03 16:41     ` Arjan van de Ven
  2008-03-03 17:40       ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 16:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> Could you try to hack up a debug patch perhaps? Uninline the 
>> fucntion(s) then add two versions (one is that breaks on your box and 
>> one is that works on your box) of this same pmd_bad()/pud_bad() 
>> functions and do something like this (pseudocode):
> 
> i.e. something like the (tested) patch below. It is not clear whether 
> your testcase is on 32-bit or 64-bit - so i went for the more likely 
> 32-bit case first.


------------[ cut here ]------------
WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
  [<c0424ba5>] warn_on_slowpath+0x41/0x67
  [<c0408c5c>] ? native_sched_clock+0x94/0xa6
  [<c043f432>] ? lock_release_holdtime+0x1a/0x115
  [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
  [<c063eee6>] ? _spin_unlock+0x1d/0x20
  [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
  [<c04851c1>] ? do_sync_read+0xab/0xe9
  [<c0417223>] pmd_bad+0x44/0x53
  [<c046f37f>] follow_page+0x8b/0x1f2
  [<c0470aa0>] get_user_pages+0x281/0x2ef
  [<c0488dec>] get_arg_page+0x2d/0x79
  [<c04f63cd>] ? strnlen_user+0x2f/0x4d
  [<c0488efb>] copy_strings+0xc3/0x160
  [<c0488fb4>] copy_strings_kernel+0x1c/0x2b
  [<c048a109>] do_execve+0x11a/0x1f0
  [<c0403351>] sys_execve+0x29/0x51
  [<c0404ac6>] syscall_call+0x7/0xb
  [<c0407697>] ? kernel_execve+0x17/0x1c
  [<c04021d1>] ? _stext+0x69/0x12c
  [<c040574f>] ? kernel_thread_helper+0x7/0x10
  =======================
---[ end trace 63b854b89f6f375d ]---




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
  2008-03-03  7:46 ` Ingo Molnar
@ 2008-03-03 17:15 ` Nish Aravamudan
  1 sibling, 0 replies; 27+ messages in thread
From: Nish Aravamudan @ 2008-03-03 17:15 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: mingo, torvalds, hans.rosenfeld, linux-kernel

On 3/1/08, Arjan van de Ven <arjan@linux.intel.com> wrote:
> Hi Linus, Ingo, Hans,
>
>  Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
>  ( x86: fix pmd_bad and pud_bad to support huge pages )
>  since it prevents the kernel to finish booting on my (Penryn based)
>  laptop. The boot stops right after freeing the init memory.
>  Took a while to bisect (due to it touching page*.h, which forces a
>  full recompile), but it definitely is caused by this commit...
>
>  Please revert.
>
>
>
>  [arjan@localhost linux.trees.git]$ git-bisect good
>  cded932b75ab0a5f9181ee3da34a0a488d1a14fd is first bad commit
>  commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
>  Author: Hans Rosenfeld <hans.rosenfeld@amd.com>
>  Date:   Mon Feb 18 18:10:47 2008 +0100
>
>     x86: fix pmd_bad and pud_bad to support huge pages
>
>     I recently stumbled upon a problem in the support for huge pages. If a
>     program using huge pages does not explicitly unmap them, they remain
>     mapped (and therefore, are lost) after the program exits.
>
>     I observed that the free huge page count in /proc/meminfo decreased when
>     running my program, and it did not increase after the program exited.
>     After running the program a few times, no more huge pages could be
>     allocated.
>
>     The reason for this seems to be that the x86 pmd_bad and pud_bad
>     consider pmd/pud entries having the PSE bit set invalid. I think there
>     is nothing wrong with this bit being set, it just indicates that the
>     lowest level of translation has been reached. This bit has to be (and
>     is) checked after the basic validity of the entry has been checked, like
>     in this fragment from follow_page() in mm/memory.c:
>
>       if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
>               goto no_page_table;
>
>       if (pmd_huge(*pmd)) {
>               BUG_ON(flags & FOLL_GET);
>               page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
>               goto out;
>       }
>
>     Note that this code currently doesn't work as intended if the pmd refers
>     to a huge page, the pmd_huge() check can not be reached if the page is
>     huge.
>
>     Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
>     allow for the PSE bit being set fixes this. For similar reasons,
>     allowing the NX bit being set is necessary, too. I have seen huge pages
>     having the NX bit set in their pmd entry, which would cause the same
>     problem.
>
>     Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>

Hrm, not that I necessarily have the right to ask for this, but
shouldn't hugepage related patches at least be cc'd to linux-mm? it
seems this went via LKML to the x86 tree? There are a few of us that
work on hugepages that would have seen this there, but it got lost in
the noise (for me) on LKML.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 16:41     ` Arjan van de Ven
@ 2008-03-03 17:40       ` Ingo Molnar
  2008-03-03 17:51         ` Nish Aravamudan
  2008-03-03 18:36         ` Arjan van de Ven
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 17:40 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin


* Arjan van de Ven <arjan@linux.intel.com> wrote:

> ------------[ cut here ]------------
> WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
>  [<c0424ba5>] warn_on_slowpath+0x41/0x67
>  [<c0408c5c>] ? native_sched_clock+0x94/0xa6
>  [<c043f432>] ? lock_release_holdtime+0x1a/0x115
>  [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
>  [<c063eee6>] ? _spin_unlock+0x1d/0x20
>  [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
>  [<c04851c1>] ? do_sync_read+0xab/0xe9
>  [<c0417223>] pmd_bad+0x44/0x53
>  [<c046f37f>] follow_page+0x8b/0x1f2
>  [<c0470aa0>] get_user_pages+0x281/0x2ef

hm. I suspect some gcc related difference related to the handling of 
this masking:

   pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)

versus:

   pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)

perhaps it will work if you change it to:

   pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)

?

in any case, the commit has to be reverted as it clearly isnt a NOP on 
your box as it was intended to be. (it should only have made a 
difference in a rare hugetlbfs case)

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 17:40       ` Ingo Molnar
@ 2008-03-03 17:51         ` Nish Aravamudan
  2008-03-03 17:55           ` Ingo Molnar
  2008-03-03 18:36         ` Arjan van de Ven
  1 sibling, 1 reply; 27+ messages in thread
From: Nish Aravamudan @ 2008-03-03 17:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, torvalds, hans.rosenfeld, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

On 3/3/08, Ingo Molnar <mingo@elte.hu> wrote:
>
>  * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>
> > ------------[ cut here ]------------
>  > WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
>  > Modules linked in:
>  > Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
>  >  [<c0424ba5>] warn_on_slowpath+0x41/0x67
>  >  [<c0408c5c>] ? native_sched_clock+0x94/0xa6
>  >  [<c043f432>] ? lock_release_holdtime+0x1a/0x115
>  >  [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
>  >  [<c063eee6>] ? _spin_unlock+0x1d/0x20
>  >  [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
>  >  [<c04851c1>] ? do_sync_read+0xab/0xe9
>  >  [<c0417223>] pmd_bad+0x44/0x53
>  >  [<c046f37f>] follow_page+0x8b/0x1f2
>  >  [<c0470aa0>] get_user_pages+0x281/0x2ef
>
>
> hm. I suspect some gcc related difference related to the handling of
>  this masking:
>
>
>    pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>
>
> versus:
>
>
>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
>
>
> perhaps it will work if you change it to:
>
>
>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
>
> ?
>
>  in any case, the commit has to be reverted as it clearly isnt a NOP on
>  your box as it was intended to be. (it should only have made a
>  difference in a rare hugetlbfs case)

On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would
that have any effect here? We encountered that collision when adding
mprotect() support for hugepages.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 17:51         ` Nish Aravamudan
@ 2008-03-03 17:55           ` Ingo Molnar
  2008-03-03 17:58             ` H. Peter Anvin
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 17:55 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Arjan van de Ven, torvalds, hans.rosenfeld, linux-kernel,
	Thomas Gleixner, H. Peter Anvin


* Nish Aravamudan <nish.aravamudan@gmail.com> wrote:

> >  in any case, the commit has to be reverted as it clearly isnt a NOP 
> >  on your box as it was intended to be. (it should only have made a 
> >  difference in a rare hugetlbfs case)
> 
> On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would 
> that have any effect here? We encountered that collision when adding 
> mprotect() support for hugepages.

well, this is the PMD, and PROTNONE is a pte property.

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 17:55           ` Ingo Molnar
@ 2008-03-03 17:58             ` H. Peter Anvin
  0 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 17:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nish Aravamudan, Arjan van de Ven, torvalds, hans.rosenfeld,
	linux-kernel, Thomas Gleixner

Ingo Molnar wrote:
> * Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> 
>>>  in any case, the commit has to be reverted as it clearly isnt a NOP 
>>>  on your box as it was intended to be. (it should only have made a 
>>>  difference in a rare hugetlbfs case)
>> On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would 
>> that have any effect here? We encountered that collision when adding 
>> mprotect() support for hugepages.
> 
> well, this is the PMD, and PROTNONE is a pte property.

The PSE bit in the PTE becomes the PATX bit.  It shouldn't ever be set 
in Linux.

	-hpa

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 17:40       ` Ingo Molnar
  2008-03-03 17:51         ` Nish Aravamudan
@ 2008-03-03 18:36         ` Arjan van de Ven
  2008-03-03 18:44           ` Linus Torvalds
  2008-03-03 21:13           ` Segher Boessenkool
  1 sibling, 2 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 18:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin

Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> 
>> ------------[ cut here ]------------
>> WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
>> Modules linked in:
>> Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
>>  [<c0424ba5>] warn_on_slowpath+0x41/0x67
>>  [<c0408c5c>] ? native_sched_clock+0x94/0xa6
>>  [<c043f432>] ? lock_release_holdtime+0x1a/0x115
>>  [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
>>  [<c063eee6>] ? _spin_unlock+0x1d/0x20
>>  [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
>>  [<c04851c1>] ? do_sync_read+0xab/0xe9
>>  [<c0417223>] pmd_bad+0x44/0x53
>>  [<c046f37f>] follow_page+0x8b/0x1f2
>>  [<c0470aa0>] get_user_pages+0x281/0x2ef
> 
> hm. I suspect some gcc related difference related to the handling of 
> this masking:
> 
>    pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
> 
> versus:
> 
>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
> 
> perhaps it will work if you change it to:
> 
>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
> 
> ?
> 
> in any case, the commit has to be reverted as it clearly isnt a NOP on 
> your box as it was intended to be. (it should only have made a 
> difference in a rare hugetlbfs case)

interesting observation: if I turn the macros into inlines... the difference goes away.

I'm half tempted to just do it as inline period ... any objections ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 18:36         ` Arjan van de Ven
@ 2008-03-03 18:44           ` Linus Torvalds
  2008-03-03 22:00             ` Arjan van de Ven
  2008-03-09 11:56             ` Ingo Molnar
  2008-03-03 21:13           ` Segher Boessenkool
  1 sibling, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2008-03-03 18:44 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin



On Mon, 3 Mar 2008, Arjan van de Ven wrote:
> 
> interesting observation: if I turn the macros into inlines... the difference
> goes away.
> 
> I'm half tempted to just do it as inline period ... any objections ?

Yes, I object. I want to understand why it would matter. If this is a 
compiler bug, it's a really rather bad one. And if it's just some stupid 
bug in our pmd_bad() macro, I still want to know what the problem was.

Can you compile both ways and look at what changed at the offending site 
(which is apparently "follow_page()")?

And do you have some odd compiler version?

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 18:36         ` Arjan van de Ven
  2008-03-03 18:44           ` Linus Torvalds
@ 2008-03-03 21:13           ` Segher Boessenkool
  2008-03-03 21:22             ` Segher Boessenkool
  2008-03-04  6:58             ` Ingo Molnar
  1 sibling, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 21:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

>> hm. I suspect some gcc related difference related to the handling of 
>> this masking:
>>    pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>> versus:
>>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
>> perhaps it will work if you change it to:
>>    pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>> ?
>> in any case, the commit has to be reverted as it clearly isnt a NOP 
>> on your box as it was intended to be. (it should only have made a 
>> difference in a rare hugetlbfs case)
>
> interesting observation: if I turn the macros into inlines... the 
> difference goes away.

include/asm-x86/pgtable.h has

	#define _PAGE_BIT_PSE           7

	#define _PAGE_PSE       (_AC(1, L)<<_PAGE_BIT_PSE)

and

	#define _PAGE_BIT_NX           63

	#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
	#define _PAGE_NX        (_AC(1, ULL) << _PAGE_BIT_NX)

so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
64-bit is 0x00000000ffffff7f, so in

	(~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)

all the high bits are lost, while the original

	~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)

works as intended, since the bit inversion is done on a 64-bit number.


Maybe all those pagetable bit definitions should use 64-bit (ULL or a 
cast),
as it is now some dangerous detail is hidden behind the macros.  Using 
inline
functions for simple constants seems like overkill to me, but would 
also work
of course.


Segher


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 21:13           ` Segher Boessenkool
@ 2008-03-03 21:22             ` Segher Boessenkool
  2008-03-03 22:33               ` Segher Boessenkool
  2008-03-04  6:58             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 21:22 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
	Arjan van de Ven, H. Peter Anvin, Thomas Gleixner

> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
> 64-bit is 0x00000000ffffff7f,

Actually, it is signed, so this isn't true.  Comments about unsafeness
still apply.


Segher


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 18:44           ` Linus Torvalds
@ 2008-03-03 22:00             ` Arjan van de Ven
  2008-03-04  1:05               ` Arjan van de Ven
  2008-03-09 11:56             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin

Linus Torvalds wrote:
> 
> On Mon, 3 Mar 2008, Arjan van de Ven wrote:
>> interesting observation: if I turn the macros into inlines... the difference
>> goes away.
>>
>> I'm half tempted to just do it as inline period ... any objections ?
> 
> Yes, I object. I want to understand why it would matter. If this is a 
> compiler bug, it's a really rather bad one. And if it's just some stupid 
> bug in our pmd_bad() macro, I still want to know what the problem was.
> 
> Can you compile both ways and look at what changed at the offending site 
> (which is apparently "follow_page()")?
> 
> And do you have some odd compiler version?

it's the F9 gcc, so yeah it's really new.

I'm staring at the disassembly and it's quite different but follow_page() is rather large;
trying to  make a smaller testcase now

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 21:22             ` Segher Boessenkool
@ 2008-03-03 22:33               ` Segher Boessenkool
  2008-03-03 22:55                 ` H. Peter Anvin
  2008-03-03 22:56                 ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 22:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
	Arjan van de Ven, H. Peter Anvin, Thomas Gleixner

>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>> 64-bit is 0x00000000ffffff7f,
>
> Actually, it is signed, so this isn't true.  Comments about unsafeness
> still apply.

It turns out that PAGE_SIZE is unsigned.  So this gives us for

	(~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)

the types UL, L, L, ULL resp.

The associativity of & is left-to-right, so this in turn becomes

	UL, L, ULL

	UL, ULL

	ULL

and that cast from UL to ULL doesn't sign-extend.


Segher


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 22:33               ` Segher Boessenkool
@ 2008-03-03 22:55                 ` H. Peter Anvin
  2008-03-03 22:56                 ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 22:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
	Arjan van de Ven, Thomas Gleixner

Segher Boessenkool wrote:
>>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>>> 64-bit is 0x00000000ffffff7f,
>>
>> Actually, it is signed, so this isn't true.  Comments about unsafeness
>> still apply.
> 
> It turns out that PAGE_SIZE is unsigned.  So this gives us for
> 
>     (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
> 
> the types UL, L, L, ULL resp.
> 
> The associativity of & is left-to-right, so this in turn becomes
> 
>     UL, L, ULL
> 
>     UL, ULL
> 
>     ULL
> 
> and that cast from UL to ULL doesn't sign-extend.
> 

All the masks either need to be the proper size or signed.

	-hpa

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 22:33               ` Segher Boessenkool
  2008-03-03 22:55                 ` H. Peter Anvin
@ 2008-03-03 22:56                 ` Jeremy Fitzhardinge
  2008-03-03 23:04                   ` H. Peter Anvin
  1 sibling, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-03 22:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
	Arjan van de Ven, H. Peter Anvin, Thomas Gleixner

Segher Boessenkool wrote:
>>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>>> 64-bit is 0x00000000ffffff7f,
>>
>> Actually, it is signed, so this isn't true.  Comments about unsafeness
>> still apply.
>
> It turns out that PAGE_SIZE is unsigned.  So this gives us for
>
>     (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> the types UL, L, L, ULL resp.
>
> The associativity of & is left-to-right, so this in turn becomes
>
>     UL, L, ULL
>
>     UL, ULL
>
>     ULL
>
> and that cast from UL to ULL doesn't sign-extend. 

I went through and made a bunch of these flags signed so that they would 
sign-extend properly in 32 vs 64 bit.  We could make them 
unconditionally 64-bit, but I'm worried that will have unforeseen 
consequences too.

In this case, PAGE_MASK should probably be signed too, so the sign 
extension happens properly.

Alternatively, they could be cast to pteval_t and/or phys_addr_t so that 
they will have an inherent size which matches the pagetable format 
(using _AT() rather than _AC()).

    J

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 22:56                 ` Jeremy Fitzhardinge
@ 2008-03-03 23:04                   ` H. Peter Anvin
  0 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 23:04 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Segher Boessenkool, hans.rosenfeld, torvalds, linux-kernel,
	Ingo Molnar, Arjan van de Ven, Thomas Gleixner

Jeremy Fitzhardinge wrote:
> 
> I went through and made a bunch of these flags signed so that they would 
> sign-extend properly in 32 vs 64 bit.  We could make them 
> unconditionally 64-bit, but I'm worried that will have unforeseen 
> consequences too.
> 
> In this case, PAGE_MASK should probably be signed too, so the sign 
> extension happens properly.
> 

Yes, it should.

> Alternatively, they could be cast to pteval_t and/or phys_addr_t so that 
> they will have an inherent size which matches the pagetable format 
> (using _AT() rather than _AC()).

Either that, or we could use a custom constructor macro (PTEVAL_C()). 
This is arguably The Right Thing, but the sign-extending version is OK, too.

	-hpa

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 22:00             ` Arjan van de Ven
@ 2008-03-04  1:05               ` Arjan van de Ven
  2008-03-04  6:53                 ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-04  1:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin

Arjan van de Ven wrote:
> Linus Torvalds wrote:
>>
>> On Mon, 3 Mar 2008, Arjan van de Ven wrote:
>>> interesting observation: if I turn the macros into inlines... the 
>>> difference
>>> goes away.
>>>
>>> I'm half tempted to just do it as inline period ... any objections ?
>>
>> Yes, I object. I want to understand why it would matter. If this is a 
>> compiler bug, it's a really rather bad one. And if it's just some 
>> stupid bug in our pmd_bad() macro, I still want to know what the 
>> problem was.
>>
>> Can you compile both ways and look at what changed at the offending 
>> site (which is apparently "follow_page()")?
>>
>> And do you have some odd compiler version?
> 
> it's the F9 gcc, so yeah it's really new.
> 
> I'm staring at the disassembly and it's quite different but 
> follow_page() is rather large;
> trying to  make a smaller testcase now

sadly a more isolated testcase doesn't show the same behavior yet;
so it's some fun interaction ;(

more staring at the assembly for me

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-04  1:05               ` Arjan van de Ven
@ 2008-03-04  6:53                 ` Ingo Molnar
  2008-03-05 15:35                   ` Arjan van de Ven
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-04  6:53 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin


* Arjan van de Ven <arjan@linux.intel.com> wrote:

>> I'm staring at the disassembly and it's quite different but 
>> follow_page() is rather large; trying to make a smaller testcase now
>
> sadly a more isolated testcase doesn't show the same behavior yet; so 
> it's some fun interaction ;(
>
> more staring at the assembly for me

could you post the "bad" and the "good" disassembled functions, and the 
specific gcc version string?

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 21:13           ` Segher Boessenkool
  2008-03-03 21:22             ` Segher Boessenkool
@ 2008-03-04  6:58             ` Ingo Molnar
  1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-04  6:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Arjan van de Ven, hans.rosenfeld, torvalds, linux-kernel,
	H. Peter Anvin, Thomas Gleixner


* Segher Boessenkool <segher@kernel.crashing.org> wrote:

> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to 
> 64-bit is 0x00000000ffffff7f, so in
>
> 	(~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> all the high bits are lost, while the original
>
> 	~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>
> works as intended, since the bit inversion is done on a 64-bit number.

but we really are interested in the low bits here (lets ignore the NX 
bit for now) and the patch has been in the queue for a long time (more 
than a month), so if there was a trivial mask mixup problem it would 
have shown on the first day. So i suspect some gcc bug instead - and 
certainly the colorful mixture of types and signs in this expression 
might have surprised a new version of gcc somewhere.

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-04  6:53                 ` Ingo Molnar
@ 2008-03-05 15:35                   ` Arjan van de Ven
  0 siblings, 0 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-05 15:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin

Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> 
>>> I'm staring at the disassembly and it's quite different but 
>>> follow_page() is rather large; trying to make a smaller testcase now
>> sadly a more isolated testcase doesn't show the same behavior yet; so 
>> it's some fun interaction ;(
>>
>> more staring at the assembly for me
> 
> could you post the "bad" and the "good" disassembled functions, and the 
> specific gcc version string?
> 

http://www.fenrus.org/memory.asm.macro   <-- failing one
http://www.fenrus.org/memory.asm.inline   <-- working one

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-03 18:44           ` Linus Torvalds
  2008-03-03 22:00             ` Arjan van de Ven
@ 2008-03-09 11:56             ` Ingo Molnar
  2008-03-09 17:27               ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-09 11:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > I'm half tempted to just do it as inline period ... any objections ?
> 
> Yes, I object. I want to understand why it would matter. If this is a 
> compiler bug, it's a really rather bad one. And if it's just some 
> stupid bug in our pmd_bad() macro, I still want to know what the 
> problem was.

ok, i think i figured it out: on PAE 32-bit we dont properly sign-extend 
to a 64-bit pmd value in the new pmd_bad() macro - so if any physical 
RAM is above 4GB (Arjan's laptop had 4GB of RAM in it?) then we'll start 
seeing those high bits. This definitely needs PAE and more than 4GB of 
RAM to trigger.

The best fix is the one below (it should solve Arjan's regression with 
that now-reverted patch redone), as it is the right thing to do [that 
way sign auto-extend trickles over into PAGE_MASK as well].

It boots fine on a >4GB box of mine but changing the type of PAGE_SIZE 
affects _everything_ so i'll keep testing it and i'd suggest to delay 
this fix to shortly after -rc5 is released instead of risking -rc5 with 
such a late commit. I'll send this and the re-done hugetlbfs fix 
together early next week.

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Index: linux/include/asm-x86/page.h
===================================================================
--- linux.orig/include/asm-x86/page.h
+++ linux/include/asm-x86/page.h
@@ -5,7 +5,7 @@
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
-#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
+#define PAGE_SIZE	(_AC(1,L) << PAGE_SHIFT)
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 
 #ifdef __KERNEL__

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-09 11:56             ` Ingo Molnar
@ 2008-03-09 17:27               ` Linus Torvalds
  2008-03-09 18:57                 ` Ingo Molnar
                                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2008-03-09 17:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin



On Sun, 9 Mar 2008, Ingo Molnar wrote:
> 
> The best fix is the one below (it should solve Arjan's regression with 
> that now-reverted patch redone), as it is the right thing to do [that 
> way sign auto-extend trickles over into PAGE_MASK as well].

This *really* makes me worry.

I do agree that there are good reasons that "PAGE_MASK" should be signed 
(since we do want the top bit to extend), but changing "PAGE_SIZE" to 
signed seems to be a rather big change, considering that it's used 
*everywhere*.

In particular, it's quite possibly used for things like

	offset = something % PAGE_SIZE;

etc, where a signed divide is positively wrong.

But even for PAGE_MASK, we literally have code like this:

	if ((size_avail & PAGE_MASK) < rg.size) {

where it so _happens_ that "size_avail" is unsigned, but what if it 
wasn't? It could turn a unsigned comparison into a signed one, and 
introduce any number of security bugs etc.

Sam goes even more for PAGE_SIZE. At least there are only about a thousand 
users of PAGE_MASK in the kernel, PAGE_SIZE is used about six times as 
many times, and just a _trivial_ grep like

	git grep 'PAGE_SIZE.* [<>][= ]'

finds a lot of cases where I'm not at all sure that it's safe to change 
PAGE_SIZE to a signed value.

In other words, there are lots of things like

	if (x < PAGE_SIZE)
		...

where we currently get a unsigned comparison, and where for all I know a 
signed PAGE_SIZE means that we should use 

	if (x >= 0 && x < PAGE_SIZE)

instead.

In short, I refuse to apply this patch after an -rc1 release. I suspect 
that I shouldn't apply something like this even *before* an -rc1, because 
I think it's just a really bad idea to make these types signed even if it 
were to give you magically easier sign extensions to "unsigned long long".

So I would *very* strongly instead argue:

 - "unsigned long" is the native kernel type for all address manipulation, 
   and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.

 - anything that uses any other type without explicitly making sure it's 
   safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that 
   had anything what-so-ever to do with page table entry bits, and this is 
   purely a page table entry issue!

So my suggested patch would:

 - make the page table code use a specific mask that it builds up itself, 
   and makes sure it's of the right type and has the rigth value in 
   whatever type "struct pte_entry" is. The fact that "pte_val()" is 
   larger than "unsigned long" on x86-32 is very clearly a PTE issue, 
   *not* an issue for PAGE_SIZE or PAGE_MASK.

Btw, just one look at your other patch should have convinced you of that 
anyway. Do you really think this is a readable patch or that the result is 
clean:

	+#define pmd_bad_v1(x)  ((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
	+#define        pmd_bad_v2(x)   ((pmd_val(x) \
	                          & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)) \
	                         != _KERNPG_TABLE)

when the real problem is that the mask you build up here isn't safe o 
pretty to begin with!

So make that whole "~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)" 
expression a nice *clean* expression that is about page table entries 
instead, and make *that* one be the right type and have the right bits. 
And suddenly the problem just goes away.

In fact, if you look at that expression, you suddenly realize that 
PAGE_MASK was *totally* the wrong value to use in the first place, whether 
sign-extended o not! Notice how

	PAGE_MASK | _PAGE_NX

is already a totally senseless operation if PAGE_MASK has all high bits 
set! 

So I think your whole argument and the patch is UTTER AND UNBELIEVABLE 
CRAP!

Blaming it on PAGE_MASK was totally incorrect. It has nothing to do with 
PAGE_MASK, and everything to do with the fact that the page table checking 
patch was utterly failed and pure shit.

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-09 17:27               ` Linus Torvalds
@ 2008-03-09 18:57                 ` Ingo Molnar
  2008-03-10  2:45                 ` Jeremy Fitzhardinge
  2008-03-10  4:35                 ` Paul Mackerras
  2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-09 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
	H. Peter Anvin


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So I would *very* strongly instead argue:
> 
>  - "unsigned long" is the native kernel type for all address manipulation, 
>    and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.
> 
>  - anything that uses any other type without explicitly making sure it's 
>    safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that 
>    had anything what-so-ever to do with page table entry bits, and this is 
>    purely a page table entry issue!
> 
> So my suggested patch would:
> 
>  - make the page table code use a specific mask that it builds up itself, 
>    and makes sure it's of the right type and has the rigth value in 
>    whatever type "struct pte_entry" is. The fact that "pte_val()" is 
>    larger than "unsigned long" on x86-32 is very clearly a PTE issue, 
>    *not* an issue for PAGE_SIZE or PAGE_MASK.

yeah, indeed my patch was sloppy, i didnt think it through - i fell for 
the lure of the easy-looking 'PAGE_SIZE is small, sign-extend it' hack. 

Will do it cleanly and will also clean up all the pte/address/pgprot 
type mixing that currently goes on in this maze of macros.

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-09 17:27               ` Linus Torvalds
  2008-03-09 18:57                 ` Ingo Molnar
@ 2008-03-10  2:45                 ` Jeremy Fitzhardinge
  2008-03-10  4:35                 ` Paul Mackerras
  2 siblings, 0 replies; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-10  2:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Arjan van de Ven, hans.rosenfeld, linux-kernel,
	Thomas Gleixner, H. Peter Anvin, Jan Beulich

Linus Torvalds wrote:
> So I would *very* strongly instead argue:
>
>  - "unsigned long" is the native kernel type for all address manipulation, 
>    and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.
>
>  - anything that uses any other type without explicitly making sure it's 
>    safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that 
>    had anything what-so-ever to do with page table entry bits, and this is 
>    purely a page table entry issue!
>   

Yes.  PTE_MASK already exists for masking the address bits out of a 
pte.  It should have the type pteval_t, which will deal with the PAE 
case cleanly.  It should also only have the middle pfn bits set, so that 
_PAGE_NX is also not included.

It's currently defined in terms of PHYSICAL_PAGE_MASK, which 1) is only 
used to define PTE_MASK, and 2) defined (wrongly) in terms of PAGE_MASK 
(though __PHYSICAL_MASK is correctly defined).

    J

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: bisected boot regression post 2.6.25-rc3.. please revert
  2008-03-09 17:27               ` Linus Torvalds
  2008-03-09 18:57                 ` Ingo Molnar
  2008-03-10  2:45                 ` Jeremy Fitzhardinge
@ 2008-03-10  4:35                 ` Paul Mackerras
  2 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2008-03-10  4:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Arjan van de Ven, hans.rosenfeld, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

Linus Torvalds writes:

> But even for PAGE_MASK, we literally have code like this:
> 
> 	if ((size_avail & PAGE_MASK) < rg.size) {
> 
> where it so _happens_ that "size_avail" is unsigned, but what if it 
> wasn't? It could turn a unsigned comparison into a signed one, and 
> introduce any number of security bugs etc.

We have had PAGE_MASK being signed on powerpc for ages, so if you do
find any such bugs, please let me know. :)  I'm not aware of any at
present, though.

The expression you quoted will be ok as long as either size_avail or
rg.size is unsigned, as far as I can see.

Our PAGE_SIZE is unsigned long.

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-03-10  4:35 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
2008-03-03  7:46 ` Ingo Molnar
2008-03-03  9:13   ` Ingo Molnar
2008-03-03 16:41     ` Arjan van de Ven
2008-03-03 17:40       ` Ingo Molnar
2008-03-03 17:51         ` Nish Aravamudan
2008-03-03 17:55           ` Ingo Molnar
2008-03-03 17:58             ` H. Peter Anvin
2008-03-03 18:36         ` Arjan van de Ven
2008-03-03 18:44           ` Linus Torvalds
2008-03-03 22:00             ` Arjan van de Ven
2008-03-04  1:05               ` Arjan van de Ven
2008-03-04  6:53                 ` Ingo Molnar
2008-03-05 15:35                   ` Arjan van de Ven
2008-03-09 11:56             ` Ingo Molnar
2008-03-09 17:27               ` Linus Torvalds
2008-03-09 18:57                 ` Ingo Molnar
2008-03-10  2:45                 ` Jeremy Fitzhardinge
2008-03-10  4:35                 ` Paul Mackerras
2008-03-03 21:13           ` Segher Boessenkool
2008-03-03 21:22             ` Segher Boessenkool
2008-03-03 22:33               ` Segher Boessenkool
2008-03-03 22:55                 ` H. Peter Anvin
2008-03-03 22:56                 ` Jeremy Fitzhardinge
2008-03-03 23:04                   ` H. Peter Anvin
2008-03-04  6:58             ` Ingo Molnar
2008-03-03 17:15 ` Nish Aravamudan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).