* PAGE_SIZE Availability Inconsistency
@ 2007-03-05 23:55 David Brown
  2007-03-05 23:59 ` Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: David Brown @ 2007-03-05 23:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I was rtfc'ing the code one day and noticed something about the
PAGE_SIZE define that is kinda inconsistent: its location relative
to the __KERNEL__ define.

On some architectures PAGE_SIZE is outside the __KERNEL__ define
(i386 and x86_64) and on others it's inside the define (ia64 and
powerpc).  I was wondering if this is because the powerpc and ia64
architectures have dynamic page sizes so that's why they can't export
PAGE_SIZE outside __KERNEL__.

I'm kinda wondering how I'm supposed to write portable user-space code
if I want to use the PAGE_SIZE define on different architectures.

Thanks,
- David Brown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-05 23:55 PAGE_SIZE Availability Inconsistency David Brown
@ 2007-03-05 23:59 ` Eric Dumazet
  2007-03-06  0:01 ` Randy Dunlap
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2007-03-05 23:59 UTC (permalink / raw)
  To: David Brown; +Cc: Linux Kernel Mailing List

David Brown wrote:
> I was rtfc'ing the code one day and noticed something about the
> PAGE_SIZE define that is kinda inconsistent: its location relative
> to the __KERNEL__ define.
> 
> On some architectures PAGE_SIZE is outside the __KERNEL__ define
> (i386 and x86_64) and on others it's inside the define (ia64 and
> powerpc).  I was wondering if this is because the powerpc and ia64
> architectures have dynamic page sizes so that's why they can't export
> PAGE_SIZE outside __KERNEL__.
> 
> I'm kinda wondering how I'm supposed to write portable user-space code
> if I want to use the PAGE_SIZE define on different architectures.

The real question is: why do you need PAGE_SIZE from user-space code?

If it's for mmap() use, you should use getpagesize()
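
As a minimal sketch of that advice (not code from this thread), the helpers below query the page size at run time and round an mmap() length up to a whole number of pages. The names runtime_page_size(), round_up_to_page() and map_pages() are hypothetical; sysconf(_SC_PAGESIZE) is used here as the POSIX way to get the same value getpagesize() returns.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the running kernel for its page size instead of using a macro. */
size_t runtime_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0);
    return (size_t)sz;
}

/* Round len up to a multiple of page (page must be a power of two). */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}

/* Map at least len bytes of anonymous memory; NULL on failure.
 * The length actually mapped is stored in *mapped_len. */
void *map_pages(size_t len, size_t *mapped_len)
{
    size_t total = round_up_to_page(len, runtime_page_size());
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = total;
    return p;
}
```

A caller would pass the stored length back to munmap(), so nothing in the program depends on a compile-time page size.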


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-05 23:55 PAGE_SIZE Availability Inconsistency David Brown
  2007-03-05 23:59 ` Eric Dumazet
@ 2007-03-06  0:01 ` Randy Dunlap
  2007-03-06  0:03 ` David Miller
  2007-03-06  9:29 ` Christoph Hellwig
  3 siblings, 0 replies; 34+ messages in thread
From: Randy Dunlap @ 2007-03-06  0:01 UTC (permalink / raw)
  To: David Brown; +Cc: Linux Kernel Mailing List

On Mon, 5 Mar 2007 15:55:06 -0800 David Brown wrote:

> I was rtfc'ing the code one day and noticed something about the
> PAGE_SIZE define that is kinda inconsistent: its location relative
> to the __KERNEL__ define.
> 
> On some architectures PAGE_SIZE is outside the __KERNEL__ define
> (i386 and x86_64) and on others it's inside the define (ia64 and
> powerpc).  I was wondering if this is because the powerpc and ia64
> architectures have dynamic page sizes so that's why they can't export
> PAGE_SIZE outside __KERNEL__.
> 
> I'm kinda wondering how I'm supposed to write portable user-space code
> if I want to use the PAGE_SIZE define on different architectures.

use 'getpagesize(2)'

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-05 23:55 PAGE_SIZE Availability Inconsistency David Brown
  2007-03-05 23:59 ` Eric Dumazet
  2007-03-06  0:01 ` Randy Dunlap
@ 2007-03-06  0:03 ` David Miller
  2007-03-06  0:04   ` David Brown
  2007-03-06  9:29 ` Christoph Hellwig
  3 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2007-03-06  0:03 UTC (permalink / raw)
  To: dmlb2000; +Cc: linux-kernel

From: "David Brown" <dmlb2000@gmail.com>
Date: Mon, 5 Mar 2007 15:55:06 -0800

> I'm kinda wondering how I'm supposed to write portable user-space code
> if I want to use the PAGE_SIZE define on different architectures.

Call getpagesize().

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-06  0:03 ` David Miller
@ 2007-03-06  0:04   ` David Brown
  2007-03-06  0:26     ` David Miller
  0 siblings, 1 reply; 34+ messages in thread
From: David Brown @ 2007-03-06  0:04 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel

On 3/5/07, David Miller <davem@davemloft.net> wrote:
> From: "David Brown" <dmlb2000@gmail.com>
> Date: Mon, 5 Mar 2007 15:55:06 -0800
>
> > I'm kinda wondering how I'm supposed to write portable user-space code
> > if I want to use the PAGE_SIZE define on different architectures.
>
> Call getpagesize().
>

Thanks, but that still leaves PAGE_SIZE available for some
architectures and not for others. Shouldn't this be moved inside
__KERNEL__ in i386 and x86_64 then?

- David Brown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-06  0:04   ` David Brown
@ 2007-03-06  0:26     ` David Miller
  2007-03-06  2:21       ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2007-03-06  0:26 UTC (permalink / raw)
  To: dmlb2000; +Cc: linux-kernel

From: "David Brown" <dmlb2000@gmail.com>
Date: Mon, 5 Mar 2007 16:04:24 -0800

> On 3/5/07, David Miller <davem@davemloft.net> wrote:
> > From: "David Brown" <dmlb2000@gmail.com>
> > Date: Mon, 5 Mar 2007 15:55:06 -0800
> >
> > > I'm kinda wondering how I'm supposed to write portable user-space code
> > > if I want to use the PAGE_SIZE define on different architectures.
> >
> > Call getpagesize().
> >
> 
> Thanks, but that still leaves PAGE_SIZE available for some
> architectures and not for others. Shouldn't this be moved inside
> __KERNEL__ in i386 and x86_64 then?

I definitely think so.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-06  0:26     ` David Miller
@ 2007-03-06  2:21       ` H. Peter Anvin
  2007-03-08 21:08         ` Avi Kivity
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-06  2:21 UTC (permalink / raw)
  To: David Miller; +Cc: dmlb2000, linux-kernel

David Miller wrote:
>>>
>> Thanks, but that still leaves PAGE_SIZE available for some
>> architectures and not for others. Shouldn't this be moved inside
>> __KERNEL__ in i386 and x86_64 then?
> 
> I definitely think so.

It definitely should, especially on x86-64, where the page size isn't 
guaranteed by the ABI (on i386, the ABI guarantees a 4K page size; on 
x86-64 it can be up to 64K.)

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-05 23:55 PAGE_SIZE Availability Inconsistency David Brown
                   ` (2 preceding siblings ...)
  2007-03-06  0:03 ` David Miller
@ 2007-03-06  9:29 ` Christoph Hellwig
  2007-03-08  2:18   ` Roman Zippel
  3 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2007-03-06  9:29 UTC (permalink / raw)
  To: David Brown; +Cc: Linux Kernel Mailing List

On Mon, Mar 05, 2007 at 03:55:06PM -0800, David Brown wrote:
> I was rtfc'ing the code one day and noticed something about the
> PAGE_SIZE define that is kinda inconsistent: its location relative
> to the __KERNEL__ define.
> 
> On some architectures PAGE_SIZE is outside the __KERNEL__ define
> (i386 and x86_64) and on others it's inside the define (ia64 and
> powerpc).  I was wondering if this is because the powerpc and ia64
> architectures have dynamic page sizes so that's why they can't export
> PAGE_SIZE outside __KERNEL__.
> 
> I'm kinda wondering how I'm supposed to write portable user-space code
> if I want to use the PAGE_SIZE define on different architectures.

PAGE_SIZE should not be available at all.  Please use getpagesize()
instead.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-06  9:29 ` Christoph Hellwig
@ 2007-03-08  2:18   ` Roman Zippel
  2007-03-08  5:28     ` David Brown
  2007-03-08  9:00     ` Christoph Hellwig
  0 siblings, 2 replies; 34+ messages in thread
From: Roman Zippel @ 2007-03-08  2:18 UTC (permalink / raw)
  To: Christoph Hellwig, David Brown, Linux Kernel Mailing List

Hi,

On Tuesday 06 March 2007 10:29, Christoph Hellwig wrote:

> PAGE_SIZE should not be available at all.  Please use getpagesize()
> instead.

While I agree, NBPG is a bit of a problem, although it's only needed for aout 
coredumps AFAICT, but still needed to compile e.g. gdb.

bye, Roman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08  2:18   ` Roman Zippel
@ 2007-03-08  5:28     ` David Brown
  2007-03-08  8:32       ` Christoph Hellwig
  2007-03-08  9:00     ` Christoph Hellwig
  1 sibling, 1 reply; 34+ messages in thread
From: David Brown @ 2007-03-08  5:28 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Christoph Hellwig, Linux Kernel Mailing List

> While I agree, NBPG is a bit of a problem, although it's only needed for aout
> coredumps AFAICT, but still needed to compile e.g. gdb.

Well then how does gdb deal with ia64? PAGE_SIZE and friends
aren't available for that arch, and the same goes for ppc.

Looking at the gdb code, they do have places where they define a
PAGE_SIZE, but they even mention it's a bug
(gdb-6.6/libiberty/getpagesize.c:14). I also grepped through their code
looking for includes of page.h and came up with nothing.

- David Brown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08  5:28     ` David Brown
@ 2007-03-08  8:32       ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2007-03-08  8:32 UTC (permalink / raw)
  To: David Brown; +Cc: Roman Zippel, Christoph Hellwig, Linux Kernel Mailing List

On Wed, Mar 07, 2007 at 09:28:15PM -0800, David Brown wrote:
> >While I agree, NBPG is a bit of a problem, although it's only needed for 
> >aout
> >coredumps AFAICT, but still needed to compile e.g. gdb.
> 
> Well then how does gdb deal with ia64? PAGE_SIZE and friends
> aren't available for that arch, and the same goes for ppc.

Neither of them support aout core dumps.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08  2:18   ` Roman Zippel
  2007-03-08  5:28     ` David Brown
@ 2007-03-08  9:00     ` Christoph Hellwig
  2007-03-08 15:53       ` Arjan van de Ven
  1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2007-03-08  9:00 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Christoph Hellwig, David Brown, Linux Kernel Mailing List

On Thu, Mar 08, 2007 at 03:18:04AM +0100, Roman Zippel wrote:
> Hi,
> 
> On Tuesday 06 March 2007 10:29, Christoph Hellwig wrote:
> 
> > PAGE_SIZE should not be available at all.  Please use getpagesize()
> > instead.
> 
> While I agree, NBPG is a bit of a problem, although it's only needed for aout 
> coredumps AFAICT, but still needed to compile e.g. gdb.

So we should export this one with an arbitrary value (on multiple page
size architectures) and a warning, maybe even an __deprecated attached to
it.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08  9:00     ` Christoph Hellwig
@ 2007-03-08 15:53       ` Arjan van de Ven
  2007-03-08 16:08         ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2007-03-08 15:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Roman Zippel, David Brown, Linux Kernel Mailing List

On Thu, 2007-03-08 at 09:00 +0000, Christoph Hellwig wrote:
> On Thu, Mar 08, 2007 at 03:18:04AM +0100, Roman Zippel wrote:
> > Hi,
> > 
> > On Tuesday 06 March 2007 10:29, Christoph Hellwig wrote:
> > 
> > > PAGE_SIZE should not be available at all.  Please use getpagesize()
> > > instead.
> > 
> > While I agree, NBPG is a bit of a problem, although it's only needed for aout 
> > coredumps AFAICT, but still needed to compile e.g. gdb.
> 
> So we should export this one with an arbitrary value (on multiple page
> size architectures) and a warning, maybe even an __deprecated attached to
> it.

if we think the kernel should export this one, we could do

#ifndef __KERNEL__
#define PAGE_SIZE getpagesize()
#endif


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 15:53       ` Arjan van de Ven
@ 2007-03-08 16:08         ` Christoph Hellwig
  2007-03-08 16:21           ` Daniel Jacobowitz
  2007-03-08 17:05           ` H. Peter Anvin
  0 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2007-03-08 16:08 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

On Thu, Mar 08, 2007 at 07:53:49AM -0800, Arjan van de Ven wrote:
> > > > PAGE_SIZE should not be available at all.  Please use getpagesize()
> > > > instead.
> > > 
> > > While I agree, NBPG is a bit of a problem, although it's only needed for aout 
> > > coredumps AFAICT, but still needed to compile e.g. gdb.
> > 
> > So we should export this one with an arbitrary value (on multiple page
> > size architectures) and a warning, maybe even an __deprecated attached to
> > it.
> 
> if we think the kernel should export this one, we could do
> 
> #ifndef __KERNEL__
> #define PAGE_SIZE getpagesize()
> #endif

No, no, no.  We should never export PAGE_SIZE.  We might export NBPG
as a deprecated symbol for gdb if it really needs it, but that should
happen only on a.out systems, and it should be a true constant,
not depending on PAGE_SIZE.
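
To illustrate the "true constant" point, here is a rough sketch (not taken from any header discussed here) of what goes wrong when a page-size macro expands to a function call. PAGE_SIZE_RT and alloc_one_page() are hypothetical names chosen to avoid clashing with any system PAGE_SIZE.

```c
#define _DEFAULT_SOURCE   /* for getpagesize() and posix_memalign() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical userspace macro in the style proposed above. */
#define PAGE_SIZE_RT ((size_t)getpagesize())

/*
 * Not a constant expression, so compile-time uses break, e.g.:
 *
 *     static char buf[PAGE_SIZE_RT];    // error at file scope
 *     case PAGE_SIZE_RT:                // error in a switch
 *
 * Code that needs one page has to size it at run time instead.
 */
void *alloc_one_page(void)
{
    void *p = NULL;

    if (posix_memalign(&p, PAGE_SIZE_RT, PAGE_SIZE_RT) != 0)
        return NULL;
    assert(((uintptr_t)p % PAGE_SIZE_RT) == 0);  /* page aligned */
    return p;
}
```

The returned page is released with free(). Anything that was written against a constant NBPG or PAGE_SIZE in array bounds would need this kind of rewrite if the macro became a function call.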

I've Cc'ed the gdb list on whether they have any comments on this
issue.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 16:08         ` Christoph Hellwig
@ 2007-03-08 16:21           ` Daniel Jacobowitz
  2007-03-08 17:05           ` H. Peter Anvin
  1 sibling, 0 replies; 34+ messages in thread
From: Daniel Jacobowitz @ 2007-03-08 16:21 UTC (permalink / raw)
  To: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

On Thu, Mar 08, 2007 at 04:08:52PM +0000, Christoph Hellwig wrote:
> No, no, no.  We should never export PAGE_SIZE.  We might export NBPG
> as a deprecated symbol for gdb if it really needs it, but that should
> happen only on a.out systems, and it should be a true constant,
> not depending on PAGE_SIZE.
> 
> I've Cc'ed the gdb list on whether they have any comments on this
> issue.

Sounds reasonable.  I do not believe that GDB has any dependence on
PAGE_SIZE; bfd (i.e. both gdb and binutils) use NBPG on a large number
of systems.  Looks like i386, alpha, m68k, s390, vax - but don't quote
me on that, I had to guess from the configure script.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 16:08         ` Christoph Hellwig
  2007-03-08 16:21           ` Daniel Jacobowitz
@ 2007-03-08 17:05           ` H. Peter Anvin
  2007-03-08 17:12             ` Christoph Hellwig
  2007-03-08 17:57             ` Anton Blanchard
  1 sibling, 2 replies; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-08 17:05 UTC (permalink / raw)
  To: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

Christoph Hellwig wrote:
> No, no, no.  We should never export PAGE_SIZE.  We might export NBPG
> as a deprecated symbol for gdb if it really needs it, but that should
> happen only on a.out systems, and it should be a true constant,
> not depending on PAGE_SIZE.
> 
> I've Cc'ed the gdb list on whether they have any comments on this
> issue.

By the way, it's a massive snafu that the swap area magic number is 
dependent on PAGE_SIZE.  There is absolutely no good reason for that.

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 17:05           ` H. Peter Anvin
@ 2007-03-08 17:12             ` Christoph Hellwig
  2007-03-08 17:57             ` Anton Blanchard
  1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2007-03-08 17:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

On Thu, Mar 08, 2007 at 09:05:48AM -0800, H. Peter Anvin wrote:
> Christoph Hellwig wrote:
> >No, no, no.  We should never export PAGE_SIZE.  We might export NBPG
> >as a deprecated symbol for gdb if it really needs it, but that should
> >happen only on a.out systems, and it should be a true constant,
> >not depending on PAGE_SIZE.
> >
> >I've Cc'ed the gdb list on whether they have any comments on this
> >issue.
> 
> By the way, it's a massive snafu that the swap area magic number is 
> dependent on PAGE_SIZE.  There is absolutely no good reason for that.

Yeah, now that you mention it I remember having problems with that
in the past.  We should probably create a new swap format that avoids
this problem.  I'll put it on my ever growing todo list.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 17:05           ` H. Peter Anvin
  2007-03-08 17:12             ` Christoph Hellwig
@ 2007-03-08 17:57             ` Anton Blanchard
  2007-03-08 18:04               ` H. Peter Anvin
  2007-03-08 21:03               ` David Brown
  1 sibling, 2 replies; 34+ messages in thread
From: Anton Blanchard @ 2007-03-08 17:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb


> By the way, it's a massive snafu that the swap area magic number is 
> dependent on PAGE_SIZE.  There is absolutely no good reason for that.

Agreed, it's been a big problem booting between 4kB and 64kB kernels on
ppc64.

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 17:57             ` Anton Blanchard
@ 2007-03-08 18:04               ` H. Peter Anvin
  2007-03-08 21:42                 ` Anton Blanchard
  2007-03-08 21:03               ` David Brown
  1 sibling, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-08 18:04 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

Anton Blanchard wrote:
>> By the way, it's a massive snafu that the swap area magic number is 
>> dependent on PAGE_SIZE.  There is absolutely no good reason for that.
> 
> Agreed, it's been a big problem booting between 4kB and 64kB kernels on
> ppc64.

The easiest way to fix this would be to always park the swap magic at 
the offset of the smallest page size in use, which is 4K.  This is 
analogous to how the offset for the ext2/3 superblock got fixed at 1K -- 
for 1K blocks, it's the second block, but for larger blocks, it's part 
of the first block.  If we fix the offset of the swap magic at 4096 
minus the offset that's already there, it will always fall in the first 
page regardless of page size.
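
A minimal sketch of what detection could look like under that scheme, assuming the 10-byte magic ("SWAP-SPACE" or "SWAPSPACE2") always ends at byte 4096. The function names and the FIXED_MAGIC_END constant are hypothetical, not existing kernel or util-linux symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SWAP_MAGIC_LEN   10
#define FIXED_MAGIC_END  4096   /* assumed fixed position, independent of page size */

/* Return 1 if the 10 bytes ending at offset `end` hold a known swap magic. */
int swap_magic_ends_at(const unsigned char *buf, size_t len, size_t end)
{
    const unsigned char *m;

    if (end < SWAP_MAGIC_LEN || end > len)
        return 0;               /* magic would lie outside the buffer */
    m = buf + end - SWAP_MAGIC_LEN;
    return memcmp(m, "SWAP-SPACE", SWAP_MAGIC_LEN) == 0 ||
           memcmp(m, "SWAPSPACE2", SWAP_MAGIC_LEN) == 0;
}

/* Detection that no longer needs to know the kernel's page size. */
int is_swap_fixed_offset(const unsigned char *buf, size_t len)
{
    return swap_magic_ends_at(buf, len, FIXED_MAGIC_END);
}
```

With the offset fixed, a 4K kernel and a 64K kernel would both look at bytes 4086..4095 of the first page, so only the first 4096 bytes of the device ever need to be read.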

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 17:57             ` Anton Blanchard
  2007-03-08 18:04               ` H. Peter Anvin
@ 2007-03-08 21:03               ` David Brown
  1 sibling, 0 replies; 34+ messages in thread
From: David Brown @ 2007-03-08 21:03 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: H. Peter Anvin, Christoph Hellwig, Arjan van de Ven,
	Roman Zippel, Linux Kernel Mailing List, gdb

On 3/8/07, Anton Blanchard <anton@samba.org> wrote:
>
> > By the way, it's a massive snafu that the swap area magic number is
> > dependent on PAGE_SIZE.  There is absolutely no good reason for that.
>
> Agreed, it's been a big problem booting between 4kB and 64kB kernels on
> ppc64.

Okay, it really seems like a couple of things need to happen: first
change the swap dependency on PAGE_SIZE, then move the __KERNEL__ define
above the PAGE_SIZE and friends defines in the appropriate
asm-*/page.h files.

Do these tasks need to happen in this order? I haven't really looked
at the swap code at all...

Also, I'd be willing to help. I've done kernel coding for experimental
projects and such but nothing for kernel.org, so I might need some
shepherding.

- David Brown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-06  2:21       ` H. Peter Anvin
@ 2007-03-08 21:08         ` Avi Kivity
  2007-03-08 22:21           ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2007-03-08 21:08 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: David Miller, dmlb2000, linux-kernel

H. Peter Anvin wrote:
> David Miller wrote:
>>>>
>>> Thanks, but that still leaves PAGE_SIZE available for some
>>> architectures and not for others. Shouldn't this be moved inside
>>> __KERNEL__ in i386 and x86_64 then?
>>
>> I definitely think so.
>
> It definitely should, especially on x86-64, where the page size isn't 
> guaranteed by the ABI (on i386, the ABI guarantees a 4K page size; on 
> x86-64 it can be up to 64K.)
>

Wouldn't that be ia64?


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 18:04               ` H. Peter Anvin
@ 2007-03-08 21:42                 ` Anton Blanchard
  2007-03-08 21:46                   ` Anton Blanchard
                                     ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Anton Blanchard @ 2007-03-08 21:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

 
Hi Peter,

> The easiest way to fix this would be to always park the swap magic at 
> the offset of the smallest page size in use, which is 4K.  This is 
> analogous to how the offset for the ext2/3 superblock got fixed at 1K -- 
> for 1K blocks, it's the second block, but for larger blocks, it's part 
> of the first block.  If we fix the offset of the swap magic at 4096 
> minus the offset that's already there, it will always fall in the first 
> page regardless of page size.

Yeah that makes sense. I gave it a go by creating a MIN_PAGE_SIZE
define, and allowing an architecture to override it if required.

A couple of issues:

1. Parts of the swap header are in PAGE_SIZE chunks so I made them
MIN_PAGE_SIZE chunks too.

2. The badblocks stuff is PAGE_SIZEd too. Do we ever use it on modern
disks? Maybe we can just remove this support.

3. This will unfortunately break machines currently running a 64kB
kernel with swap space. We may just have to lump it and fix on upgrade.

Anton
--

Our current swap layout has issues with variable page size kernels.
Instead of using the page size at runtime, base it on the minimum page
size the architecture supports.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h	2007-03-08 15:14:30.000000000 -0600
+++ linux-2.6/include/linux/swap.h	2007-03-08 15:14:33.000000000 -0600
@@ -48,14 +48,17 @@
  * old reserved area - some extra information. Note that the first
  * kilobyte is reserved for boot loader or disk label stuff...
  *
- * Having the magic at the end of the PAGE_SIZE makes detecting swap
- * areas somewhat tricky on machines that support multiple page sizes.
- * For 2.5 we'll probably want to move the magic to just beyond the
- * bootbits...
+ * Version 1 and 2 swap headers store the magic at the end of the
+ * PAGE_SIZE which causes problems for architectures with multiple
+ * page sizes. An architecture can define MIN_PAGE_SIZE to be used
+ * regardless of the kernel page size to get around this.
  */
+#ifndef MIN_PAGE_SIZE
+#define MIN_PAGE_SIZE PAGE_SIZE
+#endif
 union swap_header {
 	struct {
-		char reserved[PAGE_SIZE - 10];
+		char reserved[MIN_PAGE_SIZE - 10];
 		char magic[10];			/* SWAP-SPACE or SWAPSPACE2 */
 	} magic;
 	struct {
Index: linux-2.6/include/asm-powerpc/page.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/page.h	2007-03-08 15:14:30.000000000 -0600
+++ linux-2.6/include/asm-powerpc/page.h	2007-03-08 15:14:33.000000000 -0600
@@ -24,8 +24,10 @@
 #else
 #define PAGE_SHIFT		12
 #endif
+#define MIN_PAGE_SHIFT		12
 
 #define PAGE_SIZE		(ASM_CONST(1) << PAGE_SHIFT)
+#define MIN_PAGE_SIZE		(ASM_CONST(1) << MIN_PAGE_SHIFT)
 
 /* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
 #define __HAVE_ARCH_GATE_AREA		1
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c	2007-03-08 14:48:03.000000000 -0600
+++ linux-2.6/mm/swapfile.c	2007-03-08 15:33:04.000000000 -0600
@@ -1568,6 +1568,12 @@
 		p->cluster_next = 1;
 
 		/*
+		 * last_page is in MIN_PAGE_SIZE chunks, scale to kernel
+		 * page size.
+		 */
+		swap_header->info.last_page >>= (PAGE_SHIFT - MIN_PAGE_SHIFT);
+
+		/*
 		 * Find out how many pages are allowed for a single swap
 		 * device. There are two limiting factors: 1) the number of
 		 * bits for the swap offset in the swp_entry_t type and

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 21:42                 ` Anton Blanchard
@ 2007-03-08 21:46                   ` Anton Blanchard
  2007-03-08 21:48                   ` David Miller
  2007-03-08 22:22                   ` H. Peter Anvin
  2 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2007-03-08 21:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb


Hi,

> Our current swap layout has issues with variable page size kernels.
> Instead of using the page size at runtime, base it on the minimum page
> size the architecture supports.

A hacked up patch to userspace utilities to test the kernel patch. BTW
It looks like there are some real bugs here:

            int pagesize = getpagesize();
            int rd;
            char buf[32768];

            rd = pagesize;
            if (rd < 8192)
                    rd = 8192;
            if (rd > sizeof(buf))
                    rd = sizeof(buf);
            if (lseek(fd, 0, SEEK_SET) != 0
                || read(fd, buf, rd) != rd)
                    goto io_error;
            if (may_be_swap(buf+pagesize) ||
                may_be_swap(buf+4096) || may_be_swap(buf+8192))
                    type = "swap";

If page size == 64kB won't we read past the end of buf?
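
One possible shape for a fix, as an untested sketch rather than a proposed util-linux patch: only probe candidate offsets that lie inside the bytes actually read. magic_before() is a hypothetical stand-in for the existing may_be_swap() helper, assumed here to look at the 10 bytes just before the pointer it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: does a swap magic end right before `end`? */
int magic_before(const char *end)
{
    return memcmp(end - 10, "SWAP-SPACE", 10) == 0 ||
           memcmp(end - 10, "SWAPSPACE2", 10) == 0;
}

/*
 * Probe for a swap signature in the first rd bytes of buf.
 * Candidates beyond rd are skipped instead of read, so a 64kB
 * page size can no longer index past a 32kB buffer.
 */
int probe_swap(const char *buf, size_t rd, size_t pagesize)
{
    const size_t candidates[] = { pagesize, 4096, 8192 };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        size_t end = candidates[i];

        if (end < 10 || end > rd)
            continue;           /* magic would be outside the data read */
        if (magic_before(buf + end))
            return 1;
    }
    return 0;
}
```

This only removes the out-of-bounds read. A swap area made with a 64kB page size still goes undetected unless the buffer and the read are enlarged to cover offset 65536.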

Anton

Index: util-linux-2.12r/disk-utils/mkswap.c
===================================================================
--- util-linux-2.12r.orig/disk-utils/mkswap.c	2007-03-08 14:52:53.000000000 -0600
+++ util-linux-2.12r/disk-utils/mkswap.c	2007-03-08 14:58:09.000000000 -0600
@@ -169,7 +158,11 @@
 #ifdef PAGE_SIZE
 	defined_pagesize = PAGE_SIZE;
 #endif
+#ifdef __powerpc__
+	kernel_pagesize = 4096;
+#else
 	kernel_pagesize = getpagesize();
+#endif
 	pagesize = kernel_pagesize;
 
 	if (user_pagesize) {
Index: util-linux-2.12r/mount/get_label_uuid.c
===================================================================
--- util-linux-2.12r.orig/mount/get_label_uuid.c	2007-03-08 14:52:53.000000000 -0600
+++ util-linux-2.12r/mount/get_label_uuid.c	2007-03-08 14:58:09.000000000 -0600
@@ -79,7 +79,11 @@
 
 static int
 is_v1_swap_partition(int fd, char **label, char *uuid) {
+#ifdef __powerpc__
+	int n = 4096;
+#else
 	int n = getpagesize();
+#endif
 	char *buf = xmalloc(n);
 	struct swap_header_v1_2 *p = (struct swap_header_v1_2 *) buf;
 
Index: util-linux-2.12r/mount/mount_guess_fstype.c
===================================================================
--- util-linux-2.12r.orig/mount/mount_guess_fstype.c	2007-03-08 14:52:53.000000000 -0600
+++ util-linux-2.12r/mount/mount_guess_fstype.c	2007-03-08 14:54:25.000000000 -0600
@@ -462,7 +462,11 @@
     if (!type) {
 	    /* perhaps the user tries to mount the swap space
 	       on a new disk; warn her before she does mke2fs on it */
+#ifdef __powerpc__
+	    int pagesize = 4096;
+#else
 	    int pagesize = getpagesize();
+#endif
 	    int rd;
 	    char buf[32768];
 
Index: util-linux-2.12r/rescuept/rescuept.c
===================================================================
--- util-linux-2.12r.orig/rescuept/rescuept.c	2007-03-08 14:52:53.000000000 -0600
+++ util-linux-2.12r/rescuept/rescuept.c	2007-03-08 14:55:49.000000000 -0600
@@ -510,7 +510,11 @@
 		size = s.st_size / 512;
 	}
 
+#ifdef __powerpc__
+	pagesize = 4096;
+#else
 	pagesize = getpagesize();
+#endif
 	if (pagesize <= 0)
 		pagesize = 4096;
 	else if (pagesize > MAXPAGESZ) {

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 21:42                 ` Anton Blanchard
  2007-03-08 21:46                   ` Anton Blanchard
@ 2007-03-08 21:48                   ` David Miller
  2007-03-09  2:43                     ` Anton Blanchard
  2007-03-08 22:22                   ` H. Peter Anvin
  2 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2007-03-08 21:48 UTC (permalink / raw)
  To: anton; +Cc: hpa, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

From: Anton Blanchard <anton@samba.org>
Date: Thu, 8 Mar 2007 15:42:36 -0600

> > The easiest way to fix this would be to always park the swap magic at 
> > the offset of the smallest page size in use, which is 4K.  This is 
> > analogous to how the offset for the ext2/3 superblock got fixed at 1K -- 
> > for 1K blocks, it's the second block, but for larger blocks, it's part 
> > of the first block.  If we fix the offset of the swap magic at 4096 
> > minus the offset that's already there, it will always fall in the first 
> > page regardless of page size.
> 
> Yeah that makes sense. I gave it a go by creating a MIN_PAGE_SIZE
> define, and allowing an architecture to override it if required.

I might be missing something but doesn't this break every
SWAP partition that was created with something other than
MIN_PAGE_SIZE?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 21:08         ` Avi Kivity
@ 2007-03-08 22:21           ` H. Peter Anvin
  2007-03-19 19:39             ` Eric W. Biederman
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-08 22:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: David Miller, dmlb2000, linux-kernel

Avi Kivity wrote:
>>
>> It definitely should, especially on x86-64, where the page size isn't 
>> guaranteed by the ABI (on i386, the ABI guarantees a 4K page size; on 
>> x86-64 it can be up to 64K.)
> 
> Wouldn't that be ia64?

No, the x86-64 EFI ABI permits page sizes up to 64K.  Currently, of 
course, the only page size in use is 4K, but unlike i386 that's not 
guaranteed by the ABI.  At least AMD has indicated that they are 
considering introducing larger page size support in future hardware.

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 21:42                 ` Anton Blanchard
  2007-03-08 21:46                   ` Anton Blanchard
  2007-03-08 21:48                   ` David Miller
@ 2007-03-08 22:22                   ` H. Peter Anvin
  2 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-08 22:22 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Christoph Hellwig, Arjan van de Ven, Roman Zippel, David Brown,
	Linux Kernel Mailing List, gdb

Anton Blanchard wrote:
> 
> 2. The badblocks stuff is PAGE_SIZEd too. Do we ever use it on modern
> disks? Maybe we can just remove this support.
> 

Badblocks is definitely still used in some configurations.

	-hpa

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 21:48                   ` David Miller
@ 2007-03-09  2:43                     ` Anton Blanchard
  2007-03-09  4:18                       ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Anton Blanchard @ 2007-03-09  2:43 UTC (permalink / raw)
  To: David Miller; +Cc: hpa, hch, arjan, zippel, dmlb2000, linux-kernel, gdb


Hi,

> I might be missing something but doesn't this break every
> SWAP partition that was created with something other than
> MIN_PAGE_SIZE?

It does. I was thinking we could work around it in ppc64 (64kB is quite
new), but I forgot there are options on sparc64 to change the page size :)

The other option is to create a v3 swap format that doesn't use any
PAGE_SIZE parameters.

Anton

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-09  2:43                     ` Anton Blanchard
@ 2007-03-09  4:18                       ` H. Peter Anvin
  2007-03-09  4:27                         ` David Miller
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-09  4:18 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: David Miller, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

Anton Blanchard wrote:
> Hi,
> 
>> I might be missing something but doesn't this break every
>> SWAP partition that was created with something other than
>> MIN_PAGE_SIZE?
> 
> It does. I was thinking we could work around it in ppc64 (64kB is quite
> new), but I forgot there are options on sparc64 to change the page size :)
> 
> The other option is to create a v3 swap format that doesn't use any
> PAGE_SIZE parameters.
> 

The best thing to do would be to look for the magic both at PAGE_SIZE 
(for compatibility) and MIN_PAGE_SIZE (for sanity.)

	-hpa
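
A rough sketch of such a two-place lookup follows.  It assumes the
signature is the 10-byte string "SWAPSPACE2" stored in the last bytes
of the region it describes.  MIN_PAGE_SIZE is the define proposed in
this thread, find_swap_header() is a made-up name, and checking the
MIN_PAGE_SIZE location first is a choice made here, not something the
existing code does.

```c
#include <stddef.h>
#include <string.h>

#define SWAP_MAGIC      "SWAPSPACE2"
#define SWAP_MAGIC_LEN  10
#define MIN_PAGE_SIZE   4096	/* hypothetical, as proposed in the thread */

/*
 * Look for the signature at the end of the first MIN_PAGE_SIZE bytes,
 * then at the end of the first page_size bytes.  buf must hold at
 * least page_size bytes and page_size must be >= MIN_PAGE_SIZE.
 * Returns the size the header was laid out for, or 0 if none is found.
 */
static size_t find_swap_header(const unsigned char *buf, size_t page_size)
{
	if (memcmp(buf + MIN_PAGE_SIZE - SWAP_MAGIC_LEN,
		   SWAP_MAGIC, SWAP_MAGIC_LEN) == 0)
		return MIN_PAGE_SIZE;

	if (page_size > MIN_PAGE_SIZE &&
	    memcmp(buf + page_size - SWAP_MAGIC_LEN,
		   SWAP_MAGIC, SWAP_MAGIC_LEN) == 0)
		return page_size;

	return 0;
}
```

When PAGE_SIZE == MIN_PAGE_SIZE the second test is skipped and the
lookup behaves like a single check at the usual offset.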

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-09  4:18                       ` H. Peter Anvin
@ 2007-03-09  4:27                         ` David Miller
  2007-03-09  4:31                           ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2007-03-09  4:27 UTC (permalink / raw)
  To: hpa; +Cc: anton, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Thu, 08 Mar 2007 20:18:28 -0800

> Anton Blanchard wrote:
> > The other option is to create a v3 swap format that doesn't use any
> > PAGE_SIZE parameters.
> 
> The best thing to do would be to look for the magic both at PAGE_SIZE 
> (for compatibility) and MIN_PAGE_SIZE (for sanity.)

That might work, but a large part of me says to go for v3
and do it cleanly.

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-09  4:27                         ` David Miller
@ 2007-03-09  4:31                           ` H. Peter Anvin
  2007-03-09  4:36                             ` David Miller
  2007-03-21  2:12                             ` Anton Blanchard
  0 siblings, 2 replies; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-09  4:31 UTC (permalink / raw)
  To: David Miller; +Cc: anton, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

David Miller wrote:
> From: "H. Peter Anvin" <hpa@zytor.com>
> Date: Thu, 08 Mar 2007 20:18:28 -0800
> 
>> Anton Blanchard wrote:
>>> The other option is to create a v3 swap format that doesn't use any
>>> PAGE_SIZE parameters.
>> The best thing to do would be to look for the magic both at PAGE_SIZE 
>> (for compatibility) and MIN_PAGE_SIZE (for sanity.)
> 
> That might work, but a large part of me says to go for v3
> and do it cleanly.

The advantage would be that it wouldn't require a v3 for platforms for 
which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
percentage of systems.

You still have to look for the darn magic in two places, so there is no 
reason for it to be different.

	-hpa

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-09  4:31                           ` H. Peter Anvin
@ 2007-03-09  4:36                             ` David Miller
  2007-03-21  2:12                             ` Anton Blanchard
  1 sibling, 0 replies; 34+ messages in thread
From: David Miller @ 2007-03-09  4:36 UTC (permalink / raw)
  To: hpa; +Cc: anton, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Thu, 08 Mar 2007 20:31:05 -0800

> The advantage would be that it wouldn't require a v3 for platforms for 
> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
> percentage of systems.
> 
> You still have to look for the darn magic in two places, so there is no 
> reason for it to be different.

Good point.

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-08 22:21           ` H. Peter Anvin
@ 2007-03-19 19:39             ` Eric W. Biederman
  0 siblings, 0 replies; 34+ messages in thread
From: Eric W. Biederman @ 2007-03-19 19:39 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Avi Kivity, David Miller, dmlb2000, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Avi Kivity wrote:
>>>
>>> It definitely should, especially on x86-64, where the page size isn't
>>> guaranteed by the ABI (on i386, the ABI guarantees a 4K page size; on x86-64
>>> it can be up to 64K.)
>>
>> Wouldn't that be ia64?
>
> No, the x86-64 EFI ABI permits page sizes up to 64K.  Currently, of course, the
> only page size in use is 4K, but unlike i386 that's not guaranteed by the ABI.
> At least AMD has indicated that they are considering introducing larger page
> size support in future hardware.

EFI ABI?  Don't you mean the SYSV ABI?

That does seem to indicate that 64K is an option.  And at a quick
glance glibc is using 1M alignment on program segments so it
looks like there is at least a reasonable chance of being able
to make the transition to a bigger page size for user space.

Eric

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-09  4:31                           ` H. Peter Anvin
  2007-03-09  4:36                             ` David Miller
@ 2007-03-21  2:12                             ` Anton Blanchard
  2007-03-21  2:48                               ` H. Peter Anvin
  1 sibling, 1 reply; 34+ messages in thread
From: Anton Blanchard @ 2007-03-21  2:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Miller, hch, arjan, zippel, dmlb2000, linux-kernel, gdb


Hi,

> The advantage would be that it wouldn't require a v3 for platforms for 
> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
> percentage of systems.
> 
> You still have to look for the darn magic in two places, so there is no 
> reason for it to be different.

The problem is if you can hit in two places then what PAGE_SIZE should
you use to size the contents of the swap header while remaining backward
compatible.

I'm leaning towards Dave's suggestion of creating a clean v3 swap header.

Anton

* Re: PAGE_SIZE Availability Inconsistency
  2007-03-21  2:12                             ` Anton Blanchard
@ 2007-03-21  2:48                               ` H. Peter Anvin
  0 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2007-03-21  2:48 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: David Miller, hch, arjan, zippel, dmlb2000, linux-kernel, gdb

Anton Blanchard wrote:
> Hi,
> 
>> The advantage would be that it wouldn't require a v3 for platforms for 
>> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
>> percentage of systems.
>>
>> You still have to look for the darn magic in two places, so there is no 
>> reason for it to be different.
> 
> The problem is if you can hit in two places then what PAGE_SIZE should
> you use to size the contents of the swap header while remaining backward
> compatible.
> 
> I'm leaning towards Dave's suggestion of creating a clean v3 swap header.
> 

Changing the header format doesn't make *ANY* difference whatsoever.

You have to write two copies of the swap header, and the kernel should 
check for a header at MIN_PAGE_SIZE first and then at PAGE_SIZE.

If there are fields (other than position) in the v2 swap header that are 
dependent on PAGE_SIZE, then the copy at MIN_PAGE_SIZE should be sized 
using MIN_PAGE_SIZE, and the copy at PAGE_SIZE should be sized at 
PAGE_SIZE.  It's that simple.

Creating a new format will not help that one iota, and will create 
gratuitous incompatibility for the very common case of PAGE_SIZE == 
MIN_PAGE_SIZE.

	-hpa
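
As an illustration only, the writing side of that scheme could look
like the sketch below.  It stamps just the signature at both locations
and deliberately leaves out every other header field.  The
"SWAPSPACE2" string and its end-of-region placement are taken from the
existing v2 layout; MIN_PAGE_SIZE and both function names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SWAP_MAGIC      "SWAPSPACE2"
#define SWAP_MAGIC_LEN  10
#define MIN_PAGE_SIZE   4096	/* hypothetical, as proposed in the thread */

/* Stamp the signature into the last bytes of the first `size` bytes. */
static void stamp_magic(unsigned char *buf, size_t size)
{
	memcpy(buf + size - SWAP_MAGIC_LEN, SWAP_MAGIC, SWAP_MAGIC_LEN);
}

/*
 * Write one signature for MIN_PAGE_SIZE and, if the page size is
 * larger, a second one for page_size.  buf must hold at least
 * page_size bytes and page_size must be >= MIN_PAGE_SIZE.
 */
static void write_swap_magics(unsigned char *buf, size_t page_size)
{
	stamp_magic(buf, MIN_PAGE_SIZE);
	if (page_size > MIN_PAGE_SIZE)
		stamp_magic(buf, page_size);
}
```

For PAGE_SIZE == MIN_PAGE_SIZE this writes exactly one signature, at
the offset existing tools already use.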


Thread overview: 34+ messages
2007-03-05 23:55 PAGE_SIZE Availability Inconsistency David Brown
2007-03-05 23:59 ` Eric Dumazet
2007-03-06  0:01 ` Randy Dunlap
2007-03-06  0:03 ` David Miller
2007-03-06  0:04   ` David Brown
2007-03-06  0:26     ` David Miller
2007-03-06  2:21       ` H. Peter Anvin
2007-03-08 21:08         ` Avi Kivity
2007-03-08 22:21           ` H. Peter Anvin
2007-03-19 19:39             ` Eric W. Biederman
2007-03-06  9:29 ` Christoph Hellwig
2007-03-08  2:18   ` Roman Zippel
2007-03-08  5:28     ` David Brown
2007-03-08  8:32       ` Christoph Hellwig
2007-03-08  9:00     ` Christoph Hellwig
2007-03-08 15:53       ` Arjan van de Ven
2007-03-08 16:08         ` Christoph Hellwig
2007-03-08 16:21           ` Daniel Jacobowitz
2007-03-08 17:05           ` H. Peter Anvin
2007-03-08 17:12             ` Christoph Hellwig
2007-03-08 17:57             ` Anton Blanchard
2007-03-08 18:04               ` H. Peter Anvin
2007-03-08 21:42                 ` Anton Blanchard
2007-03-08 21:46                   ` Anton Blanchard
2007-03-08 21:48                   ` David Miller
2007-03-09  2:43                     ` Anton Blanchard
2007-03-09  4:18                       ` H. Peter Anvin
2007-03-09  4:27                         ` David Miller
2007-03-09  4:31                           ` H. Peter Anvin
2007-03-09  4:36                             ` David Miller
2007-03-21  2:12                             ` Anton Blanchard
2007-03-21  2:48                               ` H. Peter Anvin
2007-03-08 22:22                   ` H. Peter Anvin
2007-03-08 21:03               ` David Brown
