linux-kernel.vger.kernel.org archive mirror
* Re: large page patch (fwd) (fwd)
       [not found] <E17ahdi-0001RC-00@w-gerrit2>
@ 2002-08-02 19:34 ` Linus Torvalds
  2002-08-03  3:19   ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-02 19:34 UTC (permalink / raw)
  To: Gerrit Huizenga; +Cc: Hubertus Franke, Martin.Bligh, wli, Kernel Mailing List


[ linux-kernel cc'd, simply because I don't want to write the same thing 
  over and over again ]

[ Executive summary: the current large-page-patch is nothing but a magic 
  interface to the TLB. Don't view it as anything else, or you'll just be
  confused by all the smoke and mirrors. ]

On Fri, 2 Aug 2002, Gerrit Huizenga wrote:
> > Because _none_ of the large-page codepaths are shared with _any_ of the
> > normal cases.
> 
> Isn't that currently an implementation detail?

Yes and no.

We may well expand the FS layer to bigger pages, but "bigger" is almost
certainly not going to include things like 256MB pages - if for no other
reason than the fact that memory fragmentation really means that the limit
on page sizes in practice is somewhere around 128kB for any reasonable
usage patterns even with gigabytes of RAM. 

And _maybe_ we might get to the single-digit megabytes. I doubt it, simply
because even with a good buddy allocator and a memory manager that
actively frees pages to get large contiguous chunks of RAM, it's basically
impossible to have something that can reliably give you chunks that big
without making normal performance go totally down the toilet.

(Yeah, once you have terabytes of memory, that worry probably ends up
largely going away. I don't think that is going to be a common enough
platform for Linux to care about in the next ten years, though).

So there are implementation issues, yes. In particular, there _is_ a push 
for larger pages in the FS and generic MM layers too, but the issues there 
are very different and have basically nothing in common with the TLB and 
page table mapping issues of the current push.

What this VM/VFS push means is that we may actually have a _different_ 
"large page" support on that level, where the most likely implementation 
is that the "struct address_space" will at some point have a new member 
that specifies the "page allocation order" for that address space. This 
will allow us to do per-file allocations, so that some files (or some 
filesystems) might want to do all IO in 64kB chunks, and they'd just make 
the address_space specify a page allocation order that matches that.
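
[ Editor's sketch: the per-address_space "page allocation order" described
  above might look roughly like the fragment below.  The field and helper
  names are hypothetical, not existing kernel code. ]

	/* hypothetical extra member of struct address_space (sketch only) */
	struct address_space {
		/* ... existing fields ... */
		unsigned int	alloc_order;	/* 0 => one page, 4 => 64kB with 4kB pages */
	};

	/* page cache allocations for this file would then honour it */
	static struct page *page_cache_alloc_sized(struct address_space *mapping)
	{
		/* alloc_pages() already takes an order, so 64kB IO is just order 4 */
		return alloc_pages(GFP_HIGHUSER, mapping->alloc_order);
	}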

This is in fact one of the reasons I explicitly _want_ to keep the
interfaces separate - because there are two totally different issues at
play, and I suspect that we'll end up implementing _both_ of them, but
that they will _still_ have no commonalities.

The current largepage patch is really nothing but an interface to the TLB.  
Please view it as that - a direct TLB interface that has zero impact on
the VFS or VM layers, and that is meant _purely_ as a way to expose hw 
capabilities to the few applications that really really want them.

The important thing to take away from this is that _even_ if we could
change the FS and VM layers to know about a per-address_space variable-
sized PAGE_CACHE_SIZE (which I think is the long-term goal), that doesn't 
impact the fact that we _also_ want to have the TLB interface. 

Maybe the largepage patch could be improved upon by just renaming it, and
making clear that it's a "TLB_hugepage" thing. That's what a CPU designer
thinks of when you say "largepage" to him. Some of the confusion is 
probably because a VM/FS person in an OS group does _not_ necessarily 
think the same way, but thinks about doing big-granularity IO.

			Linus



* Re: large page patch (fwd) (fwd)
  2002-08-02 19:34 ` large page patch (fwd) (fwd) Linus Torvalds
@ 2002-08-03  3:19   ` David Mosberger
  2002-08-03  3:32     ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-03  3:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Gerrit Huizenga, Hubertus Franke, Martin.Bligh, wli, Kernel Mailing List

>>>>> On Fri, 2 Aug 2002 12:34:08 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

  Linus> We may well expand the FS layer to bigger pages, but "bigger"
  Linus> is almost certainly not going to include things like 256MB
  Linus> pages - if for no other reason than the fact that memory
  Linus> fragmentation really means that the limit on page sizes in
  Linus> practice is somewhere around 128kB for any reasonable usage
  Linus> patterns even with gigabytes of RAM.

  Linus> And _maybe_ we might get to the single-digit megabytes. I
  Linus> doubt it, simply because even with a good buddy allocator and
  Linus> a memory manager that actively frees pages to get large
  Linus> contiguous chunks of RAM, it's basically impossible to have
  Linus> something that can reliably give you chunks that big without
  Linus> making normal performance go totally down the toilet.

The Rice people avoided some of the fragmentation problems by
pro-actively allocating a max-order physical page, even when only a
(small) virtual page was being mapped.  This should work very well as
long as the total memory usage (including memory lost due to internal
fragmentation of max-order physical pages) doesn't exceed available
memory.  That's not a condition which will hold for every system in
the world, but I suspect it is true for lots of systems for large
periods of time.  And since superpages quickly become
counter-productive in tight-memory situations anyhow, this seems like
a very reasonable approach.
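
[ Editor's sketch of the reservation idea David describes above, with made-up
  names; this is not the Rice implementation, just an illustration of the
  mechanism. ]

	/* one reservation covers up to BITS_PER_LONG base pages (toy version) */
	struct spage_resv {
		struct page	*base;	/* first page of the max-order chunk, or NULL */
		unsigned int	order;	/* e.g. 3 for an 8-page reservation */
		unsigned long	used;	/* bitmap of base pages actually touched */
	};

	static struct page *resv_fault(struct spage_resv *r, unsigned int idx)
	{
		if (!r->base) {
			/* proactively grab the whole aligned chunk on the first fault... */
			r->base = alloc_pages(GFP_HIGHUSER, r->order);
			/* ...but fall back to a single page when memory is tight */
			if (!r->base)
				return alloc_page(GFP_HIGHUSER);
		}
		set_bit(idx, &r->used);
		/* if all bits get set, the mapping can be promoted to one superpage;
		   under memory pressure the untouched pages are simply given back */
		return r->base + idx;
	}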

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03  3:19   ` David Mosberger
@ 2002-08-03  3:32     ` Linus Torvalds
  2002-08-03  4:17       ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-03  3:32 UTC (permalink / raw)
  To: davidm
  Cc: Gerrit Huizenga, Hubertus Franke, Martin.Bligh, wli, Kernel Mailing List


On Fri, 2 Aug 2002, David Mosberger wrote:
> 
> The Rice people avoided some of the fragmentation problems by
> pro-actively allocating a max-order physical page, even when only a
> (small) virtual page was being mapped.

This probably works ok if
 - the superpages are only slightly bigger than the normal page
 - superpages are only a nice optimization, not a hard requirement.

>				  And since superpages quickly become
> counter-productive in tight-memory situations anyhow, this seems like
> a very reasonable approach.

Ehh.. The only people who are _really_ asking for the superpages want 
almost nothing _but_ superpages. They are willing to use 80% of all memory 
for just superpages.

Yes, it's Oracle etc, and the whole point for these users is to avoid 
having any OS memory allocation for these areas.

		Linus



* Re: large page patch (fwd) (fwd)
  2002-08-03  3:32     ` Linus Torvalds
@ 2002-08-03  4:17       ` David Mosberger
  2002-08-03  4:26         ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-03  4:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: davidm, Gerrit Huizenga, Hubertus Franke, Martin.Bligh, wli,
	Kernel Mailing List

>>>>> On Fri, 2 Aug 2002 20:32:10 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

  >> And since superpages quickly become counter-productive in
  >> tight-memory situations anyhow, this seems like a very reasonable
  >> approach.

  Linus> Ehh.. The only people who are _really_ asking for the
  Linus> superpages want almost nothing _but_ superpages. They are
  Linus> willing to use 80% of all memory for just superpages.

  Linus> Yes, it's Oracle etc, and the whole point for these users is
  Linus> to avoid having any OS memory allocation for these areas.

My terminology is perhaps a bit too subtle: I use "superpage"
exclusively for the case where multiple pages get coalesced into a
larger page.  The "large page" ("huge page") case that you were
talking about is different, since pages never get demoted or promoted.

I wasn't disagreeing with your case for separate large page syscalls.
Those syscalls certainly simplify implementation and, as you point
out, it well may be the case that a transparent superpage scheme never
will be able to replace the former.

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03  4:17       ` David Mosberger
@ 2002-08-03  4:26         ` Linus Torvalds
  2002-08-03  4:39           ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-03  4:26 UTC (permalink / raw)
  To: davidm
  Cc: Gerrit Huizenga, Hubertus Franke, Martin.Bligh, wli, Kernel Mailing List



On Fri, 2 Aug 2002, David Mosberger wrote:
>
> My terminology is perhaps a bit too subtle: I use "superpage"
> exclusively for the case where multiple pages get coalesced into a
> larger page.  The "large page" ("huge page") case that you were
> talking about is different, since pages never get demoted or promoted.

Ahh, ok.

> I wasn't disagreeing with your case for separate large page syscalls.
> Those syscalls certainly simplify implementation and, as you point
> out, it well may be the case that a transparent superpage scheme never
> will be able to replace the former.

Somebody already had patches for the transparent superpage thing for
alpha, which supports it. I remember seeing numbers implying that it helped
noticeably.

But yes, that definitely doesn't work for humongous pages (or whatever we
should call the multi-megabyte-special-case-thing ;).

		Linus



* Re: large page patch (fwd) (fwd)
  2002-08-03  4:26         ` Linus Torvalds
@ 2002-08-03  4:39           ` David Mosberger
  2002-08-03  5:20             ` David S. Miller
  2002-08-03 18:41             ` Hubertus Franke
  0 siblings, 2 replies; 110+ messages in thread
From: David Mosberger @ 2002-08-03  4:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: davidm, Gerrit Huizenga, Hubertus Franke, Martin.Bligh, wli,
	Kernel Mailing List

>>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

  >> I wasn't disagreeing with your case for separate large page
  >> syscalls.  Those syscalls certainly simplify implementation and,
  >> as you point out, it well may be the case that a transparent
  >> superpage scheme never will be able to replace the former.

  Linus> Somebody already had patches for the transparent superpage
  Linus> thing for alpha, which supports it. I remember seeing numbers
  Linus> implying that helped noticeably.

Yes, I saw those.  I still like the Rice work a _lot_ better.  It's
just a thing of beauty, from a design point of view (disclaimer: I
haven't seen the implementation, so there may be ugly things
lurking...).

  Linus> But yes, that definitely doesn't work for humongous pages (or
  Linus> whatever we should call the multi-megabyte-special-case-thing
  Linus> ;).

Yes, you're probably right.  2MB was reported to be fine in the Rice
experiments, but I doubt 256MB (and much less 4GB, as supported by
some CPUs) would fly.

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03  4:39           ` David Mosberger
@ 2002-08-03  5:20             ` David S. Miller
  2002-08-03 17:35               ` Linus Torvalds
  2002-08-03 18:41             ` Hubertus Franke
  1 sibling, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-03  5:20 UTC (permalink / raw)
  To: davidm, davidm; +Cc: torvalds, gh, frankeh, Martin.Bligh, wli, linux-kernel

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Fri, 2 Aug 2002 21:39:36 -0700

   >>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:
   
     >> I wasn't disagreeing with your case for separate large page
     >> syscalls.  Those syscalls certainly simplify implementation and,
     >> as you point out, it well may be the case that a transparent
     >> superpage scheme never will be able to replace the former.
   
     Linus> Somebody already had patches for the transparent superpage
     Linus> thing for alpha, which supports it. I remember seeing numbers
     Linus> implying that helped noticeably.
   
   Yes, I saw those.  I still like the Rice work a _lot_ better.

Now here's the thing.  To me, we should be adding these superpage
syscalls to things like the implementation of malloc() :-) If you
allocate enough anonymous pages together, you should get a superpage
in the TLB if that is easy to do.  Once any hint of memory pressure
occurs, you just break up the large page clusters as you hit such
ptes.  This is what one of the Linux large-page implementations did
and I personally find it the most elegant way to handle the so-called
"paging complexity" of transparent superpages.

At that point it's like "why the system call?".  If it would rather be
more of a large-page reservation system than an "optimization hint"
then these syscalls would sit better with me.  Currently I think they
are superfluous.  To me the hint to use large-pages is a given :-)

Stated another way, if these syscalls said "gimme large pages for this
area and lock them into memory", this would be fine.  If the syscalls
say "use large pages if you can", that's crap.  And in fact we could
use mmap() attribute flags if we really thought that stating this was
necessary.
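
[ Editor's note: the "reserved and locked" flavour Dave asks for is roughly
  what Linux eventually grew as an mmap() flag, MAP_HUGETLB; it did not exist
  when this thread was written, it fails hard rather than falling back, and it
  is shown here only for comparison. ]

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <stdio.h>

	int main(void)
	{
		size_t len = 2 * 1024 * 1024;	/* one 2MB huge page on x86-64 */

		/* "gimme large pages for this area and lock them into memory":
		   huge pages come from a reserved pool and are never swapped */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap(MAP_HUGETLB)");	/* no huge pages reserved, or no permission */
			return 1;
		}
		((char *)p)[0] = 1;	/* touch it; the mapping is huge-page backed */
		return 0;
	}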


* Re: large page patch (fwd) (fwd)
  2002-08-03  5:20             ` David S. Miller
@ 2002-08-03 17:35               ` Linus Torvalds
  2002-08-03 19:30                 ` David Mosberger
  2002-08-04  0:28                 ` David S. Miller
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-03 17:35 UTC (permalink / raw)
  To: David S. Miller
  Cc: davidm, davidm, gh, frankeh, Martin.Bligh, wli, linux-kernel



On Fri, 2 Aug 2002, David S. Miller wrote:
>
> Now here's the thing.  To me, we should be adding these superpage
> syscalls to things like the implementation of malloc() :-) If you
> allocate enough anonymous pages together, you should get a superpage
> in the TLB if that is easy to do.

For architectures that have these "small" superpages, we can just do it
transparently. That's what the alpha patches did.

The problem space is roughly the same as just page coloring.

> At that point it's like "why the system call".  If it would rather be
> more of a large-page reservation system than a "optimization hint"
> then these syscalls would sit better with me.  Currently I think they
> are superfluous.  To me the hint to use large-pages is a given :-)

Yup.

David, you did page coloring once.

I bet your patches worked reasonably well to color into 4 or 8 colors.

How well do you think something like your old patches would work if

 - you _require_ 1024 colors in order to get the TLB speedup on some
   hypothetical machine (the same hypothetical machine that might
   hypothetically run on 95% of all hardware ;)

 - the machine is under heavy load, and heavy load is exactly when you
   want this optimization to trigger.

Can you explain this difficulty to people?

> Stated another way, if these syscalls said "gimme large pages for this
> area and lock them into memory", this would be fine.  If the syscalls
> say "use large pages if you can", that's crap.  And in fact we could
> use mmap() attribute flags if we really thought that stating this was
> necessary.

I agree 100%.

I think we can at some point do the small cases completely transparently,
with no need for a new system call, and not even any new hint flags. We'll
just silently do 4/8-page superpages and be done with it. Programs don't
need to know about it to take advantage of better TLB usage.

			Linus



* Re: large page patch (fwd) (fwd)
  2002-08-03  4:39           ` David Mosberger
  2002-08-03  5:20             ` David S. Miller
@ 2002-08-03 18:41             ` Hubertus Franke
  2002-08-03 19:39               ` Linus Torvalds
  2002-08-03 19:41               ` David Mosberger
  1 sibling, 2 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-03 18:41 UTC (permalink / raw)
  To: davidm, David Mosberger, Linus Torvalds
  Cc: davidm, Gerrit Huizenga, Martin.Bligh, wli, Kernel Mailing List

On Saturday 03 August 2002 12:39 am, David Mosberger wrote:
> >>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds
> >>>>> <torvalds@transmeta.com> said:
>   >>
>   >> I wasn't disagreeing with your case for separate large page
>   >> syscalls.  Those syscalls certainly simplify implementation and,
>   >> as you point out, it well may be the case that a transparent
>   >> superpage scheme never will be able to replace the former.
>
>   Linus> Somebody already had patches for the transparent superpage
>   Linus> thing for alpha, which supports it. I remember seeing numbers
>   Linus> implying that helped noticeably.
>
> Yes, I saw those.  I still like the Rice work a _lot_ better.  It's
> just a thing of beauty, from a design point of view (disclaimer: I
> haven't seen the implementation, so there may be ugly things
> lurking...).
>

I agree, the Rice solution is elegant in its promotion and demotion.

>   Linus> But yes, that definitely doesn't work for humongous pages (or
>   Linus> whatever we should call the multi-megabyte-special-case-thing
>   Linus> ;).
>
> Yes, you're probably right.  2MB was reported to be fine in the Rice
> experiments, but I doubt 256MB (and much less 4GB, as supported by
> some CPUs) would fly.
>
> 	--david

As for the page coloring, it certainly helps.
But I'd like to point out that superpages are there to reduce the number of
TLB misses by providing larger coverage. Simply providing page coloring
will not get you there. 


-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-03 17:35               ` Linus Torvalds
@ 2002-08-03 19:30                 ` David Mosberger
  2002-08-03 19:43                   ` Linus Torvalds
  2002-08-04  0:28                 ` David S. Miller
  1 sibling, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-03 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David S. Miller, davidm, davidm, gh, frankeh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Sat, 3 Aug 2002 10:35:00 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

  Linus> How well do you think something like your old patches would
  Linus> work if

  Linus>  - you _require_ 1024 colors in order to get the TLB speedup
  Linus> on some hypothetical machine (the same hypothetical machine
  Linus> that might hypothetically run on 95% of all hardware ;)

  Linus>  - the machine is under heavy load, and heavy load is exactly
  Linus> when you want this optimization to trigger.

Your point about wanting databases to have access to giant pages even
under memory pressure is a good one.  I had not considered that
before.  However, what we really are talking about then is a security
or resource policy as to who gets to allocate from a reserved and
pinned pool of giant physical pages.  You don't need separate system
calls for that: with a transparent superpage framework and a
privileged & reserved giant-page pool, it's trivial to set up things
such that your favorite database will always be able to get the giant
pages (and hence the giant TLB mappings) it wants.  The only thing you
lose in the transparent case is control over _which_ pages need to use
the pinned giant pages.  I can certainly imagine cases where this
would be an issue, but I kind of doubt it would be an issue for
databases.

As Dave Miller justly pointed out, it's stupid for a task not to ask
for giant pages for anonymous memory.  The only reason this is not a
smart thing overall is that globally it's not optimal (it is optimal
only locally, from the task's point of view).  So if the only barrier
to getting the giant pinned pages is needing to know about the new
system calls, I'll predict that very soon we'll have EVERY task in the
system allocating such pages (and LD_PRELOAD tricks make that pretty
much trivial).  Then we're back to square one, because the favorite
database may not even be able to start up, because all the "reserved"
memory is already used up by the other tasks.

Clearly there needs to be some additional policies in effect, no
matter what the implementation is (the normal VM policies don't work,
because, by definition, the pinned giant pages are not pageable).

In my opinion, the primary benefit of the separate syscalls is still
ease-of-implementation (which isn't unimportant, of course).

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03 18:41             ` Hubertus Franke
@ 2002-08-03 19:39               ` Linus Torvalds
  2002-08-04  0:32                 ` David S. Miller
  2002-08-03 19:41               ` David Mosberger
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-03 19:39 UTC (permalink / raw)
  To: Hubertus Franke
  Cc: davidm, David Mosberger, Gerrit Huizenga, Martin.Bligh, wli,
	Kernel Mailing List



On Sat, 3 Aug 2002, Hubertus Franke wrote:
>
> But I'd like to point out that superpages are there to reduce the number of
> TLB misses by providing larger coverage. Simply providing page coloring
> will not get you there.

Superpages can from a memory allocation angle be seen as a very strict
form of page coloring - the problems are fairly closely related, I think
(superpages are just a lot stricter, in that it's not enough to get "any
page of color X", you have to get just the _right_ page).

Doing superpages will automatically do coloring (while the reverse is
obviously not true). And the way David did coloring a long time ago (if
I remember his implementation correctly) was the same way you'd do
superpages: just do higher order allocations.

			Linus



* Re: large page patch (fwd) (fwd)
  2002-08-03 18:41             ` Hubertus Franke
  2002-08-03 19:39               ` Linus Torvalds
@ 2002-08-03 19:41               ` David Mosberger
  2002-08-03 20:53                 ` Hubertus Franke
  2002-08-04  0:31                 ` David S. Miller
  1 sibling, 2 replies; 110+ messages in thread
From: David Mosberger @ 2002-08-03 19:41 UTC (permalink / raw)
  To: frankeh
  Cc: davidm, David Mosberger, Linus Torvalds, Gerrit Huizenga,
	Martin.Bligh, wli, Kernel Mailing List

>>>>> On Sat, 3 Aug 2002 14:41:29 -0400, Hubertus Franke <frankeh@watson.ibm.com> said:

  Hubertus> But I'd like to point out that superpages are there to
  Hubertus> reduce the number of TLB misses by providing larger
  Hubertus> coverage. Simply providing page coloring will not get you
  Hubertus> there.

Yes, I agree.

It appears that Juan Navarro, the primary author behind the Rice
project, is working on breaking down the superpage benefits they
observed.  That would tell us how much benefit is due to page-coloring
and how much is due to TLB effects.  Here in our lab, we do have some
(weak) empirical evidence that some of the SPECint benchmarks benefit
primarily from page-coloring, but clearly there are others that are
TLB limited.

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03 19:30                 ` David Mosberger
@ 2002-08-03 19:43                   ` Linus Torvalds
  2002-08-03 21:18                     ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-03 19:43 UTC (permalink / raw)
  To: davidm
  Cc: David S. Miller, davidm, gh, frankeh, Martin.Bligh, wli, linux-kernel



On Sat, 3 Aug 2002, David Mosberger wrote:
>
> Your point about wanting databases have access to giant pages even
> under memory pressure is a good one.  I had not considered that
> before.  However, what we really are talking about then is a security
> or resource policy as to who gets to allocate from a reserved and
> pinned pool of giant physical pages.

Absolutely. We can't allow just anybody to allocate giant pages, since
they are a scarce resource (set up at boot time in both Ingo's and Intels
patches - with the potential to move things around later with additional
interfaces).

>			  You don't need separate system
> calls for that: with a transparent superpage framework and a
> privileged & reserved giant-page pool, it's trivial to set up things
> such that your favorite data base will always be able to get the giant
> pages (and hence the giant TLB mappings) it wants.  The only thing you
> lose in the transparent case is control over _which_ pages need to use
> the pinned giant pages.  I can certainly imagine cases where this
> would be an issue, but I kind of doubt it would be an issue for
> databases.

That's _probably_ true. There aren't that many allocations that ask for
megabytes of consecutive memory that wouldn't want to do it. However,
there might certainly be non-critical maintenance programs (with the same
privileges as the database program proper) that _do_ do large allocations,
and that we don't want to give large pages to.

Guessing is always bad, especially since the application certainly does
know what it wants.

		Linus



* Re: large page patch (fwd) (fwd)
  2002-08-03 19:41               ` David Mosberger
@ 2002-08-03 20:53                 ` Hubertus Franke
  2002-08-03 21:26                   ` David Mosberger
  2002-08-04  0:34                   ` David S. Miller
  2002-08-04  0:31                 ` David S. Miller
  1 sibling, 2 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-03 20:53 UTC (permalink / raw)
  To: davidm, David Mosberger
  Cc: davidm, David Mosberger, Linus Torvalds, Gerrit Huizenga,
	Martin.Bligh, wli, Kernel Mailing List

On Saturday 03 August 2002 03:41 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 14:41:29 -0400, Hubertus Franke
> >>>>> <frankeh@watson.ibm.com> said:
>
>   Hubertus> But I'd like to point out that superpages are there to
>   Hubertus> reduce the number of TLB misses by providing larger
>   Hubertus> coverage. Simply providing page coloring will not get you
>   Hubertus> there.
>
> Yes, I agree.
>
> It appears that Juan Navarro, the primary author behind the Rice
> project, is working on breaking down the superpage benefits they
> observed.  That would tell us how much benefit is due to page-coloring
> and how much is due to TLB effects.  Here in our lab, we do have some
> (weak) empirical evidence that some of the SPECint benchmarks benefit
> primarily from page-coloring, but clearly there are others that are
> TLB limited.
>
> 	--david

Cool. 
Does that mean that BSD already has page coloring implemented ?

The agony is: 
Page Coloring helps to reduce cache conflicts in low-associativity caches
while large pages may reduce TLB overhead.

One shouldn't rule out one for the other, there is a place for both.

How did you arrive at the (weak) empirical evidence?
You checked TLB misses and cache misses and turned
page coloring on and off and large pages on and off?

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-03 19:43                   ` Linus Torvalds
@ 2002-08-03 21:18                     ` David Mosberger
  2002-08-03 21:54                       ` Hubertus Franke
  0 siblings, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-03 21:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: davidm, David S. Miller, davidm, gh, frankeh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Sat, 3 Aug 2002 12:43:47 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

  >> You don't need separate system calls for that: with a transparent
  >> superpage framework and a privileged & reserved giant-page pool,
  >> it's trivial to set up things such that your favorite data base
  >> will always be able to get the giant pages (and hence the giant
  >> TLB mappings) it wants.  The only thing you lose in the
  >> transparent case is control over _which_ pages need to use the
  >> pinned giant pages.  I can certainly imagine cases where this
  >> would be an issue, but I kind of doubt it would be an issue for
  >> databases.

  Linus> That's _probably_ true. There aren't that many allocations
  Linus> that ask for megabytes of consecutive memory that wouldn't
  Linus> want to do it. However, there might certainly be non-critical
  Linus> maintenance programs (with the same privileges as the
  Linus> database program proper) that _do_ do large allocations, and
  Linus> that we don't want to give large pages to.

  Linus> Guessing is always bad, especially since the application
  Linus> certainly does know what it wants.

Yes, but that applies even to a transparent superpage scheme: in those
instances where an application knows what page size is optimal, it's
better if the application can express that (saves time
promoting/demoting pages needlessly).  It's not unlike madvise() or
the readahead() syscall: use reasonable policies for the ordinary
apps, and provide the means to let the smart apps tell the kernel
exactly what they need.
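
[ Editor's illustration of the two hinting calls David names, for readers who
  haven't used them; error handling omitted. ]

	#define _GNU_SOURCE		/* for readahead() */
	#include <sys/mman.h>
	#include <fcntl.h>
	#include <unistd.h>

	static void give_hints(void *buf, size_t len, int fd, size_t filesize)
	{
		/* ordinary apps do nothing and get the default policy; a smart
		   app can tell the kernel what it already knows: */
		madvise(buf, len, MADV_SEQUENTIAL);	/* "I'll walk this linearly" */
		readahead(fd, 0, filesize);		/* "pull this file into the page cache now" */
	}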

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03 20:53                 ` Hubertus Franke
@ 2002-08-03 21:26                   ` David Mosberger
  2002-08-03 21:50                     ` Hubertus Franke
  2002-08-04  0:34                   ` David S. Miller
  1 sibling, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-03 21:26 UTC (permalink / raw)
  To: frankeh
  Cc: davidm, David Mosberger, Linus Torvalds, Gerrit Huizenga,
	Martin.Bligh, wli, Kernel Mailing List

>>>>> On Sat, 3 Aug 2002 16:53:39 -0400, Hubertus Franke <frankeh@watson.ibm.com> said:

  Hubertus> Cool.  Does that mean that BSD already has page coloring
  Hubertus> implemented ?

FreeBSD (at least on Alpha) makes some attempts at page-coloring, but
it's said to be far from perfect.

  Hubertus> The agony is: Page Coloring helps to reduce cache
  Hubertus> conflicts in low associative caches while large pages may
  Hubertus> reduce TLB overhead.

Why agony?  The latter helps the TLB _and_ solves the page coloring
problem (assuming the largest page size is bigger than the largest
cache; yeah, I see that could be a problem on some Power 4
machines... ;-)

  Hubertus> One shouldn't rule out one for the other, there is a place
  Hubertus> for both.

  Hubertus> How did you arrive to the (weak) empirical evidence?  You
  Hubertus> checked TLB misses and cache misses and turned page
  Hubertus> coloring on and off and large pages on and off?

Yes, that's basically what we did (there is a patch implementing a
page coloring kernel module floating around).

	--david


* Re: large page patch (fwd) (fwd)
  2002-08-03 21:26                   ` David Mosberger
@ 2002-08-03 21:50                     ` Hubertus Franke
  0 siblings, 0 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-03 21:50 UTC (permalink / raw)
  To: davidm, David Mosberger
  Cc: Linus Torvalds, Gerrit Huizenga, Martin.Bligh, wli, Kernel Mailing List

On Saturday 03 August 2002 05:26 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 16:53:39 -0400, Hubertus Franke
> >>>>> <frankeh@watson.ibm.com> said:
>
>   Hubertus> Cool.  Does that mean that BSD already has page coloring
>   Hubertus> implemented ?
>
> FreeBSD (at least on Alpha) makes some attempts at page-coloring, but
> it's said to be far from perfect.
>
>   Hubertus> The agony is: Page Coloring helps to reduce cache
>   Hubertus> conflicts in low associative caches while large pages may
>   Hubertus> reduce TLB overhead.
>
> Why agony?  The latter helps the TLB _and_ solves the page coloring
> problem (assuming the largest page size is bigger than the largest
> cache; yeah, I see that could be a problem on some Power 4
> machines... ;-)
>

In essence, remember page coloring preserves the same bits used
for cache indexing from the virtual to the physical address. If these bits are
covered by the large page, then of course you will get page coloring for free;
otherwise you won't.
Also, page coloring is mainly helpful in low-associativity caches.
From my recollection of the literature, for 4-way or higher it's not
worth the trouble.

Just to rephrase:  
- Large pages almost always solve your page coloring problem.
- Page coloring never solves your TLB coverage problem.
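
[ Editor's sketch of the arithmetic behind the two statements above; the cache
  parameters are examples, not any particular machine. ]

	/* example: 1MB 2-way set-associative cache, 4kB base pages */
	#define CACHE_SIZE	(1UL << 20)
	#define CACHE_WAYS	2
	#define PAGE_SHIFT	12

	#define WAY_SIZE	(CACHE_SIZE / CACHE_WAYS)	/* 512kB indexed per way */
	#define NR_COLORS	(WAY_SIZE >> PAGE_SHIFT)	/* 128 page colors */

	static inline unsigned long page_color(unsigned long addr)
	{
		/* the color is the low bits of the page number, bits 12..18 here */
		return (addr >> PAGE_SHIFT) & (NR_COLORS - 1);
	}

	/*
	 * A 512kB (or larger) page keeps bits 0..18 identical in the virtual and
	 * physical address, so page_color(vaddr) == page_color(paddr): the large
	 * page gives you the coloring for free.  A 64kB page only pins 4 of the
	 * 7 color bits.  And no amount of coloring changes how many TLB entries
	 * are needed to cover the working set -- that is the TLB coverage point.
	 */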

>   Hubertus> One shouldn't rule out one for the other, there is a place
>   Hubertus> for both.
>
>   Hubertus> How did you arrive to the (weak) empirical evidence?  You
>   Hubertus> checked TLB misses and cache misses and turned page
>   Hubertus> coloring on and off and large pages on and off?
>
> Yes, that's basically what we did (there is a patch implementing a
> page coloring kernel module floating around).
>
> 	--david

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-03 21:18                     ` David Mosberger
@ 2002-08-03 21:54                       ` Hubertus Franke
  2002-08-04  0:35                         ` David S. Miller
  0 siblings, 1 reply; 110+ messages in thread
From: Hubertus Franke @ 2002-08-03 21:54 UTC (permalink / raw)
  To: davidm, David Mosberger, Linus Torvalds
  Cc: davidm, David S. Miller, davidm, gh, Martin.Bligh, wli, linux-kernel

On Saturday 03 August 2002 05:18 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 12:43:47 -0700 (PDT), Linus Torvalds
> >>>>> <torvalds@transmeta.com> said:
>   >>
>   >> You don't need separate system calls for that: with a transparent
>   >> superpage framework and a privileged & reserved giant-page pool,
>   >> it's trivial to set up things such that your favorite data base
>   >> will always be able to get the giant pages (and hence the giant
>   >> TLB mappings) it wants.  The only thing you lose in the
>   >> transparent case is control over _which_ pages need to use the
>   >> pinned giant pages.  I can certainly imagine cases where this
>   >> would be an issue, but I kind of doubt it would be an issue for
>   >> databases.
>
>   Linus> That's _probably_ true. There aren't that many allocations
>   Linus> that ask for megabytes of consecutive memory that wouldn't
>   Linus> want to do it. However, there might certainly be non-critical
>   Linus> maintenance programs (with the same privileges as the
>   Linus> database program proper) that _do_ do large allocations, and
>   Linus> that we don't want to give large pages to.
>
>   Linus> Guessing is always bad, especially since the application
>   Linus> certainly does know what it wants.
>
> Yes, but that applies even to a transparent superpage scheme: in those
> instances where an application knows what page size is optimal, it's
> better if the application can express that (saves time
> promoting/demoting pages needlessly).  It's not unlike madvise() or
> the readahead() syscall: use reasonable policies for the ordinary
> apps, and provide the means to let the smart apps tell the kernel
> exactly what they need.
>
> 	--david

So that's what is/can be done through the madvise() call or a flag on mmap().
Force a specific size and policy. Why do you need a new system call?

The Rice paper solved this reasonably elegantly: reservation, then a check
after a while. If you didn't use the reserved memory, you lose it; this is the
automatic promotion/demotion.

For special apps one provides the interface using madvise().
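
[ Editor's note: for comparison, the madvise()-based hint Hubertus mentions is
  roughly what later kernels provide as MADV_HUGEPAGE for transparent huge
  pages; it is a hint, not a reservation, and it postdates this thread. ]

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <stdlib.h>

	/* ask for huge pages on an anonymous region, but let the kernel decide */
	static void *alloc_hinted(size_t len)
	{
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return NULL;
		/* promotion/demotion stays the kernel's business; this only says
		   "this region is a good candidate for large TLB entries" */
		madvise(p, len, MADV_HUGEPAGE);
		return p;
	}
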
-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-03 17:35               ` Linus Torvalds
  2002-08-03 19:30                 ` David Mosberger
@ 2002-08-04  0:28                 ` David S. Miller
  2002-08-04 17:31                   ` Hubertus Franke
  1 sibling, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-04  0:28 UTC (permalink / raw)
  To: torvalds; +Cc: davidm, davidm, gh, frankeh, Martin.Bligh, wli, linux-kernel

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Sat, 3 Aug 2002 10:35:00 -0700 (PDT)

   David, you did page coloring once.
   
   I bet your patches worked reasonably well to color into 4 or 8 colors.
   
   How well do you think something like your old patches would work if
   
    - you _require_ 1024 colors in order to get the TLB speedup on some
      hypothetical machine (the same hypothetical machine that might
      hypothetically run on 95% of all hardware ;)
   
    - the machine is under heavy load, and heavy load is exactly when you
      want this optimization to trigger.
   
   Can you explain this difficulty to people?
   
Actually, we need some clarification here.  I tried coloring several
times; the problem with my diffs is that I tried to do the coloring
all the time no matter what.

I wanted strict coloring on the 2-color level for broken L1 caches
that have aliasing problems.  If I could make this work, all of the
dumb cache flushing I have to do on Sparcs could be deleted.  Because
of this, I couldn't legitimately change the cache flushing rules
unless I had absolutely strict coloring done on all pages where it
mattered (basically anything that could end up in the user's address
space).

So I kept track of color existence precisely in the page lists.  The
implementation was fast, but things got really bad fragmentation-wise.

No matter how I tweaked things, just running a kernel build 40 or 50
times would fragment the free page lists to shreds such that 2-order
and up pages simply did not exist.

Another person did an implementation of coloring which basically
worked by allocating a big-order chunk and slicing that up.  It's not
strictly done and that is why his version works better.  In fact I
like that patch a lot and it worked quite well for L2 coloring on
sparc64.  Any time there is page pressure, he tosses away all of the
color carving big-order pages.
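
[ Editor's sketch of the "carve a big-order chunk" scheme described above;
  the names and the fixed color count are invented for illustration. ]

	#define CARVE_ORDER	4				/* carve 16-page chunks */
	#define NR_COLORS	(1 << CARVE_ORDER)

	static struct list_head color_list[NR_COLORS];		/* one free list per color */

	static int carve_chunk(void)
	{
		struct page *chunk = alloc_pages(GFP_KERNEL, CARVE_ORDER);
		int i;

		if (!chunk)
			return 0;	/* page pressure: stop carving, coloring is best-effort */

		/* a naturally aligned 2^CARVE_ORDER block contains exactly one page
		   of each of the 2^CARVE_ORDER colors, so slicing it refills every list */
		for (i = 0; i < NR_COLORS; i++)
			list_add(&(chunk + i)->lru, &color_list[i]);
		return 1;
	}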

   I think we can at some point do the small cases completely transparently,
   with no need for a new system call, and not even any new hint flags. We'll
   just silently do 4/8-page superpages and be done with it. Programs don't
   need to know about it to take advantage of better TLB usage.
   
Ok.  I think even 64-page ones are viable to attempt but we'll see.
Most TLB's that do superpages seem to have a range from the base
page size to the largest supported superpage with 2-powers of two
being incrememnted between each supported size.

For example on Sparc64 this is:

8K	PAGE_SIZE
64K	PAGE_SIZE * 8
512K	PAGE_SIZE * 64
4M	PAGE_SIZE * 512

One of the transparent large page implementations just defined a
small array that the core code used to try and see "hey how big
a superpage can we try" and if the largest for the area failed
(because page orders that large weren't available) it would simply
fall back to the next smallest superpage size.
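
[ Editor's sketch of that "small array" fallback loop, using the sparc64 sizes
  listed above; the function itself is invented for illustration. ]

	/* supported superpage orders on sparc64, largest first (8K base pages) */
	static const unsigned int sp_order[] = { 9, 6, 3, 0 };	/* 4M, 512K, 64K, 8K */

	static struct page *alloc_superpage(unsigned int max_order, unsigned int *got)
	{
		struct page *page;
		int i;

		for (i = 0; i < 4; i++) {
			if (sp_order[i] > max_order)
				continue;	/* bigger than this mapping can use */
			page = alloc_pages(GFP_HIGHUSER, sp_order[i]);
			if (page) {
				*got = sp_order[i];	/* best size currently available */
				return page;
			}
			/* that order wasn't free: fall back to the next smaller one */
		}
		return NULL;		/* not even a base page -- the caller handles OOM */
	}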


* Re: large page patch (fwd) (fwd)
  2002-08-03 19:41               ` David Mosberger
  2002-08-03 20:53                 ` Hubertus Franke
@ 2002-08-04  0:31                 ` David S. Miller
  2002-08-04 17:25                   ` Hubertus Franke
  1 sibling, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-04  0:31 UTC (permalink / raw)
  To: davidm, davidm; +Cc: frankeh, torvalds, gh, Martin.Bligh, wli, linux-kernel

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Sat, 3 Aug 2002 12:41:33 -0700

   It appears that Juan Navarro, the primary author behind the Rice
   project, is working on breaking down the superpage benefits they
   observed.  That would tell us how much benefit is due to page-coloring
   and how much is due to TLB effects.  Here in our lab, we do have some
   (weak) empirical evidence that some of the SPECint benchmarks benefit
   primarily from page-coloring, but clearly there are others that are
   TLB limited.

There was some comparison done between large-page vs. plain
page coloring for a bunch of scientific number crunchers.

Only one benefitted from page coloring and not from TLB
superpage use.

The ones that benefitted from both coloring and superpages, the
superpage gain was about equal to the coloring gain.  Basically,
superpages ended up giving the necessary coloring :-)

Search for the topic "Areas for superpage discussion" in the
sparclinux@vger.kernel.org list archives, it has pointers to
all the patches and test programs involved.


* Re: large page patch (fwd) (fwd)
  2002-08-03 19:39               ` Linus Torvalds
@ 2002-08-04  0:32                 ` David S. Miller
  0 siblings, 0 replies; 110+ messages in thread
From: David S. Miller @ 2002-08-04  0:32 UTC (permalink / raw)
  To: torvalds; +Cc: frankeh, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Sat, 3 Aug 2002 12:39:40 -0700 (PDT)
   
   And the way David did coloring a long time ago (if
   I remember his implementation correctly) was the same way you'd do
   superpages: just do higher order allocations.
   
Although it wasn't my implementation which did this,
one of them did do it this way.  I agree that it is
the nicest way to do coloring.


* Re: large page patch (fwd) (fwd)
  2002-08-03 20:53                 ` Hubertus Franke
  2002-08-03 21:26                   ` David Mosberger
@ 2002-08-04  0:34                   ` David S. Miller
  1 sibling, 0 replies; 110+ messages in thread
From: David S. Miller @ 2002-08-04  0:34 UTC (permalink / raw)
  To: frankeh; +Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

   From: Hubertus Franke <frankeh@watson.ibm.com>
   Date: Sat, 3 Aug 2002 16:53:39 -0400

   Does that mean that BSD already has page coloring implemented ?
   
FreeBSD has had page coloring for quite some time.

Because they don't use buddy lists and don't allow higher-order
allocations fundamentally in the page allocator, they don't have
to deal with all the buddy fragmentation issues we do.

On the other hand, since higher-order page allocations are not
a fundamental operation it might be more difficult for FreeBSD
to implement superpage support efficiently like we can with
the buddy lists.


* Re: large page patch (fwd) (fwd)
  2002-08-03 21:54                       ` Hubertus Franke
@ 2002-08-04  0:35                         ` David S. Miller
  2002-08-04  2:25                           ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-04  0:35 UTC (permalink / raw)
  To: frankeh; +Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

   From: Hubertus Franke <frankeh@watson.ibm.com>
   Date: Sat, 3 Aug 2002 17:54:30 -0400
   
   The Rice paper solved this reasonably elegant. Reservation and check 
   after a while. If you didn't use reserved memory, you loose it, this is the 
   auto promotion/demotion.

I keep seeing this Rice stuff being mentioned over and over,
can someone post a URL pointer to this work?


* Re: large page patch (fwd) (fwd)
  2002-08-04  0:35                         ` David S. Miller
@ 2002-08-04  2:25                           ` David Mosberger
  2002-08-04 17:19                             ` Hubertus Franke
  0 siblings, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-04  2:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: frankeh, davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

>>>>> On Sat, 03 Aug 2002 17:35:30 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  DaveM>    From: Hubertus Franke <frankeh@watson.ibm.com> Date: Sat,
  DaveM> 3 Aug 2002 17:54:30 -0400

  DaveM>    The Rice paper solved this reasonably elegant. Reservation
  DaveM> and check after a while. If you didn't use reserved memory,
  DaveM> you loose it, this is the auto promotion/demotion.

  DaveM> I keep seeing this Rice stuff being mentioned over and over,
  DaveM> can someone post a URL pointer to this work?

Sure thing.  It's the first link under "Publications" at this URL:

	http://www.cs.rice.edu/~jnavarro/

  --david


* Re: large page patch (fwd) (fwd)
  2002-08-04  2:25                           ` David Mosberger
@ 2002-08-04 17:19                             ` Hubertus Franke
  2002-08-09 15:20                               ` Daniel Phillips
  0 siblings, 1 reply; 110+ messages in thread
From: Hubertus Franke @ 2002-08-04 17:19 UTC (permalink / raw)
  To: davidm, David Mosberger, David S. Miller
  Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

On Saturday 03 August 2002 10:25 pm, David Mosberger wrote:
> >>>>> On Sat, 03 Aug 2002 17:35:30 -0700 (PDT), "David S. Miller"
> >>>>> <davem@redhat.com> said:
>
>   DaveM>    From: Hubertus Franke <frankeh@watson.ibm.com> Date: Sat,
>   DaveM> 3 Aug 2002 17:54:30 -0400
>
>   DaveM>    The Rice paper solved this reasonably elegant. Reservation
>   DaveM> and check after a while. If you didn't use reserved memory,
>   DaveM> you loose it, this is the auto promotion/demotion.
>
>   DaveM> I keep seeing this Rice stuff being mentioned over and over,
>   DaveM> can someone post a URL pointer to this work?
>
> Sure thing.  It's the first link under "Publications" at this URL:
>
> 	http://www.cs.rice.edu/~jnavarro/
>
>   --david

Also in this context:

"Implemenation of Multiple Pagesize Support in HP-UX"
http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/subramanian/subramanian.pdf

"General Purpose Operating System Support for Multiple Page Sizes"
http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-04  0:31                 ` David S. Miller
@ 2002-08-04 17:25                   ` Hubertus Franke
  0 siblings, 0 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-04 17:25 UTC (permalink / raw)
  To: David S. Miller, davidm, davidm
  Cc: torvalds, gh, Martin.Bligh, wli, linux-kernel

On Saturday 03 August 2002 08:31 pm, David S. Miller wrote:
>    From: David Mosberger <davidm@napali.hpl.hp.com>
>    Date: Sat, 3 Aug 2002 12:41:33 -0700
>
>    It appears that Juan Navarro, the primary author behind the Rice
>    project, is working on breaking down the superpage benefits they
>    observed.  That would tell us how much benefit is due to page-coloring
>    and how much is due to TLB effects.  Here in our lab, we do have some
>    (weak) empirical evidence that some of the SPECint benchmarks benefit
>    primarily from page-coloring, but clearly there are others that are
>    TLB limited.
>
> There was some comparison done between large-page vs. plain
> page coloring for a bunch of scientific number crunchers.
>
> Only one benefitted from page coloring and not from TLB
> superpage use.
>

I would expect that from scientific apps, which often go through their
dataset in a fairly regular pattern. If the access is sequential, page coloring
is at its best, because the cache can become the limiting factor if
you can't squeeze the data into the cache due to conflicts in the same
cache class.

The way I see page coloring is that any hard work done in virtual space
(either by the compiler or by the app writer [the latter holds for numerical apps])
to be cache-friendly is not circumvented by a <stupid> physical page
assignment by the OS that leads to less than complete cache utilization.
That's why the cache index bits of the address are carried over, i.e.
kept the same in the virtual and physical address. That's the purpose of
page coloring.

This regular access pattern does not necessarily hold in apps like a JVM or other
object-oriented code where data accesses can be less predictable. There page
coloring might not help you at all.

> The ones that benefitted from both coloring and superpages, the
> superpage gain was about equal to the coloring gain.  Basically,
> superpages ended up giving the necessary coloring :-)
>
> Search for the topic "Areas for superpage discussion" in the
> sparclinux@vger.kernel.org list archives, it has pointers to
> all the patches and test programs involved.


-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)


* Re: large page patch (fwd) (fwd)
  2002-08-04  0:28                 ` David S. Miller
@ 2002-08-04 17:31                   ` Hubertus Franke
  2002-08-04 18:38                     ` Linus Torvalds
  2002-08-05  5:40                     ` David S. Miller
  0 siblings, 2 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-04 17:31 UTC (permalink / raw)
  To: David S. Miller, torvalds
  Cc: davidm, davidm, gh, Martin.Bligh, wli, linux-kernel

On Saturday 03 August 2002 08:28 pm, David S. Miller wrote:
>    From: Linus Torvalds <torvalds@transmeta.com>
>    Date: Sat, 3 Aug 2002 10:35:00 -0700 (PDT)
>
>    David, you did page coloring once.
>
>    I bet your patches worked reasonably well to color into 4 or 8 colors.
>
>    How well do you think something like your old patches would work if
>
>     - you _require_ 1024 colors in order to get the TLB speedup on some
>       hypothetical machine (the same hypothetical machine that might
>       hypothetically run on 95% of all hardware ;)
>
>     - the machine is under heavy load, and heavy load is exactly when you
>       want this optimization to trigger.
>
>    Can you explain this difficulty to people?
>
> Actually, we need some clarification here.  I tried coloring several
> times, the problem with my diffs is that I tried to do the coloring
> all the time no matter what.
>
> I wanted strict coloring on the 2-color level for broken L1 caches
> that have aliasing problems.  If I could make this work, all of the
> dumb cache flushing I have to do on Sparcs could be deleted.  Because
> of this, I couldn't legitimately change the cache flushing rules
> unless I had absolutely strict coloring done on all pages where it
> mattered (basically anything that could end up in the user's address
> space).
>
> So I kept track of color existence precisely in the page lists.  The
> implementation was fast, but things got really bad fragmentation wise.
>
> No matter how I tweaked things, just running a kernel build 40 or 50
> times would fragment the free page lists to shreds such that 2-order
> and up pages simply did not exist.
>
> Another person did an implementation of coloring which basically
> worked by allocating a big-order chunk and slicing that up.  It's not
> strictly done and that is why his version works better.  In fact I
> like that patch a lot and it worked quite well for L2 coloring on
> sparc64.  Any time there is page pressure, he tosses away all of the
> color carving big-order pages.
>
>    I think we can at some point do the small cases completely
> transparently, with no need for a new system call, and not even any new
> hint flags. We'll just silently do 4/8-page superpages and be done with it.
> Programs don't need to know about it to take advantage of better TLB usage.
>
> Ok.  I think even 64-page ones are viable to attempt but we'll see.
> Most TLB's that do superpages seem to have a range from the base
> page size to the largest supported superpage with 2-powers of two
> being incrememnted between each supported size.
>
> For example on Sparc64 this is:
>
> 8K	PAGE_SIZE
> 64K	PAGE_SIZE * 8
> 512K	PAGE_SIZE * 64
> 4M	PAGE_SIZE * 512
>
> One of the transparent large page implementations just defined a
> small array that the core code used to try and see "hey how big
> a superpage can we try" and if the largest for the area failed
> (because page orders that large weren't available) it would simply
> fall back to the next smallest superpage size.


Well, that's exactly what we do !!!!

We also ensure that if one process maps with the basic page size and
the next one maps with the superpage size, we appropriately map
the second one with smaller pages to avoid conflicts in the case of shared
memory or memory-mapped files.

As for the page coloring!
Can we tweak the buddy allocator to give us this additional functionality?
Seems like we could have a free list per color, and if that's empty we go back to
the buddy system. There we should be able to do some magic based on the bitmaps
to figure out which page to use that fits the right color?

Fragmentation is an issue.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)




* Re: large page patch (fwd) (fwd)
  2002-08-04 17:31                   ` Hubertus Franke
@ 2002-08-04 18:38                     ` Linus Torvalds
  2002-08-04 19:23                       ` Andrew Morton
                                         ` (2 more replies)
  2002-08-05  5:40                     ` David S. Miller
  1 sibling, 3 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-04 18:38 UTC (permalink / raw)
  To: Hubertus Franke
  Cc: David S. Miller, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel


On Sun, 4 Aug 2002, Hubertus Franke wrote:
> 
> As of the page coloring !
> Can we tweak the buddy allocator to give us this additional functionality?

I would really prefer to avoid this, and get "95% coloring" by just doing 
read-ahead with higher-order allocations instead of the current "loop 
allocation of one block".

I bet that you will get _practically_ perfect coloring with just two small 
changes:

 - do_anonymous_page() looks to see if the page tables are empty around 
   the faulting address (and check vma ranges too, of course), and 
   optimistically does a non-blocking order-X allocation.

   If the order-X allocation fails, we're likely low on memory (this is 
   _especially_ true since the very fact that we do lots of order-X
   allocations will probably actually help keep fragmentation down
   normally), and we just allocate one page (with a regular GFP_USER this 
   time).

   Map in all pages.

 - do the same for page_cache_readahead() (this, btw, is where radix trees 
   will kick some serious ass - we'd have had a hard time doing the "is
   this range of order-X pages populated" efficiently with the old hashes).

I bet just those fairly small changes will give you effective coloring, 
_and_ they are also what you want for doing small superpages.
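
[ Editor's sketch of the allocation half of the first change above; the check
  that the surrounding page table entries are empty, and the mapping itself,
  are left out, and the flags are only illustrative. ]

	#define FAULTAHEAD_ORDER	3	/* try 8 pages at a time */

	static struct page *alloc_anon_block(unsigned int *order)
	{
		struct page *page;

		/* optimistic and non-blocking: if 8 contiguous pages aren't
		   sitting in the free lists, we're probably low on memory */
		page = alloc_pages(GFP_ATOMIC, FAULTAHEAD_ORDER);
		if (page) {
			*order = FAULTAHEAD_ORDER;
			return page;	/* caller zeroes and maps in all 8 pages */
		}

		/* fall back to the plain single-page case */
		*order = 0;
		return alloc_page(GFP_USER);
	}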

And no, I do not want separate coloring support in the allocator. I think 
coloring without superpage support is stupid and worthless (and 
complicates the code for no good reason).

		Linus



* Re: large page patch (fwd) (fwd)
  2002-08-04 18:38                     ` Linus Torvalds
@ 2002-08-04 19:23                       ` Andrew Morton
  2002-08-04 19:28                         ` Linus Torvalds
  2002-08-04 19:30                       ` Hubertus Franke
  2002-08-04 19:41                       ` Rik van Riel
  2 siblings, 1 reply; 110+ messages in thread
From: Andrew Morton @ 2002-08-04 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hubertus Franke, David S. Miller, davidm, davidm, gh,
	Martin.Bligh, wli, linux-kernel

Linus Torvalds wrote:
> 
> On Sun, 4 Aug 2002, Hubertus Franke wrote:
> >
> > As of the page coloring !
> > Can we tweak the buddy allocator to give us this additional functionality?
> 
> I would really prefer to avoid this, and get "95% coloring" by just doing
> read-ahead with higher-order allocations instead of the current "loop
> allocation of one block".
> 
> I bet that you will get _practically_ perfect coloring with just two small
> changes:
> 
>  - do_anonymous_page() looks to see if the page tables are empty around
>    the faulting address (and check vma ranges too, of course), and
>    optimistically does a non-blocking order-X allocation.
> 
>    If the order-X allocation fails, we're likely low on memory (this is
>    _especially_ true since the very fact that we do lots of order-X
>    allocations will probably actually help keep fragmentation down
>    normally), and we just allocate one page (with a regular GFP_USER this
>    time).
> 
>    Map in all pages.

This would be a problem for short-lived processes. Because "map in
all pages" also means "zero them out".  And I think that performing
a 4k clear_user_highpage() immediately before returning to userspace
is optimal.  It's essentially a cache preload for userspace.

If we instead clear out 4 or 8 pages, we trash a ton of cache and
the chances of userspace _using_ pages 1-7 in the short-term are
lower.   We could clear the pages with 7,6,5,4,3,2,1,0 ordering,
but the cache implications of faultahead are still there.

Could we establish the eight pte's but still arrange for pages 1-7
to trap, so the kernel can zero them out at the latest possible time?


>  - do the same for page_cache_readahead() (this, btw, is where radix trees
>    will kick some serious ass - we'd have had a hard time doing the "is
>    this range of order-X pages populated" efficiently with the old hashes.
> 

On the nopage path, yes.  That memory is cache-cold anyway.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 19:23                       ` Andrew Morton
@ 2002-08-04 19:28                         ` Linus Torvalds
  2002-08-05  5:42                           ` David S. Miller
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-04 19:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hubertus Franke, David S. Miller, davidm, davidm, gh,
	Martin.Bligh, wli, linux-kernel


On Sun, 4 Aug 2002, Andrew Morton wrote:
> 
> Could we establish the eight pte's but still arrange for pages 1-7
> to trap, so the kernel can zero them out at the latest possible time?

You could do that by marking the pages as being there, but PROT_NONE.

On the other hand, cutting down the number of initial pagefaults (by _not_ 
doing what you suggest) might be a bigger speedup for process startup than
the slowdown from occasionally doing unnecessary work.

I suspect that there is some non-zero order-X (probably 2 or 3), where you 
just win more than you lose. Even for small programs. 
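
A rough sketch of the PROT_NONE trick being weighed here, written with the
stock pte macros; the helper names are hypothetical and locking/TLB flushing
are omitted:

    #include <linux/mm.h>
    #include <linux/highmem.h>

    /* fault-ahead: page is present but inaccessible, so first touch traps */
    static void install_deferred_pte(pte_t *ptep, struct page *page)
    {
        set_pte(ptep, mk_pte(page, PAGE_NONE));
    }

    /* on that trap: zero at the last possible moment, then open it up */
    static void finish_deferred_zero(pte_t *ptep, struct page *page,
                                     unsigned long addr,
                                     struct vm_area_struct *vma)
    {
        clear_user_highpage(page, addr);
        set_pte(ptep, pte_mkwrite(pte_mkdirty(
                      mk_pte(page, vma->vm_page_prot))));
    }

Whether deferring the zeroing ever beats taking fewer faults up front is
exactly the trade-off being weighed here.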

			Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 18:38                     ` Linus Torvalds
  2002-08-04 19:23                       ` Andrew Morton
@ 2002-08-04 19:30                       ` Hubertus Franke
  2002-08-04 20:23                         ` William Lee Irwin III
  2002-08-05 16:59                         ` David Mosberger
  2002-08-04 19:41                       ` Rik van Riel
  2 siblings, 2 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-04 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David S. Miller, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel

On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
> On Sun, 4 Aug 2002, Hubertus Franke wrote:
> > As of the page coloring !
> > Can we tweak the buddy allocator to give us this additional
> > functionality?
>
> I would really prefer to avoid this, and get "95% coloring" by just doing
> read-ahead with higher-order allocations instead of the current "loop
> allocation of one block".
>
Yes, if we (correctly) assume that page coloring only buys you significant 
benefits for caches with low associativity (e.g. <4 or <= 8 ways).

> I bet that you will get _practically_ perfect coloring with just two small
> changes:
>
>  - do_anonymous_page() looks to see if the page tables are empty around
>    the faulting address (and check vma ranges too, of course), and
>    optimistically does a non-blocking order-X allocation.
>
As long as the alignments are observed, which I guess you imply by the range.

>    If the order-X allocation fails, we're likely low on memory (this is
>    _especially_ true since the very fact that we do lots of order-X
>    allocations will probably actually help keep fragmentation down
>    normally), and we just allocate one page (with a regular GFP_USER this
>    time).
>
Correct.

>    Map in all pages.
>
>  - do the same for page_cache_readahead() (this, btw, is where radix trees
>    will kick some serious ass - we'd have had a hard time doing the "is
>    this range of order-X pages populated" efficiently with the old hashes.
>

Hey, we use the radix tree to track page cache mappings for large pages
particularly for this reason...

> I bet just those fairly small changes will give you effective coloring,
> _and_ they are also what you want for doing small superpages.
>

Well, in what you described above there is no concept of superpages
the way it is defined for the purpose of <tracking> and <TLB overhead
reduction>.
If you don't know about superpages at the VM level, then you need to
deal with them at TLB fault level to actually create the <large TLB>
entry. That is what the INTC patch will do, namely throwing all the
complexity over the fence to the page fault handler.
In your case, not keeping track of the superpages in the
VM layer and PT layer requires discovering the large page at soft TLB
time by scanning the PT proximity for contiguous pages, if we are talking
now about the read-ahead ....
In our case, we store the same physical address of the superpage
in the PTEs spanning the superpage, together with the page order.
At software TLB time we simply extract the single PTE from the PT based
on the faulting address and move it into the TLB. This of course works only
for software TLBs (PowerPC, MIPS, IA64). For a HW TLB (x86) the PT structure
by definition overlaps the large page size support.
The HW TLB case can be extended to not store the same PA in all the PTEs,
but conceptually carry the superpage concept for the purpose described above.
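
A sketch of that pte encoding, under the assumption that a few
software-available pte bits can hold the page order; the bit positions and
helper names are illustrative, not the actual patch:

    #include <linux/mm.h>

    #define PTE_ORDER_SHIFT  9                       /* assumed sw bits */
    #define PTE_ORDER_MASK   (0x7UL << PTE_ORDER_SHIFT)

    /* every pte spanning the superpage gets the same PA plus the order */
    static inline pte_t mk_super_pte(struct page *base, pgprot_t prot,
                                     unsigned int order)
    {
        unsigned long val = pte_val(mk_pte(base, prot));

        return __pte(val | ((unsigned long)order << PTE_ORDER_SHIFT));
    }

    /* software TLB refill: one PT lookup yields one wide TLB entry */
    static inline unsigned int pte_order(pte_t pte)
    {
        return (pte_val(pte) & PTE_ORDER_MASK) >> PTE_ORDER_SHIFT;
    }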

We have that concept exactly the way you want it, but the dress code
seems to be wrong. That can be worked on.
Our goal, in the long run (2.7), was to explore the Rice approach to see
whether it yields benefits or whether we go down the road of
fragmentation-reduction overhead that kills all the benefits we get
from reduced TLB overhead. Time will tell.

But to go down this route we need the concept of a superpage in the VM,
not just at TLB time or a hack that throws these things over the fence. 


> And no, I do not want separate coloring support in the allocator. I think
> coloring without superpage support is stupid and worthless (and
> complicates the code for no good reason).
>
> 		Linus

That <stupid> seems premature. You are mixing the concept of
a superpage from a TLB-miss-reduction perspective
with the concept of a superpage for page coloring.

In a cache of low associativity (<=4) you have a large number of colors
(~100s). To be reasonably effective you need to provide this large
number of colors, and that could be quite a waste of memory if you do it
only through superpages.
On the other hand, if you simply try to get a page from a targeted class X,
you can solve this problem one page at a time. This still makes sense.
Lastly, you can merge these two approaches by providing small
conceptual superpages (nothing, or not necessarily anything, to do with your
TLB at this point) and a smaller number of classes from which
superpages will be allocated. I hope you meant the latter one when
referring to <stupid>.
Either way, you need the concept of a superpage IMHO in the VM to
support all this stuff.
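
For illustration only, "get a page from a targeted class X" could look like
the following in front of the buddy allocator; NR_COLORS, page_color() and
the retry loop are hypothetical (no such interface exists, which is exactly
what is being debated):

    #include <linux/mm.h>

    #define NR_COLORS 64    /* cache_size / (assoc * PAGE_SIZE), assumed */

    /* non-NUMA sketch: low pfn bits above the page offset pick the set */
    static inline unsigned int page_color(struct page *page)
    {
        return (page - mem_map) & (NR_COLORS - 1);
    }

    static struct page *alloc_page_color(unsigned int gfp_mask,
                                         unsigned int color)
    {
        struct page *page = NULL, *rejects[NR_COLORS];
        int i, n;

        /* naive: keep allocating until the wanted color shows up */
        for (n = 0; n < NR_COLORS; n++) {
            page = alloc_page(gfp_mask);
            if (!page || page_color(page) == color)
                break;
            rejects[n] = page;            /* wrong color, park it */
            page = NULL;
        }
        for (i = 0; i < n; i++)
            __free_page(rejects[i]);      /* hand the rejects back */
        return page;                      /* may be NULL on failure */
    }

The park-and-return churn is pure overhead, which is one reason the idea
is contentious.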

And we got just the right stuff for you :-).
Again the final dress code and capabilities are still up for discussion.

Bill Irwin and I are working on moving Simon's 2.4.18 patch up to 2.5.30.
Clean up some of the stuff and make sure that the integration with the latest
radix tree and writeback functionality is proper.
There aren't that many major changes. We hope to have something for
review soon.

Cheers. 
-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 18:38                     ` Linus Torvalds
  2002-08-04 19:23                       ` Andrew Morton
  2002-08-04 19:30                       ` Hubertus Franke
@ 2002-08-04 19:41                       ` Rik van Riel
  2 siblings, 0 replies; 110+ messages in thread
From: Rik van Riel @ 2002-08-04 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hubertus Franke, David S. Miller, davidm, davidm, gh,
	Martin.Bligh, William Lee Irwin III, linux-kernel

On Sun, 4 Aug 2002, Linus Torvalds wrote:
> On Sun, 4 Aug 2002, Hubertus Franke wrote:
> >
> > As of the page coloring !
> > Can we tweak the buddy allocator to give us this additional functionality?
>
> I would really prefer to avoid this, and get "95% coloring" by just doing
> read-ahead with higher-order allocations instead of the current "loop
> allocation of one block".

OK, now I'm really going to start on some code to try and free
physically contiguous pages when a higher-order allocation comes
in ;)

(well, after this hamradio rpm I started)

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 19:30                       ` Hubertus Franke
@ 2002-08-04 20:23                         ` William Lee Irwin III
  2002-08-05 16:59                         ` David Mosberger
  1 sibling, 0 replies; 110+ messages in thread
From: William Lee Irwin III @ 2002-08-04 20:23 UTC (permalink / raw)
  To: Hubertus Franke
  Cc: Linus Torvalds, David S. Miller, davidm, davidm, gh,
	Martin.Bligh, linux-kernel

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> As long as the alignments are observed, which you I guess imply by the range.

On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>>    If the order-X allocation fails, we're likely low on memory (this is
>>    _especially_ true since the very fact that we do lots of order-X
>>    allocations will probably actually help keep fragmentation down
>>    normally), and we just allocate one page (with a regular GFP_USER this
>>    time).

Later on I can redo one of the various online defragmentation things
that went around last October or so if it would help with this.


On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>>    Map in all pages.
>>  - do the same for page_cache_readahead() (this, btw, is where radix trees
>>    will kick some serious ass - we'd have had a hard time doing the "is
>>    this range of order-X pages populated" efficiently with the old hashes.

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> Hey, we use the radix tree to track page cache mappings for large pages
> particularly for this reason...

The proportion of the radix tree populated beneath a given node can be
computed by traversals adding up ->count, or by incrementally maintaining
a secondary counter for ancestors within the radix tree node. I can look
into this when I go over the path compression heuristics, which would
help the space consumption for access patterns that fool the current one.
Getting physical contiguity out of that is another matter, but the code
can be used for other things (e.g. exec()-time prefaulting) until that's
worked out, and it's not a focus or requirement of this code anyway.
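
For illustration, the "->count traversal" could look roughly like this,
assuming the 2.5 radix_tree_node layout (a count of populated slots plus a
slots[] array); those definitions are private to lib/radix-tree.c, so treat
it as pseudo-code:

    #define RADIX_TREE_MAP_SIZE 64          /* assumed, 1 << map shift */

    struct radix_tree_node {                /* assumed layout          */
        unsigned int count;                 /* populated slots         */
        void *slots[RADIX_TREE_MAP_SIZE];   /* children, or pages      */
    };

    static unsigned long populated_below(struct radix_tree_node *node,
                                         unsigned int height)
    {
        unsigned long total = 0;
        unsigned int i;

        if (!node)
            return 0;
        if (height == 1)
            return node->count;             /* leaf: slots are pages   */

        for (i = 0; i < RADIX_TREE_MAP_SIZE; i++)
            total += populated_below(node->slots[i], height - 1);
        return total;
    }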


On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>> I bet just those fairly small changes will give you effective coloring,
>> _and_ they are also what you want for doing small superpages.

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> The HW TLB case can be extended to not store the same PA in all the PTEs,
> but conceptually carry the superpage concept for the purpose described above.

Pagetable walking gets a tiny hook; not much interesting goes on there.
A specialized wrapper for extracting physical pfns from the pmds, like
the one for testing whether they're terminal nodes, might look more
polished, but that's mostly cosmetic.

Hmm, from looking at the "small" vs. "large" page bits, I have an
inkling this may be relative to the machine size. 256GB boxen will
probably think of 4MB pages as small.


On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> But to go down this route we need the concept of a superpage in the VM,
> not just at TLB time or a hack that throws these things over the fence. 

The bit throwing it over the fence is probably still useful, as Oracle
knows what it's doing and I suspect it's largely to dodge pagetable
space consumption OOM'ing machines as opposed to optimizing anything.
It pretty much wants the kernel out of the way aside from as a big bag
of device drivers, so I'm not surprised they're more than happy to have
the MMU in their hands too. The more I think about it, the less related
to superpages it seems. The motive for superpages is 100% TLB, not a
workaround for pagetable OOM.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 17:31                   ` Hubertus Franke
  2002-08-04 18:38                     ` Linus Torvalds
@ 2002-08-05  5:40                     ` David S. Miller
  1 sibling, 0 replies; 110+ messages in thread
From: David S. Miller @ 2002-08-05  5:40 UTC (permalink / raw)
  To: frankeh; +Cc: torvalds, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel

   From: Hubertus Franke <frankeh@watson.ibm.com>
   Date: Sun, 4 Aug 2002 13:31:24 -0400
   
   Can we tweak the buddy allocator to give us this additional functionality?

Absolutely not, it's a total lose.

I have tried at least 5 times to make it work without fragmenting the
buddy lists to shit.  I challenge you to code one up that works without
fragmenting things to shreds.  Just run an endless kernel build over
and over in a loop for a few hours to a day.  If the buddy lists are
not fragmented after these runs, then you have succeeded in my
challenge.

Do not even reply to this email without meeting the challenge as it
will fall on deaf ears.  I've been there and I've done that, and at
this point code talks, bullshit walks when it comes to trying to
colorize the buddy allocator in a way that actually works and isn't
disgusting.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 19:28                         ` Linus Torvalds
@ 2002-08-05  5:42                           ` David S. Miller
  0 siblings, 0 replies; 110+ messages in thread
From: David S. Miller @ 2002-08-05  5:42 UTC (permalink / raw)
  To: torvalds
  Cc: akpm, frankeh, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Sun, 4 Aug 2002 12:28:54 -0700 (PDT)
   
   I suspect that there is some non-zero order-X (probably 2 or 3), where you 
   just win more than you lose. Even for small programs. 

Furthermore it would obviously help to enhance the clear_user_page()
interface to handle multiple pages because that would nullify the
startup/finish overhead of the copy loop.  (read as: things like TLB
loads and FPU save/restore on some platforms)
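
A sketch of the batched interface being suggested, with a hypothetical
signature; the "arch hook" comments mark where the one-time setup (FPU
save/restore, TLB loads for the clearing window) would live:

    #include <linux/highmem.h>

    static void clear_user_pages(struct page *page, unsigned long vaddr,
                                 unsigned int order)
    {
        unsigned long i, nr = 1UL << order;

        /* arch hook: save FPU state / set up the clearing window once */
        for (i = 0; i < nr; i++)
            clear_user_highpage(page + i, vaddr + i * PAGE_SIZE);
        /* arch hook: tear it down once, instead of once per page */
    }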

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 19:30                       ` Hubertus Franke
  2002-08-04 20:23                         ` William Lee Irwin III
@ 2002-08-05 16:59                         ` David Mosberger
  2002-08-05 17:21                           ` Hubertus Franke
  1 sibling, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-05 16:59 UTC (permalink / raw)
  To: frankeh
  Cc: Linus Torvalds, David S. Miller, davidm, gh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Sun, 4 Aug 2002 15:30:24 -0400, Hubertus Franke <frankeh@watson.ibm.com> said:

  Hubertus> Yes, if we (correctly) assume that page coloring only buys
  Hubertus> you significant benefits for small associative caches
  Hubertus> (e.g. <4 or <= 8).

This seems to be a popular misconception.  Yes, page-coloring
obviously plays no role as long as your cache is no bigger than
PAGE_SIZE*ASSOCIATIVITY.  IIRC, a Xeon can have up to 1MB of cache and I
bet that it doesn't have a 1MB/4KB=256-way associative cache.  Thus,
I'm quite confident that it's possible to observe significant
page-coloring effects even on a Xeon.

	--david

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-05 16:59                         ` David Mosberger
@ 2002-08-05 17:21                           ` Hubertus Franke
  2002-08-05 21:10                             ` Jamie Lokier
  0 siblings, 1 reply; 110+ messages in thread
From: Hubertus Franke @ 2002-08-05 17:21 UTC (permalink / raw)
  To: davidm, David Mosberger
  Cc: Linus Torvalds, David S. Miller, davidm, gh, Martin.Bligh, wli,
	linux-kernel

On Monday 05 August 2002 12:59 pm, David Mosberger wrote:
> >>>>> On Sun, 4 Aug 2002 15:30:24 -0400, Hubertus Franke
> >>>>> <frankeh@watson.ibm.com> said:
>
>   Hubertus> Yes, if we (correctly) assume that page coloring only buys
>   Hubertus> you significant benefits for small associative caches
>   Hubertus> (e.g. <4 or <= 8).
>
> This seems to be a popular misconception.  Yes, page-coloring
> obviously plays no role as long as your cache is no bigger than
> PAGE_SIZE*ASSOCIATIVITY.  IIRC, Xeon can have up to 1MB of cache and I
> bet that it doesn't have a 1MB/4KB=256-way associative cache.  Thus,
> I'm quite confident that it's possible to observe significant
> page-coloring effects even on a Xeon.
>
> 	--david

The wording was "significant" benefits.
The point is/was that as your associativity goes up, the likelihood of
full cache occupancy increases, with cache thrashing in each class decreasing.
I would have to dig through the literature to figure out at what point
the benefits become insignificant (<1%) wrt page coloring.

I am probably missing something in your argument?
How is the Xeon cache indexed (which bits), and what's the cache line size?
My assumptions are as follows.

Take the bits of an address to be split as follows:

  < PG, PGOFS >  with PG = <V, X>  and PGOFS = <Y, Z>  =>  < <V, X>, Y, Z >

where Z is the offset within the cache line,
<X, Y> is used to index the cache (the index is not strictly required to be
contiguous bits, but apparently many architectures do it that way).
Page coloring should guarantee that X remains the same in the virtual and the
physical address assigned to it.
As your associativity goes up, the number of rows (colors) in your cache comes
down!
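
To put numbers on this exchange: colors = cache_size / (associativity *
page_size).  A small stand-alone program with the 1MB/4KB figures quoted
above (the intermediate associativities are just illustrative values):

    #include <stdio.h>

    int main(void)
    {
        unsigned long cache = 1UL << 20;          /* 1MB L2, as quoted */
        unsigned long page  = 4096;               /* 4KB pages         */
        unsigned int  assoc[] = { 1, 4, 8, 256 };
        unsigned int  i;

        for (i = 0; i < 4; i++)
            printf("%3u-way: %3lu colors\n",
                   assoc[i], cache / (assoc[i] * page));
        return 0;
    }

This prints 256, 64, 32 and 1 colors: only at 256-way associativity does
coloring stop mattering (David's point), and the number of colors does fall
as associativity rises (the point above).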

We can take this offline as to not bother the rest, your call. Just interested 
in flushing out the arguments.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-05 17:21                           ` Hubertus Franke
@ 2002-08-05 21:10                             ` Jamie Lokier
  0 siblings, 0 replies; 110+ messages in thread
From: Jamie Lokier @ 2002-08-05 21:10 UTC (permalink / raw)
  To: Hubertus Franke
  Cc: davidm, David Mosberger, Linus Torvalds, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

Hubertus Franke wrote:
> The wording was "significant" benefits.  The point is/was that as your
> associativity goes up, the likelihood of full cache occupancy
> increases, with cache thrashing in each class decreasing.
> Would have to dig through the literature to figure out at what point 
> the benefits are insignificant (<1 %) wrt page coloring.

One of the benefits of page colouring may be that a program's run time
can be expected to vary less from run to run?

In the old days (6 years ago), I found that a video game I was working
on would vary in its peak frame rate by about 3-5% (I don't recall
exactly).  Once the program was started, it would remain operating at
the peak frame rate it had selected, and killing and restarting the
program didn't often make a difference either.  In DOS, the same program
always ran at a consistent frame rate (higher than Linux as it happens).
The actual number of objects executing in the program, and the amount of
memory allocated, were deterministic in these tests.

This is pointing at a cache colouring issue to me -- although quite
which cache I am not sure.  I suppose it could have been something to do
with Linux' VM page scanner access patterns into the page array instead.

-- Jamie

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 17:19                             ` Hubertus Franke
@ 2002-08-09 15:20                               ` Daniel Phillips
  2002-08-09 15:56                                 ` Linus Torvalds
                                                   ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-09 15:20 UTC (permalink / raw)
  To: frankeh, davidm, David Mosberger, David S. Miller
  Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> "General Purpose Operating System Support for Multiple Page Sizes"
http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf

This reference describes roughly what I had in mind for active 
defragmentation, which depends on reverse mapping.  The main additional
wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
which means the caller promises not to pin the allocation unit for long
periods and does not mind if the underlying physical page changes
spontaneously.  Defragmenting in this zone is straightforward.
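
A sketch of the relocation step such a zone would allow, assuming rmap can
find and rewrite every mapping of a page; ZONE_LARGE, the GFP flag and
rmap_replace_page() are hypothetical names for this proposal, not existing
interfaces:

    #include <linux/mm.h>
    #include <linux/highmem.h>

    /* assumed helper: use the reverse map to repoint every pte at 'to' */
    extern void rmap_replace_page(struct page *from, struct page *to);

    static int defrag_relocate(struct page *from)
    {
        struct page *to = alloc_page(GFP_HIGHUSER);

        if (!to)
            return -ENOMEM;
        /* only legal because callers in this zone promised not to pin */
        copy_highpage(to, from);        /* move the contents            */
        rmap_replace_page(from, to);    /* rewrite all the mappings     */
        __free_page(from);              /* old frame can now merge into
                                           a larger contiguous chunk    */
        return 0;
    }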

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:20                               ` Daniel Phillips
@ 2002-08-09 15:56                                 ` Linus Torvalds
  2002-08-09 16:15                                   ` Daniel Phillips
                                                     ` (2 more replies)
  2002-08-09 18:32                                 ` Hubertus Franke
  2002-08-11 20:30                                 ` Alan Cox
  2 siblings, 3 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-09 15:56 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel


On Fri, 9 Aug 2002, Daniel Phillips wrote:
> 
> This reference describes roughly what I had in mind for active 
> defragmentation, which depends on reverse mapping.

Note that even active defrag won't be able to handle the case where you 
want to have lots of big pages, constituting a large percentage of available
memory.

Not unless you think I am crazy enough to do garbage collection on kernel
data structures (repeat after me: "garbage collection is stupid, slow, bad
for caches, and only for people who cannot count").

Also, I think the jury (ie Andrew) is still out on whether rmap is worth 
it.

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:56                                 ` Linus Torvalds
@ 2002-08-09 16:15                                   ` Daniel Phillips
  2002-08-09 16:31                                     ` Rik van Riel
  2002-08-09 16:51                                     ` Linus Torvalds
  2002-08-09 16:27                                   ` Rik van Riel
  2002-08-09 21:38                                   ` Andrew Morton
  2 siblings, 2 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-09 16:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 17:56, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > This reference describes roughly what I had in mind for active 
> > defragmentation, which depends on reverse mapping.
> 
> Note that even active defrag won't be able to handle the case where you 
> want to have lots of big pages, constituting a large percentage of available
> memory.

Perhaps I'm missing something, but I don't see why.

> Not unless you think I am crazy enough to do garbage collection on kernel
> data structures (repeat after me: "garbage collection is stupid, slow, bad
> for caches, and only for people who cannot count").

Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE 
earlier) and so would be allocated outside ZONE_LARGE.

> Also, I think the jury (ie Andrew) is still out on whether rmap is worth 
> it.

Tell me about it.  Well, I feel strongly enough about it to spend the next
week coding yet another pte chain optimization.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:56                                 ` Linus Torvalds
  2002-08-09 16:15                                   ` Daniel Phillips
@ 2002-08-09 16:27                                   ` Rik van Riel
  2002-08-09 16:52                                     ` Linus Torvalds
  2002-08-12  9:23                                     ` Helge Hafting
  2002-08-09 21:38                                   ` Andrew Morton
  2 siblings, 2 replies; 110+ messages in thread
From: Rik van Riel @ 2002-08-09 16:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel

On Fri, 9 Aug 2002, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> >
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping.
>
> Note that even active defrag won't be able to handle the case where you
> want to have lots of big pages, constituting a large percentage of available
> memory.
>
> Not unless you think I am crazy enough to do garbage collection on kernel
> data structures (repeat after me: "garbage collection is stupid, slow, bad
> for caches, and only for people who cannot count").

It's also necessary if you want to prevent death by physical
memory exhaustion since it's pretty easy to construct workloads
where the page table memory requirement is larger than physical
memory.

OTOH, I also think that it's (probably, almost certainly) not
worth doing active defragmenting for really huge superpages.
This category of garbage collection just gets into the 'ridiculous'
class ;)

> Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> it.

One problem we're running into here is that there are absolutely
no tools to measure some of the things rmap is supposed to fix,
like page replacement.

Sure, Craig Kulesa's tests all went faster on rmap than on the
virtual scanning VM, but that's just one application. There doesn't
seem to exist any kind of tool to quantify things like "quality
of page replacement" or even "efficiency of page replacement" ...

I suspect this is true for many pieces of the kernel, no tools
available to measure the benefits of the code, but only tools
to microbenchmark the _overhead_ of the code...

kind regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:15                                   ` Daniel Phillips
@ 2002-08-09 16:31                                     ` Rik van Riel
  2002-08-09 18:08                                       ` Daniel Phillips
  2002-08-09 16:51                                     ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Rik van Riel @ 2002-08-09 16:31 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Linus Torvalds, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Fri, 9 Aug 2002, Daniel Phillips wrote:
> On Friday 09 August 2002 17:56, Linus Torvalds wrote:

> > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > it.
>
> Tell me about it.  Well, I feel strongly enough about it to spend the
> next week coding yet another pte chain optimization.

Well yes, we've _seen_ that 2.4 -rmap improves system behaviour,
but we don't have any tools to _quantify_ that improvement.

As long as the only measurable thing is the overhead (which may
get close to zero, but will never become zero) the numbers will
continue being against rmap.  Not because of rmap, but just
because the overhead is the only thing being measured ;)

Personally I'll spend some more time just improving the behaviour
of the VM, even if we don't have tools to quantify the improvement.

Somehow there seems to be a lack of meaningful "macrobenchmarks" ;)

(as opposed to microbenchmarks, which don't always have a
relation to how the performance of the system as a whole will
be influenced by some code change)

kind regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:15                                   ` Daniel Phillips
  2002-08-09 16:31                                     ` Rik van Riel
@ 2002-08-09 16:51                                     ` Linus Torvalds
  2002-08-09 17:11                                       ` Daniel Phillips
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-09 16:51 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel


On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > 
> > Note that even active defrag won't be able to handle the case where you 
> > want to have lots of big pages, constituting a large percentage of available
> > memory.
> 
> Perhaps I'm missing something, but I don't see why.

The statistics are against you. rmap won't help at all with all the other 
kernel allocations, and the dcache/icache is often large, and on big 
machines while there may be tens of thousands of idle entries, there will 
also be hundreds of _non_idle entries that you can't just remove.

> Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE 
> earlier) and so would be allocated outside ZONE_LARGE.

.. at which point you then get zone balancing problems.

Or we end up with the same kind of special zone that we have _anyway_ in
the current large-page patch, in which case the point of doing this is
what?

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:27                                   ` Rik van Riel
@ 2002-08-09 16:52                                     ` Linus Torvalds
  2002-08-09 17:40                                       ` yodaiken
  2002-08-09 17:46                                       ` Bill Rugolsky Jr.
  2002-08-12  9:23                                     ` Helge Hafting
  1 sibling, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-09 16:52 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel


On Fri, 9 Aug 2002, Rik van Riel wrote:
> 
> > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > it.
> 
> One problem we're running into here is that there are absolutely
> no tools to measure some of the things rmap is supposed to fix,
> like page replacement.

Read up on positivism.

"If it can't be measured, it doesn't exist".

The point being that there are things we can measure, and until anything 
else comes around, those are the things that will have to guide us.

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:51                                     ` Linus Torvalds
@ 2002-08-09 17:11                                       ` Daniel Phillips
  0 siblings, 0 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-09 17:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 18:51, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE 
> > earlier) and so would be allocated outside ZONE_LARGE.
> 
> .. at which poin tyou then get zone balancing problems.
> 
> Or we end up with the same kind of special zone that we have _anyway_ in
> the current large-page patch, in which case the point of doing this is
> what?

The current large-page patch doesn't have any kind of defragmentation in the 
special zone and that memory is just not available for other uses.  The thing 
is, when demand for large pages is low the zone should be allowed to fragment.

All of highmem also qualifies as defraggable memory, so certainly on these 
big memory machines we can easily get a majority of memory in large pages.

I don't see a fundamental reason for new zone balancing problems.  The fact 
that balancing has sucked by tradition is not a fundamental reason ;-)

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:52                                     ` Linus Torvalds
@ 2002-08-09 17:40                                       ` yodaiken
  2002-08-09 19:15                                         ` Rik van Riel
  2002-08-09 21:19                                         ` Marcin Dalecki
  2002-08-09 17:46                                       ` Bill Rugolsky Jr.
  1 sibling, 2 replies; 110+ messages in thread
From: yodaiken @ 2002-08-09 17:40 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel, Linus Torvalds

> On Fri, 9 Aug 2002, Rik van Riel wrote:
> One problem we're running into here is that there are absolutely
> no tools to measure some of the things rmap is supposed to fix,
> like page replacement.

But page replacement is a means to an end. One thing that would be
very interesting to know is how well the basic VM assumptions about
locality work in a Linux server, desktop, and embedded environment.

You have a LRU approximation that is supposed to approximate working
sets that were originally understood and measured on < 1Meg machines
with static libraries, tiny cache,  no GUI and no mmap.

L.T. writes:

> Read up on positivism.

It's been discredited as recursively unsound reasoning.

---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:52                                     ` Linus Torvalds
  2002-08-09 17:40                                       ` yodaiken
@ 2002-08-09 17:46                                       ` Bill Rugolsky Jr.
  1 sibling, 0 replies; 110+ messages in thread
From: Bill Rugolsky Jr. @ 2002-08-09 17:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel

On Fri, Aug 09, 2002 at 09:52:53AM -0700, Linus Torvalds wrote:
> Read up on positivism.

Please don't.  Read Karl Popper instead.
 
> "If it can't be measured, it doesn't exist".
 
The positivist Copenhagen interpretation stifled important areas of
physics for half a century.  There is a distinction to be made between
an explanatory construct (whereby I mean to imply nothing fancy, no
quarks, just a brick), and the evidence that supports that construct
in the form of observable quantities.  It's all there in Popper's work.

> The point being that there are things we can measure, and until anything 
> else comes around, those are the things that will have to guide us.

True, as far as it goes.  Measurement=good, idle-speculation=bad.
 
But it pays to keep in mind that progress is nonlinear.  In 1988, Van
Jacobson noted (http://www.kohala.com/start/vanj.88jul20.txt):

   (I had one test case that went like
 
       Basic system:    600 KB/s
       add feature A:    520 KB/s
       drop A, add B:    530 KB/s
       add both A & B:    700 KB/s
 
   Obviously, any statement of the form "feature A/B is good/bad"
   is bogus.)  But, in spite of the ambiguity, some of the network
   design folklore I've heard seems to be clearly wrong.
 
Such anomalies abound.

Regards,

   Bill Rugolsky

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:31                                     ` Rik van Riel
@ 2002-08-09 18:08                                       ` Daniel Phillips
  0 siblings, 0 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-09 18:08 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 18:31, Rik van Riel wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > On Friday 09 August 2002 17:56, Linus Torvalds wrote:
> 
> > > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > > it.
> >
> > Tell me about it.  Well, I feel strongly enough about it to spend the
> > next week coding yet another pte chain optimization.
> 
> Well yes, we've _seen_ that 2.4 -rmap improves system behaviour,
> but we don't have any tools to _quantify_ that improvement.
> 
> As long as the only measurable thing is the overhead (which may
> get close to zero, but will never become zero) the numbers will
> continue being against rmap.  Not because of rmap, but just
> because the overhead is the only thing being measured ;)

You know what to do, instead of moaning about it.  Just code up a test load 
that blatantly favors rmap and post the results.  In effect, that's what 
Andrew's 'doitlots' benchmark does, in the other direction.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:20                               ` Daniel Phillips
  2002-08-09 15:56                                 ` Linus Torvalds
@ 2002-08-09 18:32                                 ` Hubertus Franke
  2002-08-09 18:43                                   ` Daniel Phillips
  2002-08-11 20:30                                 ` Alan Cox
  2 siblings, 1 reply; 110+ messages in thread
From: Hubertus Franke @ 2002-08-09 18:32 UTC (permalink / raw)
  To: Daniel Phillips, davidm, David Mosberger, David S. Miller
  Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > "General Purpose Operating System Support for Multiple Page Sizes"
> > http://www.usenix.org/publications/library/proceedings/usenix98/full_pape
> >rs/ganapathy/ganapathy.pdf
>
> This reference describes roughly what I had in mind for active
> defragmentation, which depends on reverse mapping.  The main additional
> wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> which means the caller promises not to pin the allocation unit for long
> periods and does not mind if the underlying physical page changes
> spontaneously.  Defragmenting in this zone is straightforward.

I think the objection to that is that in many cases the cost of
defragmentation is too heavy to be recouped through TLB miss handling
alone.
What the above paper does is a reservation protocol with timeouts
which decides that either (a) the reserved memory was used in time and hence
the page is upgraded to a large page, OR (b) the reserved memory is not used
and hence the unused parts are released.
It relies on the fact that within the given timeout, most/many pages are
typically referenced.
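
Pseudo-code for that reservation protocol, with every name illustrative;
the paper promotes on a population threshold, the strict "all sub-pages
touched" test here just keeps the sketch short:

    #include <linux/mm.h>

    struct reservation {
        struct page   *base;      /* order-N block set aside at first fault */
        unsigned long  used_mask; /* which sub-pages were actually touched  */
        unsigned long  expires;   /* deadline (jiffies) for the decision    */
    };

    /* assumed helpers */
    extern void promote_to_superpage(struct page *base, unsigned int order);
    extern void release_unused_subpages(struct page *base,
                                        unsigned long used, unsigned int order);

    static void reservation_expired(struct reservation *r, unsigned int order)
    {
        unsigned long all = (1UL << (1 << order)) - 1;  /* small orders only */

        if (r->used_mask == all)
            /* (a) used in time: upgrade to one large page / one TLB entry */
            promote_to_superpage(r->base, order);
        else
            /* (b) not used in time: give the untouched sub-pages back */
            release_unused_subpages(r->base, r->used_mask, order);
    }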

In our patch we have ZONE_LARGE, into which we allocate the
large pages. Currently they are effectively pinned down, but in 2.4.18
we had them backed by the page cache.

My gut feeling right now would be to follow the reservation-based scheme,
but as said it's a gut feeling.
Defragmenting to me seems a matter of last resort; copying pages is expensive.
If you however simply target the superpages for smaller clusters, then it's an
option. But at the same time one might contemplate simply making
the base page 16K or 32K and at page fault time simply map / swap / read /
writeback the whole cluster.
What studies have been done on the benefits of such an approach?
I talked to Ted Ts'o, who would really like small superpages for better I/O
performance...

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 18:32                                 ` Hubertus Franke
@ 2002-08-09 18:43                                   ` Daniel Phillips
  2002-08-09 19:17                                     ` Hubertus Franke
  0 siblings, 1 reply; 110+ messages in thread
From: Daniel Phillips @ 2002-08-09 18:43 UTC (permalink / raw)
  To: frankeh, davidm, David Mosberger, David S. Miller
  Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 20:32, Hubertus Franke wrote:
> On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > http://www.usenix.org/publications/library/proceedings/usenix98/full_pape
> > >rs/ganapathy/ganapathy.pdf
> >
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping.  The main additional
> > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> > which means the caller promises not to pin the allocation unit for long
> > periods and does not mind if the underlying physical page changes
> > spontaneously.  Defragmenting in this zone is straightforward.
> 
> I think the objection to that is that in many cases the cost of 
> defragmentation is too heavy to be recouped through TLB miss handling
> alone.

You pay the cost only on transition from a load that doesn't use many large
pages to one that does, it is not an ongoing cost.

> [...]
>
> Defragmenting to me seems a matter of last resort, Copying pages is expensive.

It is the only way to ever have a seamless implementation.  Really, I don't
understand this fear of active defragmentation.  Oh well, like davem said,
code talks.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 17:40                                       ` yodaiken
@ 2002-08-09 19:15                                         ` Rik van Riel
  2002-08-09 21:20                                           ` Linus Torvalds
  2002-08-09 21:19                                         ` Marcin Dalecki
  1 sibling, 1 reply; 110+ messages in thread
From: Rik van Riel @ 2002-08-09 19:15 UTC (permalink / raw)
  To: yodaiken
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel, Linus Torvalds

On Fri, 9 Aug 2002 yodaiken@fsmlabs.com wrote:
> > On Fri, 9 Aug 2002, Rik van Riel wrote:
> > One problem we're running into here is that there are absolutely
> > no tools to measure some of the things rmap is supposed to fix,
> > like page replacement.
>
> But page replacement is a means to an end. One thing that would be
> very interesting to know is how well the basic VM assumptions about
> locality work in a Linux server, desktop, and embedded environment.
>
> You have a LRU approximation that is supposed to approximate working
> sets that were originally understood and measured on < 1Meg machines
> with static libraries, tiny cache,  no GUI and no mmap.

Absolutely, it would be interesting to know this.
However, up to now I haven't seen any programs that
measure this.

In this case we know what we want to measure, know we
want to measure it for all workloads, but don't know
how to do this in a quantifyable way.

> L.T. writes:
>
> > Read up on positivism.
>
> It's been discredited as recursively unsound reasoning.

To further this point, by how much has the security number
of Linux improved as a result of the inclusion of the Linux
Security Module framework ?  ;)

I'm sure even Linus will agree that the security potential
has increased, even though he can't measure or quantify it.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 18:43                                   ` Daniel Phillips
@ 2002-08-09 19:17                                     ` Hubertus Franke
  0 siblings, 0 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-09 19:17 UTC (permalink / raw)
  To: Daniel Phillips, davidm, David Mosberger, David S. Miller
  Cc: davidm, davidm, torvalds, gh, Martin.Bligh, wli, linux-kernel

On Friday 09 August 2002 02:43 pm, Daniel Phillips wrote:
> On Friday 09 August 2002 20:32, Hubertus Franke wrote:
> > On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> > > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > > http://www.usenix.org/publications/library/proceedings/usenix98/full_
> > > >papers/ganapathy/ganapathy.pdf
> > >
> > > This reference describes roughly what I had in mind for active
> > > defragmentation, which depends on reverse mapping.  The main additional
> > > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and
> > > GFP_LARGE, which means the caller promises not to pin the allocation
> > > unit for long periods and does not mind if the underlying physical page
> > > changes spontaneously.  Defragmenting in this zone is straightforward.
> >
> > I think the objection to that is that in many cases the cost of
> > defragmentation is too heavy to be recouped through TLB miss handling
> > alone.
>
> You pay the cost only on transition from a load that doesn't use many large
> pages to one that does, it is not an ongoing cost.
>

Correct. Maybe I misunderstood: when are you doing the coallocation of
adjacent pages (page clusters, superpages)?
Our intent was to do it at page fault time and break them up only under
memory pressure.

> > [...]
> >
> > Defragmenting to me seems a matter of last resort, Copying pages is
> > expensive.
>
> It is the only way to ever have a seamless implementation.  Really, I don't
> understand this fear of active defragmentation.  Oh well, like davem said,
> code talks.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 17:40                                       ` yodaiken
  2002-08-09 19:15                                         ` Rik van Riel
@ 2002-08-09 21:19                                         ` Marcin Dalecki
  1 sibling, 0 replies; 110+ messages in thread
From: Marcin Dalecki @ 2002-08-09 21:19 UTC (permalink / raw)
  To: yodaiken
  Cc: Rik van Riel, Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel, Linus Torvalds

yodaiken@fsmlabs.com wrote:
>>On Fri, 9 Aug 2002, Rik van Riel wrote:
>>One problem we're running into here is that there are absolutely
>>no tools to measure some of the things rmap is supposed to fix,
>>like page replacement.
> 
> 
> But page replacement is a means to an end. One thing that would be
> very interesting to know is how well the basic VM assumptions about
> locality work in a Linux server, desktop, and embedded environment.
> 
> You have a LRU approximation that is supposed to approximate working
> sets that were originally understood and measured on < 1Meg machines
> with static libraries, tiny cache,  no GUI and no mmap.
> 
> L.T. writes:
> 
> 
>>Read up on positivism.
> 
> 
> It's been discredited as recursively unsound reasoning.

Well, not taking the "axiom of choice" for granted really
narrows what can be reasoned about, in a really not
funny way. It makes it, for example, very "difficult" to construct the real
numbers. Apparently some guy recently published a book which
basically proposes that the world is just a FSA, so we can see again
that this inconvenience still appears to be very compelling to people
who never had to deal with complicated stuff like, for example, fluid
dynamics and the associated differential equations :-).

But when talking about actual computers, which are in particular
finite, it may very well be possible to get along without it. ;-)


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 19:15                                         ` Rik van Riel
@ 2002-08-09 21:20                                           ` Linus Torvalds
  0 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-09 21:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: yodaiken, Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel


On Fri, 9 Aug 2002, Rik van Riel wrote:
> 
> To further this point, by how much has the security number
> of Linux improved as a result of the inclusion of the Linux
> Security Module framework ?  ;)
> 
> I'm sure even Linus will agree that the security potential
> has increased, even though he can't measure or quantify it.

Actually, the security number is irrelevant to me - the "noise index" from
people who think security protocols are interesting is what drove that
patch (and that one is definitely measurable).

This way, the security noise is now in somebody else's court ;)

			Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:56                                 ` Linus Torvalds
  2002-08-09 16:15                                   ` Daniel Phillips
  2002-08-09 16:27                                   ` Rik van Riel
@ 2002-08-09 21:38                                   ` Andrew Morton
  2002-08-10 18:20                                     ` Eric W. Biederman
  2 siblings, 1 reply; 110+ messages in thread
From: Andrew Morton @ 2002-08-09 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

Linus Torvalds wrote:
> 
> ...
> Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> it.

The most glaring problem has been the fork/exec/exit overhead.  

Anton had a program which did 10,000 forks and we were looking at
the time it took for them all to exit.  Initial rmap slowed the exiting
by 400%, and we now have that down to 70%.

I've been treating a gcc configure script as the most forky workload
which we're likely to care about.  rmap slowed configure down by 7%
and the work Daniel and I have done has reduced that to 2.8%.

(Not that rmap is the biggest problem for configure:

c013c07c 176      1.93046     __page_add_rmap         
c013c194 225      2.46792     __page_remove_rmap      
c012a274 236      2.58857     free_one_pgd            
c012a7f8 405      4.44225     __constant_c_and_count_memset 
c01055fc 917      10.0581     poll_idle               
c012a6cc 1253     13.7436     __constant_memcpy       

It's that i387 struct copy.)

There don't seem to be any catastrophic failure modes here, and
I expect tests could be concocted against the virtual scan which
_do_ have gross performance problems.

So.  Not great, but OK if the reverse map gives us something back.
And I don't agree that the quality of page replacement is all too
hard to measure.  It's just that nobody has got off their butt
and tried to measure it.

The other worry is the ZONE_NORMAL space consumption of pte_chains.
We've halved that, but it will still make high sharing levels
unfeasible on the big ia32 machines.  We are dependent upon large
pages to solve that problem.  (Resurrection of pte_highmem is in
progress, but it doesn't work yet).

I don't see a sufficient case for reverting rmap at present, and
it's time to move on with other work.  There is nothing in the
queue at present which _requires_ rmap, so if we do hit a
showstopper then going back to a virtual scan will be feasible
for at least the next month.

Two points:

1) It would be most useful to have *some* damn test on the table
   which works better with 2.4-rmap, along with a believable
   description of why it's better.

2) It would be most irritating to reach 2.6.5 before discovering
   that there is some terrible resource consumption problem
   arising from the reverse map.  Now is a good time for people
   with large machines to be testing 2.5, please.  This is 
   happening, and I expect we'll be in better shape in a month
   or so.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 21:38                                   ` Andrew Morton
@ 2002-08-10 18:20                                     ` Eric W. Biederman
  2002-08-10 18:59                                       ` Daniel Phillips
  2002-08-10 19:55                                       ` Rik van Riel
  0 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2002-08-10 18:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Daniel Phillips, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh, wli,
	linux-kernel

Andrew Morton <akpm@zip.com.au> writes:
> 
> The other worry is the ZONE_NORMAL space consumption of pte_chains.
> We've halved that, but it will still make high sharing levels
> unfeasible on the big ia32 machines.  We are dependant upon large
> pages to solve that problem.  (Resurrection of pte_highmem is in
> progress, but it doesn't work yet).

There is a second method to address this.  Pages can be swapped out
of the page tables and still remain in the page cache; the virtual
scan does this all of the time.  This should allow for arbitrary
amounts of sharing.  There is some overhead in faulting the pages
back in, but it is much better than cases that do not work.  A simple
implementation would have a maximum pte_chain length.
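
A sketch of that cap, assuming an rmap-style per-page chain with a length
field and a helper that can drop one existing mapping while the page stays
in the page cache; MAX_PTE_CHAIN, the length field and unmap_one_mapping()
are illustrative assumptions, only page_add_rmap() resembles the real 2.5
call:

    #include <linux/mm.h>

    #define MAX_PTE_CHAIN 64   /* assumed cap on tracked sharers per page */

    /* assumed helper: unmap one pte of this page; the page stays in the
       page cache, so the evicted mapping just takes a minor fault later */
    extern void unmap_one_mapping(struct page *page);

    static void page_add_rmap_capped(struct page *page, pte_t *ptep)
    {
        if (page->pte_chain_len >= MAX_PTE_CHAIN)   /* assumed field */
            unmap_one_mapping(page);

        page_add_rmap(page, ptep);     /* the underlying rmap call */
    }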

For any page that is not backed by anonymous memory we do not need to
keep the pte entries after the page has been swapped out of the page
table.  Which should show a reduction in page table size.  In a highly
shared setting with anonymous pages it is likely worth it to promote
those pages to being posix shared memory.

All of the above should allow us to keep a limit on the amount of
resources that go towards sharing, reducing the need for something
like pte_highmem, and keeping memory pressure down in general.

For the cases you describe I have trouble seeing pte_highmem as
anything other than a performance optimization.  Only placing shmem
direct and indirect entries in high memory or in swap can I see as
a limit to feasibility.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-10 18:20                                     ` Eric W. Biederman
@ 2002-08-10 18:59                                       ` Daniel Phillips
  2002-08-10 19:55                                       ` Rik van Riel
  1 sibling, 0 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-10 18:59 UTC (permalink / raw)
  To: Eric W. Biederman, Andrew Morton
  Cc: Linus Torvalds, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Saturday 10 August 2002 20:20, Eric W. Biederman wrote:
> Andrew Morton <akpm@zip.com.au> writes:
> > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > We've halved that, but it will still make high sharing levels
> > unfeasible on the big ia32 machines.  We are dependant upon large
> > pages to solve that problem.  (Resurrection of pte_highmem is in
> > progress, but it doesn't work yet).
> 
> There is a second method to address this.  Pages can be swapped out
> of the page tables and still remain in the page cache; the virtual
> scan does this all of the time.  This should allow for arbitrary
> amounts of sharing.  There is some overhead in faulting the pages
> back in, but it is much better than cases that do not work.  A simple
> implementation would have a maximum pte_chain length.

Oh gosh, nice point.  We could put together a lovely cooked benchmark where 
copy_page_range just fails to copy all the mmap pages, which make up most of 
the pages in the bash test.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-10 19:55                                       ` Rik van Riel
@ 2002-08-10 19:54                                         ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2002-08-10 19:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Linus Torvalds, Daniel Phillips, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On 10 Aug 2002, Eric W. Biederman wrote:
> > Andrew Morton <akpm@zip.com.au> writes:
> > >
> > > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > > We've halved that, but it will still make high sharing levels
> > > unfeasible on the big ia32 machines.
> 
> > There is a second method to address this.  Pages can be swapped out
> > of the page tables and still remain in the page cache; the virtual
> > scan does this all of the time.  This should allow for arbitrary
> > amounts of sharing.  There is some overhead in faulting the pages
> > back in, but it is much better than cases that do not work.  A simple
> > implementation would have a maximum pte_chain length.
> 
> Indeed.  We need this same thing for page tables too, otherwise
> a high sharing situation can easily "require" more page table
> memory than the total amount of physical memory in the system ;)

It's exactly the same situation.  To remove a pte from the chain you must
remove it from the page table as well.  Then we just need to free
pages with no interesting pte entries.
Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-10 18:20                                     ` Eric W. Biederman
  2002-08-10 18:59                                       ` Daniel Phillips
@ 2002-08-10 19:55                                       ` Rik van Riel
  2002-08-10 19:54                                         ` Eric W. Biederman
  1 sibling, 1 reply; 110+ messages in thread
From: Rik van Riel @ 2002-08-10 19:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Linus Torvalds, Daniel Phillips, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On 10 Aug 2002, Eric W. Biederman wrote:
> Andrew Morton <akpm@zip.com.au> writes:
> >
> > The other worry is the ZONE_NORMAL space consumption of pte_chains.
> > We've halved that, but it will still make high sharing levels
> > unfeasible on the big ia32 machines.

> There is a second method to address this.  Pages can be swapped out
> of the page tables and still remain in the page cache; the virtual
> scan does this all of the time.  This should allow for arbitrary
> amounts of sharing.  There is some overhead in faulting the pages
> back in, but it is much better than cases that do not work.  A simple
> implementation would have a maximum pte_chain length.

Indeed.  We need this same thing for page tables too, otherwise
a high sharing situation can easily "require" more page table
memory than the total amount of physical memory in the system ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 15:20                               ` Daniel Phillips
  2002-08-09 15:56                                 ` Linus Torvalds
  2002-08-09 18:32                                 ` Hubertus Franke
@ 2002-08-11 20:30                                 ` Alan Cox
  2002-08-11 22:33                                   ` Daniel Phillips
  2 siblings, 1 reply; 110+ messages in thread
From: Alan Cox @ 2002-08-11 20:30 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: frankeh, davidm, David Mosberger, David S. Miller,
	Linus Torvalds, gh, Martin.Bligh, wli, linux-kernel

On Fri, 2002-08-09 at 16:20, Daniel Phillips wrote:
> On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > "General Purpose Operating System Support for Multiple Page Sizes"
> > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
> 
> This reference describes roughly what I had in mind for active 
> defragmentation, which depends on reverse mapping.  The main additional
> wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GPF_LARGE,
> which means the caller promises not to pin the allocation unit for long
> periods and does not mind if the underlying physical page changes
> spontaneously.  Defragmenting in this zone is straightforward.

Slight problem. This paper is about a patented SGI method for handling
defragmentation into large pages (6,182,089). They patented it before
the presentation.

They also hold patents on the other stuff that you've recently been
discussing, about not keeping separate rmap structures until there are
more than some value 'n', when they switch from direct to indirect lists
of reverse mappings (6,112,286).

If you are going to read and propose things you find on Usenix, at least
check what the authors' policies on patents are.

Perhaps someone should first of all ask SGI to give the Linux community
permission to use it in a GPL'd operating system?



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 20:30                                 ` Alan Cox
@ 2002-08-11 22:33                                   ` Daniel Phillips
  2002-08-11 22:55                                     ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Daniel Phillips @ 2002-08-11 22:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: frankeh, davidm, David Mosberger, David S. Miller,
	Linus Torvalds, gh, Martin.Bligh, wli, linux-kernel

On Sunday 11 August 2002 22:30, Alan Cox wrote:
> On Fri, 2002-08-09 at 16:20, Daniel Phillips wrote:
> > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
> > 
> > This reference describes roughly what I had in mind for active 
> > defragmentation, which depends on reverse mapping.  The main additional
> > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GPF_LARGE,
> > which means the caller promises not to pin the allocation unit for long
> > periods and does not mind if the underlying physical page changes
> > spontaneously.  Defragmenting in this zone is straightforward.
> 
> Slight problem. This paper is about a patented SGI method for handling
> defragmentation into large pages (6,182,089). They patented it before
> the presentation.

See 'straightforward' above, i.e., obvious to a practitioner of the art.
This is another one-click patent.

Look at claim 16, it covers our buddy allocator quite nicely:

   http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1='6182089'.WKU.&OS=PN/6182089&RS=PN/6182089
 
Claim 1 covers the idea of per-size freelist thresholds, below which no
coalescing is done.

Claim 13 covers the idea of having a buddy system on each node of a numa
system.  Bill is going to be somewhat disappointed to find out he can't do
that any more.

It goes on in this vein.  I suggest all vm hackers have a close look at
this.  Yes, it's stupid, but we can't just ignore it.

> They also hold patents on the other stuff that you've recently been
> discussing, about not keeping separate rmap structures until there are
> more than some value 'n', when they switch from direct to indirect lists
> of reverse mappings (6,112,286).

This is interesting.  By setting their 'm' to 1, you get essentially the
scheme implemented by Dave a few weeks ago, and by setting 'm' to 0, the
patent covers pretty much every imaginable reverse mapping scheme.  Gee,
so SGI thought of reverse mapping in 1997 or thereabouts, and nobody ever
did before?

   http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1='6112286'.WKU.&OS=PN/6112286&RS=PN/6112286

Claim 2 covers use of their reverse mapping scheme, which as we have seen,
includes all reverse mapping schemes, for migrating the data content of
pages, and updating the page table pointers.

Claim 4 goes on to cover migration of data pages between nodes of a numa
system.  (Got that wli?)

This patent goes on to claim just about everything you can do with a
reverse map.  It's sure lucky for SGI that they were the first to think
of the idea of reverse mapping.

> If you are going read and propose things you find on Usenix at least
> check what the authors policies on patents are.

As always, I developed my ideas from first principles.  I never saw or
heard of the paper until a few days ago.  I don't need their self-serving
paper to figure this stuff out, and if they are going to do blatantly
commercial stuff like that, I'd rather the paper were not published at
all.  Perhaps Usenix needs to establish a policy about that.

> Perhaps someone should first of all ask SGI to give the Linux community
> permission to use it in a GPL'd operating system ?

Yes, we should ask nicely, if we run into something that matters.  Asking
nicely isn't the only option though.

And yes, I'm trying to be polite.  It's just so stupid.

--
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 22:33                                   ` Daniel Phillips
@ 2002-08-11 22:55                                     ` Linus Torvalds
  2002-08-11 22:56                                       ` Linus Torvalds
  2002-08-11 23:15                                       ` Larry McVoy
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-11 22:55 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Alan Cox, frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel


On Mon, 12 Aug 2002, Daniel Phillips wrote:
> 
> It goes on in this vein.  I suggest all vm hackers have a close look at
> this.  Yes, it's stupid, but we can't just ignore it.

Actually, we can, and I will.

I do not look up any patents on _principle_, because (a) it's a horrible 
waste of time and (b) I don't want to know. 

The fact is, technical people are better off not looking at patents. If
you don't know what they cover and where they are, you won't be knowingly
infringing on them. If somebody sues you, you change the algorithm or you 
just hire a hit-man to whack the stupid git.

			Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 22:55                                     ` Linus Torvalds
@ 2002-08-11 22:56                                       ` Linus Torvalds
  2002-08-11 23:36                                         ` William Lee Irwin III
  2002-08-12  0:46                                         ` Alan Cox
  2002-08-11 23:15                                       ` Larry McVoy
  1 sibling, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-11 22:56 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Alan Cox, frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel


On Sun, 11 Aug 2002, Linus Torvalds wrote:
> 
> If somebody sues you, you change the algorithm or you just hire a
> hit-man to whack the stupid git.

Btw, I'm not a lawyer, and I suspect this may not be legally tenable 
advice. Whatever. I refuse to bother with the crap.

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 22:55                                     ` Linus Torvalds
  2002-08-11 22:56                                       ` Linus Torvalds
@ 2002-08-11 23:15                                       ` Larry McVoy
  2002-08-12  1:26                                         ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Larry McVoy @ 2002-08-11 23:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, Alan Cox, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Sun, Aug 11, 2002 at 03:55:08PM -0700, Linus Torvalds wrote:
> 
> On Mon, 12 Aug 2002, Daniel Phillips wrote:
> > 
> > It goes on in this vein.  I suggest all vm hackers have a close look at
> > this.  Yes, it's stupid, but we can't just ignore it.
> 
> Actually, we can, and I will.
> 
> I do not look up any patents on _principle_, because (a) it's a horrible 
> waste of time and (b) I don't want to know. 
> 
> The fact is, technical people are better off not looking at patents. If
> you don't know what they cover and where they are, you won't be knowingly
> infringing on them. If somebody sues you, you change the algorithm or you 
> just hire a hit-man to whack the stupid git.

This issue is more complicated than you might think.  Big companies with 
big pockets are very nervous about being too closely associated with 
Linux because of this problem.  Imagine that IBM, for example, starts
shipping IBM Linux.  Somewhere in the code there is something that 
infringes on a patent.  Given that it is IBM Linux, people can make 
the case that IBM should have known and should have fixed it and 
since they didn't, they get sued.  Notice that IBM doesn't ship 
their own version of Linux, they ship / support Red Hat or Suse
(maybe others, doesn't matter).  So if they ever get hassled, they'll
vector the problem to those little guys and the issue will likely
get dropped because the little guys have no money to speak of.

Maybe this is all good, I dunno, but be aware that the patents 
have long arms and effects.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 22:56                                       ` Linus Torvalds
@ 2002-08-11 23:36                                         ` William Lee Irwin III
  2002-08-12  0:46                                         ` Alan Cox
  1 sibling, 0 replies; 110+ messages in thread
From: William Lee Irwin III @ 2002-08-11 23:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, Alan Cox, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, linux-kernel

On Sun, 11 Aug 2002, Linus Torvalds wrote:
>> If somebody sues you, you change the algorithm or you just hire a
>> hit-man to whack the stupid git.

On Sun, Aug 11, 2002 at 03:56:10PM -0700, Linus Torvalds wrote:
> Btw, I'm not a lawyer, and I suspect this may not be legally tenable 
> advice. Whatever. I refuse to bother with the crap.

I'm not really sure what to think of all this patent stuff myself, but
I may need to get some directions from lawyerish types before moving on
here. OTOH I certainly like the suggested approach more than my
conservative one, even though I'm still too chicken to follow it. =)

On a more practical note, though, someone left out an essential 'h'
from my email address. Please adjust the cc: list. =)


Thanks,
Bill

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  0:46                                         ` Alan Cox
@ 2002-08-11 23:42                                           ` Rik van Riel
  2002-08-11 23:50                                             ` Larry McVoy
  2002-08-11 23:44                                           ` large page patch (fwd) (fwd) Daniel Phillips
  1 sibling, 1 reply; 110+ messages in thread
From: Rik van Riel @ 2002-08-11 23:42 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Daniel Phillips, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On 12 Aug 2002, Alan Cox wrote:

> Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> would be decent enough to explicitly state they will license this stuff
> freely for GPL use

I seem to remember Apple having a clause for this in
their Darwin sources, forbidding people who contribute
code from suing them about patent violations due to
the code they themselves contributed.

kind regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  0:46                                         ` Alan Cox
  2002-08-11 23:42                                           ` Rik van Riel
@ 2002-08-11 23:44                                           ` Daniel Phillips
  2002-08-13  8:51                                             ` Rob Landley
  1 sibling, 1 reply; 110+ messages in thread
From: Daniel Phillips @ 2002-08-11 23:44 UTC (permalink / raw)
  To: Alan Cox, Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

On Monday 12 August 2002 02:46, Alan Cox wrote:
> On Sun, 2002-08-11 at 23:56, Linus Torvalds wrote:
> > 
> > On Sun, 11 Aug 2002, Linus Torvalds wrote:
> > > 
> > > If somebody sues you, you change the algorithm or you just hire a
> > > hit-man to whack the stupid git.
> > 
> > Btw, I'm not a lawyer, and I suspect this may not be legally tenable 
> > advice. Whatever. I refuse to bother with the crap.
> 
> In which case you might as well do the rest of the world a favour and
> restrict US usage of Linux in the license file while you are at it.
> Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> would be decent enough to explicitly state they will license this stuff
> freely for GPL use (although having shipped Linux themselves the
> question is partly moot as the GPL says they can't impose additional
> restrictions)

I do not agree that it is enough to license it for 'GPL' use.  If there is
a license, it should impose no restrictions that the GPL does not.  There
is a big distinction.  Anything else, and the licensor is sending the message 
that they reserve the right to enforce against Linux users.

In other words, a license grant has to cover *all* uses of Linux and not just 
GPL uses.

In my opinion, RedHat has set a bad example by stopping short of promising 
free use of Ingo's patents for all Linux users.  We are entering a difficult 
time, and such a wrong-footed move simply makes it more difficult.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 23:42                                           ` Rik van Riel
@ 2002-08-11 23:50                                             ` Larry McVoy
  2002-08-12  8:22                                               ` Daniel Phillips
  0 siblings, 1 reply; 110+ messages in thread
From: Larry McVoy @ 2002-08-11 23:50 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Linus Torvalds, Daniel Phillips, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Sun, Aug 11, 2002 at 08:42:16PM -0300, Rik van Riel wrote:
> On 12 Aug 2002, Alan Cox wrote:
> 
> > Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> > would be decent enough to explicitly state they will license this stuff
> > freely for GPL use
> 
> I seem to remember Apple having a clause for this in
> their Darwin sources, forbidding people who contribute
> code from suing them about patent violations due to
> the code they themselves contributed.

IBM has a fantastic clause in their open source license.  The license grants
you various rights to use, etc., and then goes on to say something in 
the termination section (I think) along the lines of 

	In the event that You or your affiliates instigate patent, trademark,
	and/or any other intellectual property suits, this license terminates
	as of the filing date of said suit[s].

You get the idea.  It's basically "screw me, OK, then screw you too" language.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 22:56                                       ` Linus Torvalds
  2002-08-11 23:36                                         ` William Lee Irwin III
@ 2002-08-12  0:46                                         ` Alan Cox
  2002-08-11 23:42                                           ` Rik van Riel
  2002-08-11 23:44                                           ` large page patch (fwd) (fwd) Daniel Phillips
  1 sibling, 2 replies; 110+ messages in thread
From: Alan Cox @ 2002-08-12  0:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Sun, 2002-08-11 at 23:56, Linus Torvalds wrote:
> 
> On Sun, 11 Aug 2002, Linus Torvalds wrote:
> > 
> > If somebody sues you, you change the algorithm or you just hire a
> > hit-man to whack the stupid git.
> 
> Btw, I'm not a lawyer, and I suspect this may not be legally tenable 
> advice. Whatever. I refuse to bother with the crap.

In which case you might as well do the rest of the world a favour and
restrict US usage of Linux in the license file while you are at it.
Unfortunately the USA forces people to deal with this crap. I'd hope SGI
would be decent enough to explicitly state they will license this stuff
freely for GPL use (although having shipped Linux themselves the
question is partly moot as the GPL says they can't impose additional
restrictions).

Alan


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 23:15                                       ` Larry McVoy
@ 2002-08-12  1:26                                         ` Linus Torvalds
  2002-08-12  5:05                                           ` Larry McVoy
  2002-08-12 10:31                                           ` Alan Cox
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-12  1:26 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Daniel Phillips, Alan Cox, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel


On Sun, 11 Aug 2002, Larry McVoy wrote:
> 
> This issue is more complicated than you might think.

No, it's not. You miss the point.

>					  Big companies with 
> big pockets are very nervous about being too closely associated with 
> Linux because of this problem. 

The point being that that is _their_ problem, and at a level that has 
nothing to do with technology.

I'm saying that technical people shouldn't care. I certainly don't. The 
people who _should_ care are patent attorneys etc, since they actually 
get paid for it, and can better judge the matter anyway.

Everybody in the whole software industry knows that any non-trivial
program (and probably most trivial programs too, for that matter) will
infringe on _some_ patent. Ask anybody. It's apparently an accepted fact,
or at least a saying that I've heard too many times. 

I just don't care. Clearly, if all significant programs infringe on 
something, the issue is no longer "do we infringe", but "is it an issue"?

And that's _exactly_ why technical people shouldn't care. The "is it an 
issue" is not something a technical guy can answer, since the answer 
depends on totally non-technical things.

Ask your legal counsel, and I strongly suspect that if he is any good, he
will tell you the same thing. Namely that it's _his_ problem, and that
your engineers should not waste their time trying to find existing
patents.

			Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  1:26                                         ` Linus Torvalds
@ 2002-08-12  5:05                                           ` Larry McVoy
  2002-08-12 10:31                                           ` Alan Cox
  1 sibling, 0 replies; 110+ messages in thread
From: Larry McVoy @ 2002-08-12  5:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Larry McVoy, Daniel Phillips, Alan Cox, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh, wli,
	linux-kernel

> Ask your legal counsel, and I strongly suspect that if he is any good, he
> will tell you the same thing. Namely that it's _his_ problem, and that
> your engineers should not waste their time trying to find existing
> patents.

Partially true for us.  We do do patent searches to make sure we aren't
doing anything blatantly stupid.

I do agree with you 100% that it is impossible to ship any software that
does not infringe on some patent.  It's a big point of contention in 
contract negotiations because everyone wants you to warrant that your
software doesn't infringe and indemnify them if it does.  
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 23:50                                             ` Larry McVoy
@ 2002-08-12  8:22                                               ` Daniel Phillips
  2002-08-13  8:40                                                 ` Rob Landley
  0 siblings, 1 reply; 110+ messages in thread
From: Daniel Phillips @ 2002-08-12  8:22 UTC (permalink / raw)
  To: Larry McVoy, Rik van Riel
  Cc: Alan Cox, Linus Torvalds, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel

On Monday 12 August 2002 01:50, Larry McVoy wrote:
> On Sun, Aug 11, 2002 at 08:42:16PM -0300, Rik van Riel wrote:
> > On 12 Aug 2002, Alan Cox wrote:
> > 
> > > Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> > > would be decent enough to explicitly state they will license this stuff
> > > freely for GPL use
> > 
> > I seem to remember Apple having a clause for this in
> > their Darwin sources, forbidding people who contribute
> > code from suing them about patent violations due to
> > the code they themselves contributed.
> 
> IBM has a fantastic clause in their open source license.  The license grants
> you various rights to use, etc., and then goes on to say something in 
> the termination section (I think) along the lines of 
> 
> 	In the event that You or your affiliates instigate patent, trademark,
> 	and/or any other intellectual property suits, this license terminates
> 	as of the filing date of said suit[s].
> 
> You get the idea.  It's basically "screw me, OK, then screw you too" language.

Yes.  I would like to add my current rmap optimization work, if it is worthy
for the usual reasons, to the kernel under a DPL license which is in every
respect the GPL, except that it adds one additional restriction along the
lines:

  "If you enforce a patent against a user of this code, or you have a
   beneficial relationship with someone who does, then your licence to
   use or distribute this code is automatically terminated"

with more language to extend the protection to the aggregate work, and to
specify that we are talking about enforcement of patents concerned with any
part of the aggregate work.  Would something like that fly?

In other words, use copyright law as a lever against patent law.

This would tend to provide protection against 'our friends', who on the one
hand, depend on Linux in their businesses, and on the other hand, do seem to
be holding large portfolios of equivalently stupid patents.

As far as protection against those who would have no intention or need to use
the aggregate work anyway, that's an entirely separate question.  Frankly, I
enjoy the sport of undermining a patent much more when it is held by someone
who is not a friend.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-09 16:27                                   ` Rik van Riel
  2002-08-09 16:52                                     ` Linus Torvalds
@ 2002-08-12  9:23                                     ` Helge Hafting
  2002-08-13  3:15                                       ` Bill Davidsen
  1 sibling, 1 reply; 110+ messages in thread
From: Helge Hafting @ 2002-08-12  9:23 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel

Rik van Riel wrote:

> One problem we're running into here is that there are absolutely
> no tools to measure some of the things rmap is supposed to fix,
> like page replacement.
> 
There are things like running vmstat while running tests or production.

My office desktop machine (256M RAM) rarely swaps more than 10M
during work with 2.5.30.  It used to go some 70M into swap
after a few days of writing, browsing, and those updatedb runs.  

Helge Hafting

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  1:26                                         ` Linus Torvalds
  2002-08-12  5:05                                           ` Larry McVoy
@ 2002-08-12 10:31                                           ` Alan Cox
  1 sibling, 0 replies; 110+ messages in thread
From: Alan Cox @ 2002-08-12 10:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Larry McVoy, Daniel Phillips, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, wli, linux-kernel

On Mon, 2002-08-12 at 02:26, Linus Torvalds wrote:
> Ask your legal counsel, and I strongly suspect that if he is any good, he
> will tell you the same thing. Namely that it's _his_ problem, and that
> your engineers should not waste their time trying to find existing
> patents.

Wasn't a case of wasting time. That one is extremely well known because
there were upset people when SGI patented it and then submitted a usenix
paper on it.



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  9:23                                     ` Helge Hafting
@ 2002-08-13  3:15                                       ` Bill Davidsen
  2002-08-13  3:31                                         ` Rik van Riel
  2002-08-13  7:28                                         ` Helge Hafting
  0 siblings, 2 replies; 110+ messages in thread
From: Bill Davidsen @ 2002-08-13  3:15 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Rik van Riel, linux-kernel

On Mon, 12 Aug 2002, Helge Hafting wrote:

> Rik van Riel wrote:
> 
> > One problem we're running into here is that there are absolutely
> > no tools to measure some of the things rmap is supposed to fix,
> > like page replacement.
> > 
> There are things like running vmstat while running tests or production.
> 
> My office desktop machine (256M RAM) rarely swaps more than 10M
> during work with 2.5.30.  It used to go some 70M into swap
> after a few days of writing, browsing, and those updatedb runs.  

Now tell us how someone who isn't a VM developer can tell if that's bad or
good. Is it good because it didn't swap more than it needed to, or bad
because there were more things it could have swapped to make more buffer
room? 

Serious question: tuning the -aa VM sometimes makes the swap use higher,
even as the response to starting small jobs while doing kernel compiles or
mkisofs gets better. I don't normally tune -ac kernels much, so I can't
comment there.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13  3:15                                       ` Bill Davidsen
@ 2002-08-13  3:31                                         ` Rik van Riel
  2002-08-13  7:28                                         ` Helge Hafting
  1 sibling, 0 replies; 110+ messages in thread
From: Rik van Riel @ 2002-08-13  3:31 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Helge Hafting, linux-kernel

On Mon, 12 Aug 2002, Bill Davidsen wrote:

> Now tell us how someone who isn't a VM developer can tell if that's bad
> or good. Is it good because it didn't swap more than it needed to, or
> bad because there were more things it could have swapped to make more
> buffer room?

Good point; just looking at the swap usage doesn't mean
much because we're interested in the _consequences_ of
that number and not in the number itself.

> Serious question, tuning the -aa VM sometimes makes the swap use higher,
> even as the response to starting small jobs while doing kernel compiles
> or mkisofs gets better. I don't normally tune -ac kernels much, so I
> can't comment there.

The key word here is "response"; benchmarks really need
to be able to measure responsiveness.

Some benchmarks (eg. irman by Bob Matthews) do this
already, but we're still focussing too much on throughput.


In 1990 Margo Seltzer wrote an excellent paper on disk IO
sorting and its effects on throughput and latency.  The
end result was that in order to get decent throughput by
doing just disk IO sorting you would need queues so deep
that IO latency would grow to about 30 seconds. ;)
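
As a rough sanity check of that number (the figures below are assumed
for illustration and are not from the paper): a disk servicing on the
order of 100 sorted requests per second, behind a queue a few thousand
entries deep, gives each request an average wait of roughly
queue depth / service rate, i.e. 3000 / 100 = 30 seconds.

/* Assumed numbers, for illustration only -- not measurements. */
#include <stdio.h>

int main(void)
{
	double ios_per_sec = 100.0;	/* assumed per-disk sorted-IO rate */
	double queue_depth = 3000.0;	/* assumed depth needed for good sorting */

	printf("average wait: %.0f seconds\n", queue_depth / ios_per_sec);
	return 0;
}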

Of course, if databases or online shops increased
their throughput by going to deep queueing and every
request got 30 second latencies ... they would
immediately lose their users (or customers)!!!

I'm pretty convinced that sysadmins aren't interested
in throughput, at least not until throughput is so low
that it starts affecting system response latency.


regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13  3:15                                       ` Bill Davidsen
  2002-08-13  3:31                                         ` Rik van Riel
@ 2002-08-13  7:28                                         ` Helge Hafting
  1 sibling, 0 replies; 110+ messages in thread
From: Helge Hafting @ 2002-08-13  7:28 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Rik van Riel, linux-kernel

Bill Davidsen wrote:
> 
> On Mon, 12 Aug 2002, Helge Hafting wrote:

> > My office desktop machine (256M RAM) rarely swaps more than 10M
> > during work with 2.5.30.  It used to go some 70M into swap
> > after a few days of writing, browsing, and those updatedb runs.
> 
> Now tell us how someone who isn't a VM developer can tell if that's bad or
> good. Is it good because it didn't swap more than it needed to, or bad
> because there were more things it could have swapped to make more buffer
> room?

It feels more responsive too - which is no surprise.  Like most users,
I don't _expect_ to wait for swapin when pressing a key or something.
Waiting for file io seems to be less of a problem; that stuff
_is_ on disk after all.  I guess many people who know a little about
computers feel this way.  People that don't know what a "disk" is
may be different and more interested in total waiting.

On the serious side: vmstat provides more than swap info.  It also
lists block io, where one might see if the block io goes up or down.
I suggest finding some repeatable workload with lots of file & swap
io, and seeing how much we get of each.  My guess is that rmap
results in less io to do the same job.  Not only swap io, but
swap+file io too.  The design is more capable of selecting
the _right_ page to evict.  (Assuming that page usage may
tell us something useful.)  So the only questions left are whether
the current implementation is good, and whether the
improved efficiency makes up for the memory overhead.

> 
> Serious question, tuning the -aa VM sometimes makes the swap use higher,
> even as the response to starting small jobs while doing kernel compiles or
> mkisofs gets better. I don't normally tune -ac kernels much, so I can't
> comment there.

Swap is good if there's lots of file io and
lots of unused apps sitting around.  And bad if there's a large working
set and little _repeating_ file io.  Such as the user switching between
a bunch of big apps working on few files.  And perhaps some
non-repeating io like updatedb or mail processing...

Helge Hafting

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-12  8:22                                               ` Daniel Phillips
@ 2002-08-13  8:40                                                 ` Rob Landley
  2002-08-13 15:06                                                   ` Alan Cox
  0 siblings, 1 reply; 110+ messages in thread
From: Rob Landley @ 2002-08-13  8:40 UTC (permalink / raw)
  To: Daniel Phillips, Larry McVoy, Rik van Riel
  Cc: Alan Cox, Linus Torvalds, frankeh, davidm, David Mosberger,
	David S. Miller, gh, Martin.Bligh, William Lee Irwin III,
	linux-kernel

On Monday 12 August 2002 04:22 am, Daniel Phillips wrote:

> Yes.  I would like to add my current rmap optimization work, if it is
> worthy for the usual reasons, to the kernel under a DPL license which is in
> every respect the GPL, except that it adds one additional restriction along
> the lines:
>
>   "If you enforce a patent against a user of this code, or you have a
>    beneficial relationship with someone who does, then your licence to
>    use or distribute this code is automatically terminated"
>
> with more language to extend the protection to the aggregate work, and to
> specify that we are talking about enforcement of patents concerned with any
> part of the aggregate work.  Would something like that fly?
>
> In other words, use copyright law as a lever against patent law.

More than that, the GPL could easily be used to form a "patent pool".  Just 
say "This patent is licensed for use in GPL code.  If you want to use it 
outside of GPL code, you need a separate license."

The purpose of modern patents is Mutually Assured Destruction: If you sue me, 
I have 800 random patents you're bound to have infringed just by breathing, 
and even though they won't actually hold up to scrutiny I can keep you tied 
up in court for years and force you to spend millions on legal fees.  So why 
don't you just cross-license your entire patent portfolio with us, and that 
way we can make the whole #*%(&#% patent issue just go away.  (Notice: when 
anybody DOES sue, the result is usually a cross-licensing agreement of the 
entire patent portfolio.  Even in those rare cases when the patent 
infringement is LEGITIMATE, the patent system is too screwed up to function 
against large corporations due to the zillions of frivolous patents and the 
tendency for corporations to have lawyers on staff so defending a lawsuit 
doesn't really cost them anything.)

This is how companies like IBM and even Microsoft think.  They get as many 
patents as possible to prevent anybody ELSE from suing them, because the 
patent office is stupid enough to give out a patent on scrollbars a decade 
after the fact and they don't want to be on the receiving end of this 
nonsense.  And then they blanket cross-license with EVERYBODY, so nobody can 
sue them.

People do NOT want to give a blanket license to everybody for any use on 
these patents because it gives up the one thing they're good for: mutually 
assured destruction.  Licensing for "open source licenses" could mean "BSD 
license but we never gave anybody any source code, so ha ha."

But if people with patents were to license all their patents FOR USE IN GPL 
CODE, then any proprietary infringement (or attempt to sue) still gives them 
leverage for a counter-suit.  (IBM retained counter-suit ability in a 
different way: you sue, the license terminates.  That's not bad, but I think 
sucking the patent system into the GPL the same way copyright gets inverted 
would be more useful.)

This is more or less what Red Hat's done with its patents, by the way.  
Blanket license for use under GPL-type licenses, but not BSD because that 
would disarm mutually assured destruction.  Now if we got somebody like IBM 
on board a GPL patent pool (with more patents than anybody else, as far as I 
know), that would realy mean something...

Unfortunately, the maintainer of the GPL is Stallman, so he's the logical guy 
to spearhead a "GPL patent pool" project, but any time anybody mentions the 
phrase "intellectual property" to him he goes off on a tangent about how you 
shouldn't call anything "intellectual property", so how can you have a 
discussion about it, and nothing ever gets done.  It's FRUSTRATING to see 
somebody with such brilliant ideas hamstrung not just by idealism, but 
PEDANTIC idealism.

Sigh...

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-11 23:44                                           ` large page patch (fwd) (fwd) Daniel Phillips
@ 2002-08-13  8:51                                             ` Rob Landley
  2002-08-13 16:47                                               ` Daniel Phillips
  0 siblings, 1 reply; 110+ messages in thread
From: Rob Landley @ 2002-08-13  8:51 UTC (permalink / raw)
  To: Daniel Phillips, Alan Cox, Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:

> In other words, a license grant has to cover *all* uses of Linux and not
> just GPL uses.

Including a BSD license where source code is never released?  Or dot-net 
application servers hosted on a Linux system under lock and key in a vault 
somewhere?  And no termination clause, so this jerk can still sue you over 
other frivolous patents?

So you would object to microsoft granting rights to its patents saying "you 
can use this patent in software that runs on windows, but use it on any other 
platform and we'll sue you", but you don't mind going the other way?

Either way BSD gets the shaft, of course.  But then BSDI was doing that to them 
a decade ago, and Sun hired away Bill Joy and forked off SunOS years before 
that, so they should be used to it by now... :)  (And BSD runs plenty of GPL 
application code...)

> In my opinion, RedHat has set a bad example by stopping short of promising
> free use of Ingo's patents for all Linux users.  We are entering a
> difficult time, and such a wrong-footed move simply makes it more
> difficult.

Imagine a slimeball company that puts out proprietary software, gets a patent 
on turning a computer on, and sues everybody in the northern hemisphere ala 
rambus.  They run a Linux system in the corner in their office, therefore 
they are "a linux user".  How do you stop somebody with that mindset from 
finding a similarly trivial loophole in your language?  (Think Spamford 
Wallace.  Think the CEO of Rambus.  Think Unisys and the gif patent.  Think 
the people who recently got a patent on JPEG.  Think the british telecom 
idiots trying to patent hyperlinking a decade after Tim Berners-Lee's first 
code drop to usenet...)

Today, all these people do NOT sue IBM, unless they're really stupid.  (And 
if they do, they will have cross-licensed their patent portfolio with IBM in 
a year or two.  Pretty much guaranteed.)

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 15:06                                                   ` Alan Cox
@ 2002-08-13 11:36                                                     ` Rob Landley
  2002-08-13 16:51                                                       ` Linus Torvalds
       [not found]                                                       ` <Pine.LNX.4.44.0208130942130.7411-100000@home.transmeta.com >
  0 siblings, 2 replies; 110+ messages in thread
From: Rob Landley @ 2002-08-13 11:36 UTC (permalink / raw)
  To: Alan Cox
  Cc: Daniel Phillips, Larry McVoy, Rik van Riel, Linus Torvalds,
	frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, William Lee Irwin III, linux-kernel

On Tuesday 13 August 2002 11:06 am, Alan Cox wrote:
> On Tue, 2002-08-13 at 09:40, Rob Landley wrote:
> > Unfortunately, the maintainer of the GPL is Stallman, so he's the logical
> > guy to spearhead a "GPL patent pool" project, but any time anybody
> > mentions the phrase "intellectual property" to him he goes off on a
> > tangent about how you shouldn't call anything "intellectual property", so
> > how can you have a discussion about it, and nothing ever gets done.  It's
> > FRUSTRATING to see somebody with such brilliant ideas hamstrung not just
> > by idealism, but PEDANTIC idealism.
>
> Richard isnt daft on this one. The FSF does not have the 30 million
> dollars needed to fight a *single* US patent lawsuit. The problem also
> reflects back on things like Debian, because Debian certainly cannot
> afford to play the patent game either.

Agreed, but they can try to give standing to companies that have either the 
resources or the need to do it themselves, and also to placate people who see 
patent applications by SGI and Red Hat as evil proprietary encroachment 
rather than an attempt to scrape together some kind of defense against the 
insanity of the patent system.

Like politics: it's a game you can't win by ignoring, you can only try to use 
it against itself.  The GPL did a great job of this with copyright law: it 
doesn't abandon stuff into the public domain for other people to copyright 
and claim, but keeps it copyrighted and uses that copyright against the 
copyright system.  But at the time software patents weren't enforceable yet 
and I'm guessing the wording of the license didn't want to lend credibility 
to the concept.  This situation has changed since: now software patents are 
themselves an IP threat to free software that needs a copyleft solution.

Releasing a GPL 2.1 with an extra clause about a patent pool wouldn't cost 
$30 million.  (I.E. patents used in GPL code are copyleft style licensed and 
not BSD style licensed: they can be used in GPL code but use outside it 
requires a seperate license.  Right now it says something like "free for use 
by all" which makes the mutually assured destruction people cringe.)

By the way, the average figure I've heard to defend against a patent suit is 
about $2 1/2 million.  That's defend and not pursue, and admittedly that's 
not near the upper limit, but it CAN be done for less.  And what you're 
looking for in a patent pool is something to countersue with in a defense, 
not something to initiate action with.  (Obviously, I'm not a professional 
intellectual property lawyer.  I know who to ask, but to get more than an off 
the cuff remark I'd have to sponsor some research...)

Last time I really looked into all this, Stallman was trying to do an 
enormous new GPL 3.0, addressing application service providers.  That seems 
to have fallen though (as has the ASP business model), but the patent issue 
remains unresolved.

Red Hat would certainly be willing to play in a GPL patent pool.  The 
statement on their website already gives blanket permission to use patents in 
GPL code (and a couple similar licenses; this would be a subset of the 
permission they've already given).  Red Hat's participation might convince 
other distributors to do a "me too" thing (there's certainly precedent for 
it).  SGI could probably be talked into it as well, since they need the 
goodwill of the Linux community unless they want to try to resurrect Irix.  
IBM would take some convincing, it took them a couple years to get over their 
distaste for the GPL in the first place, and they hate to be first on 
anything, but if they weren't first...  HP I haven't got a CLUE about with 
Fiorina at the helm.  Dell is being weird too...

Dunno.  But ANY patent pool is better than none.  If suing somebody for the 
use of a patent in GPL code terminates your right to participate in a GPL 
patent pool and makes you vulnerable to a suit over violating any patent in 
the pool, then the larger the pool is the more incentive there is NOT to 
sue...

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 16:51                                                       ` Linus Torvalds
@ 2002-08-13 12:53                                                         ` Rob Landley
  2002-08-13 17:14                                                         ` Ruth Ivimey-Cook
  2002-08-13 17:29                                                         ` Rik van Riel
  2 siblings, 0 replies; 110+ messages in thread
From: Rob Landley @ 2002-08-13 12:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Daniel Phillips, Larry McVoy, Rik van Riel, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tuesday 13 August 2002 12:51 pm, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rob Landley wrote:
> > Last time I really looked into all this, Stallman was trying to do an
> > enormous new GPL 3.0, addressing application service providers.  That
> > seems to have fallen though (as has the ASP business model), but the
> > patent issue remains unresolved.
>
> At least one problem is exactly the politics played by the FSF, which
> means that a lot of people (not just me), do not trust such new versions
> of the GPL. Especially since the last time this happened, it all happened
> in dark back-rooms, and I got to hear about it not off any of the lists,
> but because I had an insider snitch on it.
>
> I lost all respect I had for the FSF due to its sneakiness.

Exactly why I was thinking a minimalist version (i.e. one more paragraph) was 
about the biggest change the community would be likely to accept.  I strongly 
suspected GPL 3.0 was going nowhere long before it actually got bogged down...

And the politics being played by the FSF seem to be why they're NOT 
interested in a specific patch to fix a specific problem (lack of addressing 
patents).  If you want bug fixes, they want to log-roll huge new 
infrastructure changes and force you to swallow the whole upgrade.  That's 
been a problem on this list before. :)

> The kernel explicitly states that it is under the _one_ particular version
> of the "GPL v2" that is included with the kernel. Exactly because I do not
> want to have politics dragged into the picture by an external party (and
> I'm anal enough that I made sure that "version 2" cannot be misconstrued
> to include "version 2.1".

Sure.  But it's been re-licensed before.  Version 0.12, if I recall.  (And 
the statement of restriction to 2.0 could also be considered a re-licensing, 
albeit a minor one.)  How much leverage even YOU have to fiddle with the 
license at this point is an open question, but if a version 2.1 WAS 
acceptable (if, if, important word that, and obviously this would be after 
seeing it), and you decided to relax the 2.0 restriction to a 2.1 restriction 
(still operating under the "if" here, I can include semicolons if you 
like...), it probably wouldn't muddy the legal waters too much if the sucker 
later had to be upheld in court (<- nested if).

"Probably" meaning "ask a lawyer", of course...

> Also, a license is a two-way street. I do not think it is morally right to
> change an _existing_ license for any other reason than the fact that it
> has some technical legal problem. I intensely dislike the fact that many
> people seem to want to extend the current GPL as a way to take advantage
> of people who used the old GPL and agreed with _that_ - but not
> necessarily the new one.

The only reason I'd worry about trying to integrate it is to ensure that a 
"patent pool" adendum was compatible with the GPL itself.  It's not an 
additional restricition that would violate the GPL, it's a grant of license 
on an area not explicitly addressed by the GPL, and it's a grant of 
permissions giving you rights you wouldn't otherwise necessarily have.

The problem comes with the "if you sue, your rights terminate" clause.  On 
the one hand, the GPL is generally incompatible with additional termination 
clauses.  On the other hand, it's a termination clause only of the additional 
rights granted by the patent license, not of the rights granted by the GPL 
itself, which is a copyright license...

It's a bit of legal hacking that would definitely require vetting by a 
professional...

On the other hand, cross-licensing ALL your patents with a GPL patent pool 
would probably have to be a separate statement from the license; that's a 
bigger decision than simply releasing GPL code that might use one or two 
patents, and it's best to have that decision explicitly made and explicitly 
stated.  (The GPL only applies to what you specifically release under it...)  
So making an external statement be compatible with the GPL is definitely a 
good thing anyway.

A case could be made that section 7 sort of implies an intent that enforcing 
patent restrictions violates the license and thus terminates your rights to 
distribute under section 4, and could be argued to mean that you can't put 
code under the GPL without at least implying a license to your own patents.  
But that doesn't solve the "third party who never contributed" problem.  
(That's what requires the patent license termination clause, thus making you 
vulnerable to suits for infringing other patents in the pool...)

I THINK it could be made to work as a separate supplementary licensing 
statement, compatible with the GPL.  I know it could be made to work as an 
upgrade to the GPL, but you're right there are huge problems with that 
approach...

Either way, it's vaporware until acceptable language is stitched together and 
run by a competent IP attorney...

> As a result, every time this comes up, I ask for any potential new
> "patent-GPL" to be a _new_ license, and not try to feed off existing
> works. Please dopn't make it "GPL". Make it the GPPL for "General Public
> Patent License" or something. And let people buy into it on its own
> merits, not on some "the FSF decided unilaterally to make this decision
> for us".

GPL+, possibly...

In either case it would be a new license.  The people putting "or later" in 
their copyright notices trust the FSF and thus the FSF's new licenses (if 
any).  The people who specify a specific version don't.  The license seems to 
have been intentionally written to leave the option of making this 
distinction open.

> I don't like patents. But I absolutely _hate_ people who play politics
> with other peoples code. Be up-front, not sneaky after-the-fact.

Well, GPL section 9 did plant this particular land mine in 1991, so this is 
probably a case of being sneaky up front. :)  But it's still being sneaky...

That said, section 9 just states that the FSF will put out new versions and 
that code that says a version number "or later", or that doesn't specify any 
version, can automatically be used under the new version.  The ones that 
specify a specific version don't automatically get re-licensed in future by 
section 9, so the linux case is pretty clear.  (Well, disregarding the binary 
module thing, anyway. 8)

> 		Linus

Rob

P.S.  Yes everybody, RTFL:  http://www.gnu.org/copyleft/gpl.html

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 16:47                                               ` Daniel Phillips
@ 2002-08-13 13:09                                                 ` Rob Landley
  0 siblings, 0 replies; 110+ messages in thread
From: Rob Landley @ 2002-08-13 13:09 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

On Tuesday 13 August 2002 12:47 pm, Daniel Phillips wrote:
> On Tuesday 13 August 2002 10:51, Rob Landley wrote:
> > On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:
> > So you would object to microsoft granting rights to its patents saying
> > "you can use this patent in software that runs on windows, but use it on
> > any other platform and we'll sue you", but you don't mind going the other
> > way?
>
> You missed the point.  I was talking about using copyright against patents,
> and specifically in the case where patents are held by people who also want
> to use the copyrighted code.  The intention is to help keep our friends
> honest.

Does the little company that recently got a patent on JPEG actually use open 
source code in-house?  They might be a windows-only shop.  I don't know.

> Dealing with Microsoft, or anyone else whose only motivation is to
> obstruct, is an entirely separate issue.

Oddly enough, Microsoft isn't a major threat here.  They don't seem to want 
to lob the first nuke any more than anybody else here.  They have too much to 
lose.  If they unleashed their patent portfolio upon the Linux community, 
there are enough big players with their own patent portfolios and a vested 
interest in Linux to respond in kind.  (Microsoft is happy to rattle its 
saber about the unenforceability of the GPL, and to threaten to use patents to 
stop it, but you'll notice they haven't DONE it yet.  Threats are cheap, in 
the legal world.  As far as I can tell, at least 90% of any legal maneuvering 
is posturing and seeing if the other guy blinks.  It's mostly a game of 
chicken, you never know WHAT a judge or jury will actually say, when it comes 
down to it.)

They can't go after the big players with patents anyway, they've already got 
cross-licensing agreements with most of them (which is the point of patent 
portfolios in the first place).  So it's only the small players they could 
really go up against, and they simply don't see those as their real 
competition except as allies to big players like IBM, HPaq, Dell...

And if they're going after the small fry, having already been convicted in 
court of being an abusive monopoly, they open themselves to a class-action 
suit by ambulance chasers working on retainer against the prospect of tapping 
Microsoft's deep pockets in a judgement or settlement.  (Sort of like suing 
the tobacco industry: it's not easy but lawyers still sign on because there's 
so much MONEY to be gained if they win...)  An explicit patent infringement 
suit does NOT give plausible deniability of the "you can't prove we didn't 
win simply because we were better in the marketplace" kind.  (You can't prove 
the tooth fairy doesn't exist, either.  Hard to prove a negative.)

Other than FUD, the more likely pragmatic problem is some small fry with no 
stake in anything who thinks he can get rich quick by being really slimy.  
Anybody remember the origin of the Linux trademark?  THAT is the most 
annoying problem patents pose, being nibbled to death by ants...

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:29                                                         ` Rik van Riel
@ 2002-08-13 13:18                                                           ` Rob Landley
  2002-08-13 18:32                                                             ` Linus Torvalds
  2002-08-13 17:45                                                           ` Alexander Viro
                                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 110+ messages in thread
From: Rob Landley @ 2002-08-13 13:18 UTC (permalink / raw)
  To: Rik van Riel, Linus Torvalds
  Cc: Alan Cox, Daniel Phillips, Larry McVoy, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tuesday 13 August 2002 01:29 pm, Rik van Riel wrote:
> On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > Also, a license is a two-way street. I do not think it is morally right
> > to change an _existing_ license for any other reason than the fact that
> > it has some technical legal problem.
>
> Agreed, but we might be running into one of these.
>
> > I don't like patents. But I absolutely _hate_ people who play politics
> > with other peoples code. Be up-front, not sneaky after-the-fact.
>
> Suppose somebody sends you a patch which implements a nice
> algorithm that just happens to be patented by that same
> somebody.  You don't know about the patent.

That would be entrapment.  When they submit the patch, they're giving you an 
implied license to use it, even if they don't SAY so, just because they 
voluntarily submitted it and can't claim to be surprised it was then used, or 
that they didn't want it to be.  You could put up a heck of a defense in 
court on that one.

It's people who submit patches that use OTHER people's patents you have to 
worry about, and that's something you just can't filter for with the patent 
numbers rapidly approaching what, eight digits?

> Having a license that explicitly states that people who
> contribute and use Linux shouldn't sue you over it might
> prevent some problems.

Such a clause is what IBM insisted on having in ITS open source license.  You 
sue, your rights under this license terminate, which is basically automatic 
grounds for a countersuit for infringement.

(IBM has a lot of lawyers, and they pay them a lot of money.  It's 
conceivable they may actually have a point from time to time... :)

> regards,
>
> Rik

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:59                                                             ` Rik van Riel
@ 2002-08-13 13:35                                                               ` Rob Landley
  0 siblings, 0 replies; 110+ messages in thread
From: Rob Landley @ 2002-08-13 13:35 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

On Tuesday 13 August 2002 01:59 pm, Rik van Riel wrote:
> On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > On Tue, 13 Aug 2002, Rik van Riel wrote:
> > > Having a license that explicitly states that people who
> > > contribute and use Linux shouldn't sue you over it might
> > > prevent some problems.
> >
> > The thing is, if you own the patent, and you sneaked the code into the
> > kernel, you will almost certainly be laughed out of court for trying to
> > enforce it.
>
> Apparently not everybody agrees on this:
>
> http://zdnet.com.com/2100-1106-884681.html

This is just a case of IBM's left hand not knowing what the right hand is 
doing.  An official representative of IBM gave statements to the committee 
that their contributions were unencumbered.  If he honestly was acting in his 
capacity as a representative of IBM, and had the authority to make that 
statement, then that statement IS permission equivalent to a royalty-free 
license to use the patent.

Going through court to prove this could, of course, take years and millions 
of dollars, and nobody's going to use the standard until it's resolved, which 
is why everybody's groaning that big blue is being either evil or really 
really stupid by not just giving in on this one.

It's a PR black eye for IBM ("We're big, we're blue, we're dumb") but doesn't 
change the nature of the legal arguments...

Any time ANYBODY sues you, no matter how frivolous the suit, it could easily be 
long and expensive.  That's why you countersue for damages and get them to pay 
your costs for the trial if you win, plus punitive damages, plus pain and 
suffering, plus a stupidity tax, plus...

This topic's wandering a bit far afield.  CC: list trimmed...

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 18:32                                                             ` Linus Torvalds
@ 2002-08-13 13:50                                                               ` Rob Landley
  0 siblings, 0 replies; 110+ messages in thread
From: Rob Landley @ 2002-08-13 13:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Alan Cox, Daniel Phillips, Larry McVoy, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tuesday 13 August 2002 02:32 pm, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rob Landley wrote:
> > > Having a license that explicitly states that people who
> > > contribute and use Linux shouldn't sue you over it might
> > > prevent some problems.
> >
> > Such a clause is what IBM insisted on having in ITS open source license. 
> > You sue, your rights under this license terminate, which is basically
> > automatic grounds for a countersuit for infringement.
>
> Note that I personally think the "you screw with me, I screw with you"
> approach is a fine one. After all, the GPL is based on "you help me, I'll
> help you", so it fits fine.
>
> However, it doesn't work due to the distributed nature of the GPL. The FSF
> tried to do something like it in the GPL 3.0 discussions, and the end
> result was a total disaster. The GPL 3.0 suggestion was something along
> the lines of "you sue any GPL project, you lose all GPL rights". Which to
> me makes no sense at all - I could imagine that there might be some GPL
> project out there that _deserves_ getting sued(*) and it has nothing to do
> with Linux.

So this is another argument in favor of having the patent addendum be 
separate then.  Software patents as a class are basically evil, and valid 
ones are clearly the exception.  Copyrights are NOT evil (or at least are 
inherently more tightly focused), and valid ones are the rule.

There is also the legal precedent of patent pools, which are an established 
legal concept as far as I know.  Joining a patent pool means you license all 
your patents to get a license to all their patents, and bringing a patent 
suit within the pool would violate your agreement and cut you off from the 
pool.  (If I'm wrong, somebody correct me on this please.)

The open source community's problem is that it historically hasn't had the 
entry fee to participate in this sort of arrangement, and solving it on a 
company by company basis doesn't help the community.  These days open source 
has a lot more resources than it used to.

I think Red Hat is actually trying to help on this front by getting patents 
and licensing them for use in GPL code.  By itself, this is not a solution, 
but it could be the seed of one...

Right, at this point I need to go bug a lawyer, I think...

> 		Linus
>
> (*) "GNU Emacs, the defendent, did inefariously conspire to play
> towers-of-hanoy, while under the guise of a harmless editor".

But remember, you can't spell "evil" without "vi"... :)

Rob

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13  8:40                                                 ` Rob Landley
@ 2002-08-13 15:06                                                   ` Alan Cox
  2002-08-13 11:36                                                     ` Rob Landley
  0 siblings, 1 reply; 110+ messages in thread
From: Alan Cox @ 2002-08-13 15:06 UTC (permalink / raw)
  To: Rob Landley
  Cc: Daniel Phillips, Larry McVoy, Rik van Riel, Linus Torvalds,
	frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, William Lee Irwin III, linux-kernel

On Tue, 2002-08-13 at 09:40, Rob Landley wrote:
> Unfortunately, the maintainer of the GPL is Stallman, so he's the logical guy 
> to spearhead a "GPL patent pool" project, but any time anybody mentions the 
> phrase "intellectual property" to him he goes off on a tangent about how you 
> shouldn't call anything "intellectual property", so how can you have a 
> discussion about it, and nothing ever gets done.  It's FRUSTRATING to see 
> somebody with such brilliant ideas hamstrung not just by idealism, but 
> PEDANTIC idealism.
> 

Richard isn't daft on this one. The FSF does not have the 30 million
dollars needed to fight a *single* US patent lawsuit. The problem also
reflects back on things like Debian, because Debian certainly cannot
afford to play the patent game either.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13  8:51                                             ` Rob Landley
@ 2002-08-13 16:47                                               ` Daniel Phillips
  2002-08-13 13:09                                                 ` Rob Landley
  0 siblings, 1 reply; 110+ messages in thread
From: Daniel Phillips @ 2002-08-13 16:47 UTC (permalink / raw)
  To: Rob Landley, Alan Cox, Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel

On Tuesday 13 August 2002 10:51, Rob Landley wrote:
> On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:
> So you would object to microsoft granting rights to its patents saying "you 
> can use this patent in software that runs on windows, but use it on any other 
> platform and we'll sue you", but you don't mind going the other way?

You missed the point.  I was talking about using copyright against patents,
and specifically in the case where patents are held by people who also want
to use the copyrighted code.  The intention is to help keep our friends
honest.

Dealing with Microsoft, or anyone else whose only motivation is to obstruct,
is an entirely separate issue.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 11:36                                                     ` Rob Landley
@ 2002-08-13 16:51                                                       ` Linus Torvalds
  2002-08-13 12:53                                                         ` Rob Landley
                                                                           ` (2 more replies)
       [not found]                                                       ` <Pine.LNX.4.44.0208130942130.7411-100000@home.transmeta.com >
  1 sibling, 3 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-13 16:51 UTC (permalink / raw)
  To: Rob Landley
  Cc: Alan Cox, Daniel Phillips, Larry McVoy, Rik van Riel, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel


On Tue, 13 Aug 2002, Rob Landley wrote:
>
> Last time I really looked into all this, Stallman was trying to do an 
> enormous new GPL 3.0, addressing application service providers.  That seems 
> to have fallen through (as has the ASP business model), but the patent issue 
> remains unresolved.

At least one problem is exactly the politics played by the FSF, which
means that a lot of people (not just me), do not trust such new versions
of the GPL. Especially since the last time this happened, it all happened
in dark back-rooms, and I got to hear about it not off any of the lists,
but because I had an insider snitch on it.

I lost all respect I had for the FSF due to its sneakiness.

The kernel explicitly states that it is under the _one_ particular version 
of the "GPL v2" that is included with the kernel. Exactly because I do not 
want to have politics dragged into the picture by an external party (and 
I'm anal enough that I made sure that "version 2" cannot be misconstrued 
to include "version 2.1".

Also, a license is a two-way street. I do not think it is morally right to 
change an _existing_ license for any other reason than the fact that it 
has some technical legal problem. I intensely dislike the fact that many 
people seem to want to extend the current GPL as a way to take advantage 
of people who used the old GPL and agreed with _that_ - but not 
necessarily the new one.

As a result, every time this comes up, I ask for any potential new
"patent-GPL" to be a _new_ license, and not try to feed off existing
works. Please don't make it "GPL". Make it the GPPL for "General Public
Patent License" or something. And let people buy into it on its own
merits, not on some "the FSF decided unilaterally to make this decision
for us".

I don't like patents. But I absolutely _hate_ people who play politics 
with other people's code. Be up-front, not sneaky after-the-fact.

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 16:51                                                       ` Linus Torvalds
  2002-08-13 12:53                                                         ` Rob Landley
@ 2002-08-13 17:14                                                         ` Ruth Ivimey-Cook
  2002-08-13 17:29                                                         ` Rik van Riel
  2 siblings, 0 replies; 110+ messages in thread
From: Ruth Ivimey-Cook @ 2002-08-13 17:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rob Landley, Alan Cox, Daniel Phillips, Larry McVoy,
	Rik van Riel, frankeh, davidm, David Mosberger, David S. Miller,
	gh, Martin.Bligh, William Lee Irwin III, linux-kernel

On Tue, 13 Aug 2002, Linus Torvalds wrote:
>I don't like patents. But I absolutely _hate_ people who play politics 
>with other peoples code. Be up-front, not sneaky after-the-fact.


Well said :-)

Ruth

-- 
Ruth Ivimey-Cook
Software engineer and technical writer.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 16:51                                                       ` Linus Torvalds
  2002-08-13 12:53                                                         ` Rob Landley
  2002-08-13 17:14                                                         ` Ruth Ivimey-Cook
@ 2002-08-13 17:29                                                         ` Rik van Riel
  2002-08-13 13:18                                                           ` Rob Landley
                                                                             ` (3 more replies)
  2 siblings, 4 replies; 110+ messages in thread
From: Rik van Riel @ 2002-08-13 17:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rob Landley, Alan Cox, Daniel Phillips, Larry McVoy, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tue, 13 Aug 2002, Linus Torvalds wrote:

> Also, a license is a two-way street. I do not think it is morally right
> to change an _existing_ license for any other reason than the fact that
> it has some technical legal problem.

Agreed, but we might be running into one of these.

> I don't like patents. But I absolutely _hate_ people who play politics
> with other peoples code. Be up-front, not sneaky after-the-fact.

Suppose somebody sends you a patch which implements a nice
algorithm that just happens to be patented by that same
somebody.  You don't know about the patent.

You integrate the patch into the kernel and distribute it,
one year later you get sued by the original contributor of
that patch because you distribute code that is patented by
that person.

Not having some protection in the license could open you
up to sneaky after-the-fact problems.

Having a license that explicitly states that people who
contribute and use Linux shouldn't sue you over it might
prevent some problems.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:29                                                         ` Rik van Riel
  2002-08-13 13:18                                                           ` Rob Landley
@ 2002-08-13 17:45                                                           ` Alexander Viro
  2002-08-13 17:55                                                           ` Linus Torvalds
  2002-08-22 12:03                                                           ` bill davidsen
  3 siblings, 0 replies; 110+ messages in thread
From: Alexander Viro @ 2002-08-13 17:45 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Rob Landley, Alan Cox, Daniel Phillips,
	Larry McVoy, frankeh, davidm, David Mosberger, David S. Miller,
	gh, Martin.Bligh, William Lee Irwin III, linux-kernel



On Tue, 13 Aug 2002, Rik van Riel wrote:

> Suppose somebody sends you a patch which implements a nice
> algorithm that just happens to be patented by that same
> somebody.  You don't know about the patent.
> 
> You integrate the patch into the kernel and distribute it,
> one year later you get sued by the original contributor of
> that patch because you distribute code that is patented by
> that person.
> 
> Not having some protection in the license could open you
> up to sneaky after-the-fact problems.

Accepting non-trivial patches from malicious source means running code
from malicious source on your boxen.  In kernel mode.  And in that case
patents are the least of your troubles...


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:29                                                         ` Rik van Riel
  2002-08-13 13:18                                                           ` Rob Landley
  2002-08-13 17:45                                                           ` Alexander Viro
@ 2002-08-13 17:55                                                           ` Linus Torvalds
  2002-08-13 17:59                                                             ` Rik van Riel
  2002-08-13 19:12                                                             ` Daniel Phillips
  2002-08-22 12:03                                                           ` bill davidsen
  3 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2002-08-13 17:55 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Rob Landley, Alan Cox, Daniel Phillips, Larry McVoy, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel


On Tue, 13 Aug 2002, Rik van Riel wrote:
> 
> Having a license that explicitly states that people who
> contribute and use Linux shouldn't sue you over it might
> prevent some problems.

The thing is, if you own the patent, and you sneaked the code into the
kernel, you will almost certainly be laughed out of court for trying to
enforce it.

And if somebody else owns the patent, no amount of copyright license makes 
any difference.

		Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:55                                                           ` Linus Torvalds
@ 2002-08-13 17:59                                                             ` Rik van Riel
  2002-08-13 13:35                                                               ` Rob Landley
  2002-08-13 19:12                                                             ` Daniel Phillips
  1 sibling, 1 reply; 110+ messages in thread
From: Rik van Riel @ 2002-08-13 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rob Landley, Alan Cox, Daniel Phillips, Larry McVoy, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tue, 13 Aug 2002, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rik van Riel wrote:
> >
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
>
> The thing is, if you own the patent, and you sneaked the code into the
> kernel, you will almost certainly be laughed out of court for trying to
> enforce it.

Apparently not everybody agrees on this:

http://zdnet.com.com/2100-1106-884681.html

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 13:18                                                           ` Rob Landley
@ 2002-08-13 18:32                                                             ` Linus Torvalds
  2002-08-13 13:50                                                               ` Rob Landley
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2002-08-13 18:32 UTC (permalink / raw)
  To: Rob Landley
  Cc: Rik van Riel, Alan Cox, Daniel Phillips, Larry McVoy, frankeh,
	davidm, David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel


On Tue, 13 Aug 2002, Rob Landley wrote:
> 
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
> 
> Such a clause is what IBM insisted on having in ITS open source license.  You 
> sue, your rights under this license terminate, which is basically automatic 
> grounds for a countersuit for infringement.

Note that I personally think the "you screw with me, I screw with you"  
approach is a fine one. After all, the GPL is based on "you help me, I'll
help you", so it fits fine.

However, it doesn't work due to the distributed nature of the GPL. The FSF
tried to do something like it in the GPL 3.0 discussions, and the end
result was a total disaster. The GPL 3.0 suggestion was something along
the lines of "you sue any GPL project, you lose all GPL rights". Which to 
me makes no sense at all - I could imagine that there might be some GPL 
project out there that _deserves_ getting sued(*) and it has nothing to do 
with Linux.

		Linus

(*) "GNU Emacs, the defendent, did inefariously conspire to play 
towers-of-hanoy, while under the guise of a harmless editor".


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd)
       [not found]                                                       ` <Pine.LNX.4.44.0208130942130.7411-100000@home.transmeta.com >
@ 2002-08-13 18:46                                                         ` Mike Galbraith
  0 siblings, 0 replies; 110+ messages in thread
From: Mike Galbraith @ 2002-08-13 18:46 UTC (permalink / raw)
  To: linux-kernel


>Also, a license is a two-way street. I do not think it is morally right to
>change an _existing_ license for any other reason than the fact that it
>has some technical legal problem. I intensely dislike the fact that many
>people seem to want to extend the current GPL as a way to take advantage
>of people who used the old GPL and agreed with _that_ - but not
>necessarily the new one.

Amen.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:55                                                           ` Linus Torvalds
  2002-08-13 17:59                                                             ` Rik van Riel
@ 2002-08-13 19:12                                                             ` Daniel Phillips
  1 sibling, 0 replies; 110+ messages in thread
From: Daniel Phillips @ 2002-08-13 19:12 UTC (permalink / raw)
  To: Linus Torvalds, Rik van Riel
  Cc: Rob Landley, Alan Cox, Larry McVoy, frankeh, davidm,
	David Mosberger, David S. Miller, gh, Martin.Bligh,
	William Lee Irwin III, linux-kernel

On Tuesday 13 August 2002 19:55, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rik van Riel wrote:
> > 
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
> 
> The thing is, if you own the patent, and you sneaked the code into the
> kernel, you will almost certainly be laughed out of court for trying to
> enforce it.
> 
> And if somebody else owns the patent, no amount of copyright license makes 
> any difference.

I don't think that's correct.  SGI needs to use and distribute Linux more
than they need to enforce their reverse mapping patents against Linux
users.

-- 
Daniel

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-13 17:29                                                         ` Rik van Riel
                                                                             ` (2 preceding siblings ...)
  2002-08-13 17:55                                                           ` Linus Torvalds
@ 2002-08-22 12:03                                                           ` bill davidsen
  3 siblings, 0 replies; 110+ messages in thread
From: bill davidsen @ 2002-08-22 12:03 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.44L.0208131425500.23404-100000@imladris.surriel.com>,
Rik van Riel  <riel@conectiva.com.br> wrote:

| Suppose somebody sends you a patch which implements a nice
| algorithm that just happens to be patented by that same
| somebody.  You don't know about the patent.
| 
| You integrate the patch into the kernel and distribute it,
| one year later you get sued by the original contributor of
| that patch because you distribute code that is patented by
| that person.
| 
| Not having some protection in the license could open you
| up to sneaky after-the-fact problems.
| 
| Having a license that explicitly states that people who
| contribute and use Linux shouldn't sue you over it might
| prevent some problems.

Unlikely as this is, since offering the patch would probably be
(eventually) interpreted as giving you the right to use it under GPL, I
think this is a valid concern.

Maybe some lawyer could add the required words and it could become the
LFSL v1.0 (Linux Free Software License). Although FSF would probably buy
into a change if the alternative was creation of a Linux license. There
are people there who are in touch with reality.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: large page patch (fwd) (fwd)
@ 2002-08-09 17:51 Seth, Rohit
  0 siblings, 0 replies; 110+ messages in thread
From: Seth, Rohit @ 2002-08-09 17:51 UTC (permalink / raw)
  To: 'Daniel Phillips', Linus Torvalds
  Cc: frankeh, davidm, David Mosberger, David S. Miller, gh,
	Martin.Bligh, wli, linux-kernel



> -----Original Message-----
> From: Daniel Phillips [mailto:phillips@arcor.de]
> Sent: Friday, August 09, 2002 10:12 AM
> To: Linus Torvalds
> Cc: frankeh@watson.ibm.com; davidm@hpl.hp.com; David 
> Mosberger; David S.
> Miller; gh@us.ibm.com; Martin.Bligh@us.ibm.com; wli@holomorphy.com;
> linux-kernel@vger.kernel.org
> Subject: Re: large page patch (fwd) (fwd)
> 
> 
> On Friday 09 August 2002 18:51, Linus Torvalds wrote:
> > On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > > Slab allocations would not have GFP_DEFRAG (I mistakenly 
> wrote GFP_LARGE 
> > > earlier) and so would be allocated outside ZONE_LARGE.
> > 
> > .. at which poin tyou then get zone balancing problems.
> > 
> > Or we end up with the same kind of special zone that we 
> have _anyway_ in
> > the current large-page patch, in which case the point of 
> doing this is
> > what?
> 
> The current large-page patch doesn't have any kind of 
> defragmentation in the 
> special zone and that memory is just not available for other 
> uses.  The thing 
> is, when demand for large pages is low the zone should be 
> allowed to fragment.
> 

You are right that as long as the pages are in the large page pool they are not
available for other regular purposes.  The current implementation does, however,
allow on-demand moving of pages between the large_page pool and the other
regular pools through a sysctl interface.  The movement is not forced (in the
sense that large pages are freed back only if they are actually free, and vice
versa), and it will not be an issue where demand for large pages is low.
Theoretically you could extend this support into the pageout daemon, to find
out whether it can retrieve some free large pages (for environments where the
expectation is that most of the memory will be used for large pages but the
actual usage does not match that expectation; I doubt those environments will
occur, but bad configurations are always there).  The current approach allows
the large page/regular page movement without doing too much housecleaning.  It
is likely that once a large page goes back to the general pool it will not be
easy to replenish the large_page pool, because of fragmentation in the regular
memory pool (on memory-starved machines; for scenarios where the machine is
sometimes running low on regular memory and sometimes on large_pages, it would
probably be a good idea to just add more RAM).
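
For illustration only, here is a minimal userspace sketch of that on-demand
pool model.  The names, the 4MB size and the malloc stand-ins are made up for
the example; the real pool holds physically contiguous page frames and can
fail to grow once regular memory has fragmented, exactly as described above.

/* Toy model of an on-demand large-page pool resize: grow or shrink toward a
 * requested number of large pages, giving up silently when no more
 * "contiguous" chunks can be found.  Illustrative userspace code only. */
#include <stdio.h>
#include <stdlib.h>

#define LARGE_PAGE_SIZE (4UL << 20)	/* pretend large pages are 4 MB */

static void **pool;			/* stand-in for the large-page pool */
static unsigned long pool_pages;

static unsigned long set_pool_size(unsigned long target)
{
	void **tmp;

	while (pool_pages > target)		/* shrink: return pages */
		free(pool[--pool_pages]);

	tmp = realloc(pool, (target ? target : 1) * sizeof(*pool));
	if (!tmp)
		return pool_pages;
	pool = tmp;

	while (pool_pages < target) {		/* grow: may stop early */
		void *chunk = malloc(LARGE_PAGE_SIZE);
		if (!chunk)
			break;			/* pool grows only if memory is there */
		pool[pool_pages++] = chunk;
	}
	return pool_pages;
}

int main(void)
{
	printf("pool now %lu large pages\n", set_pool_size(16));
	printf("pool now %lu large pages\n", set_pool_size(4));
	return 0;
}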
> 

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-06 20:38 Luck, Tony
@ 2002-08-06 21:03 ` Hubertus Franke
  0 siblings, 0 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-06 21:03 UTC (permalink / raw)
  To: Luck, Tony, Seth, Rohit, Linus Torvalds; +Cc: linux-kernel

On Tuesday 06 August 2002 04:38 pm, Luck, Tony wrote:
> > > 4GB TLB entry size ???
> >
> > I assume you mean 4MB TLB entry size or did I fall
> > into a coma for 10 years
>
> That wasn't a typo ... Itanium2 supports page sizes up
> to 4 Gigabytes.  Databases (well, Oracle for sure) want
> to use those huge TLB entries to map their multi-gigabyte
> shared memory areas.
>
> -Tony

Whooooowww...  Power4 I believe opted out at 16MB.
So the story about sleeping beauty is true :-).

Wouldn't want to manage the 8-32 physical pages of memory through the VM.
Paging not an option, file access irrelevant.

In that case I agree that it should be handled by a special purpose
extension like Seth's patch to cover a 4GB page. 

Up to 4MB or so, I still believe going the other way is proper.
More later... thanks for the info.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: large page patch (fwd) (fwd)
@ 2002-08-06 20:38 Luck, Tony
  2002-08-06 21:03 ` Hubertus Franke
  0 siblings, 1 reply; 110+ messages in thread
From: Luck, Tony @ 2002-08-06 20:38 UTC (permalink / raw)
  To: 'frankeh@watson.ibm.com', Seth, Rohit, Linus Torvalds
  Cc: linux-kernel, linux-mm

> > 4GB TLB entry size ??? 
> I assume you mean 4MB TLB entry size or did I fall
> into a coma for 10 years

That wasn't a typo ... Itanium2 supports page sizes up
to 4 Gigabytes.  Databases (well, Oracle for sure) want
to use those huge TLB entries to map their multi-gigabyte
shared memory areas.

-Tony


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-05 23:30 Seth, Rohit
  2002-08-06  5:01 ` David Mosberger
@ 2002-08-06 19:11 ` Hubertus Franke
  1 sibling, 0 replies; 110+ messages in thread
From: Hubertus Franke @ 2002-08-06 19:11 UTC (permalink / raw)
  To: Seth, Rohit, Linus Torvalds; +Cc: linux-kernel, linux-mm

On Monday 05 August 2002 07:30 pm, Seth, Rohit wrote:
> > -----Original Message-----
> > From: Hubertus Franke [mailto:frankeh@watson.ibm.com]
> > Sent: Sunday, August 04, 2002 12:30 PM
> > To: Linus Torvalds
> > Cc: David S. Miller; davidm@hpl.hp.com; davidm@napali.hpl.hp.com;
> > gh@us.ibm.com; Martin.Bligh@us.ibm.com; wli@holomorphy.com;
> > linux-kernel@vger.kernel.org
> > Subject: Re: large page patch (fwd) (fwd)
> >
> > Well, in what you described above there is no concept of superpages
> > the way it is defined for the purpose of <tracking> and <TLB overhead
> > reduction>.
> > If you don't know about super pages at the VM level, then you need to
> > deal with them at TLB fault level to actually create the <large TLB>
> > entry. That what the INTC patch will do, namely throughing all the
> > complexity over the fence for the page fault.
>
> Our patch does the preallocation of large pages at the time of request.
> There is really nothing special like replicating PTEs (that you mentioned
> below in your design) happens there. In any case,  even for IA-64 where the
> TLBs are also sw controlled (we also have Hardware Page Walker that can
> walk any 3rd level pt and insert the PTE in TLB.) there are almost no
> changes (to be precise one additional asm instructionin the begining of
> handler for shifting extra bits) in our implementation that pollute the low
> level TLB fault handlers to have the knowledge of large page size in
> traversing the 3-level page table. (Though there are couple of other asm
> instructions that are added in this low-level routine to set helping
> register with proper page_size while inserting bigger TLBs).  On IA-32
> obviously things fall in place automagically as the page tables are setup
> as per arch.
>

Hi, it's quite apparent from your answer that I didn't make our approach and 
intent clear.
I don't mean to discredit your approach in any way: it's quick, it's special 
purpose, and it does what it does, namely anonymous memory for large pages
supported by a particular architecture, on specific request, in an efficient 
manner with absolutely no overhead on the base kernel.  Agreed, the 
interface is up for negotiation, but that has nothing to do with the 
essence; those are aesthetics and other API arguments.

Our intent has been, if you followed our presentation at OLS or on the web,
to build multiple page size support that spans (i) mmap'ed files, (ii) shm 
segments and (iii) anonymous files and memory, and hence covers the page cache 
and eventually the swap system as well.  The target would really be something 
like the Rice BSD paper, with automatic promotion/demotion and reservation. 
Of course it's up for discussion whether that is any good or just overkill and
useless, given that important apps could simply use a special purpose 
interface and force the issue.

We are nowhere close to there.  More analysis is required to even establish 
that fragmentation, a large-page-aware defragmenter, etc. won't kill any TLB 
overhead performance gains seen.  The Rice paper is one reference point.
But it's important to point out that this is what the OS research community 
wants to see.
This can NOT be accomplished in one big patch; several intermediate steps are 
required.
(a) The first one is to demonstrate that large pages for anonymous memory 
(shared through fork and non-shared) can be integrated effortlessly into the 
current VM code with no overhead and almost no major kludges or code messups.
Doing so would give the benefit to every architecture.  In essence, what needs 
to be provided is a few low level macros to force the page order into the 
PTE/PMD entry.
(b) The second one is to extend the concept to the I/O side, to be able to back 
regions with files.

We are close to being done with (a), having pulled the stuff out of Simon's 
patch for 2.4.18 and moved it up to 2.5.30.  We will retrofit (b) later.

The next confusion is the definition of a large page versus a super page.
I am guilty too of mixing these up every now and then.
SuperPage:   a cluster of contiguous physical pages that is treated by the OS
                   as a single entity.  Operations are on superpages, including
                   tracking of dirty, referenced, active .... although 
                   they might be broken down into smaller operations on their 
                   base pages.
Large Page:  a superpage that coincides with a page size supported by the HW.

In your case you are clearly dealing with <LargePages> as a special memory 
area that is intercepted at fault time and specially dealt with.
Our case essentially supports the superpage concept as defined above.
Not all of it is implemented, and the x86 prototype was focused on the
SuperPage=LargePage case. 
Continuation below ....

> > In your case not keeping track of the super pages in the
> > VM layer and PT layer requires to discover the large page at soft TLB
> > time by scanning PT proximity for contigous pages if we are
> > talking now
> > about the read_ahead ....
> > In our case, we store the same physical address of the super page
> > in the PTEs spanning the superpage together with the page order.
> > At software TLB time we simply extra the single PTE from the PT based
> > on the faulting address and move it into the TLB. This
> > ofcourse works only
> > for software TLBs (PwrPC, MIPS, IA64). For HW TLB (x86) the
> > PT structure
> > by definition overlaps the large page size support.
> > The HW TLB case can be extended to not store the same PA in
> > all the PTEs,
> > but conceptually carry the superpage concept for the purpose
> > described above.
>
> I'm afraid you may be wasting a lot of extra memory by replicaitng these
> PTEs(Take an example of one 4G large TLB size entry and assume there are
> few hunderd processes using that same physical page.)
>
4GB TLB entry size ??? 
I assume you mean 4MB TLB entry size or did I fall into a coma for 10 years 
:-)

Well, the way it is architected is that all translations (including 
superpages) are stored in the PT.  Should a superpage coincide with
a PMD, we do not allocate the lowest level of (1<<PMD_SHIFT) entries and
record that fact in the PMD. 
In case the superpage is smaller than a PMD (e.g. 4MB), then entries need
to be created in the PTEs.  One can dream up optimizations where only
a single entry needs to be created, but that only works for SW-TLB; 
for HW-TLB (x86) there is no choice but to fill them all in.
For SW-TLB, this probably works the same as your stuff (I have only seen the 
HW-TLB x86 port): you store this one entry in the PT at the base page 
translation of the superpage and walk to it at page fault time.  This can be 
done equally well in architecture independent code as far as I can tell, with 
the same tricks you are mentioning above dressed up in architecture dependent 
code.
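
To make the HW-TLB (x86-style) case a bit more concrete, here is a rough
userspace sketch of what "fill all the base-page slots with the superpage
translation plus its order" means.  The pte_t layout and helper names here are
invented for the example, not taken from either patch.

/* Sketch: every PTE slot spanning a superpage carries the same physical base
 * plus the page order, so a fault on any base page inside the superpage can
 * recover the whole mapping.  Layout and helpers are illustrative only. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

#define PAGE_SHIFT	12
#define PTE_PRESENT	0x1ULL
#define PTE_ORDER_SHIFT	52		/* hypothetical: stash the order in high bits */

static void fill_superpage_ptes(pte_t *pt, unsigned long base_pfn, unsigned int order)
{
	unsigned long i, nr = 1UL << order;

	for (i = 0; i < nr; i++)
		pt[i] = ((uint64_t)base_pfn << PAGE_SHIFT)
			| ((uint64_t)order << PTE_ORDER_SHIFT)
			| PTE_PRESENT;
}

int main(void)
{
	pte_t pt[1024];			/* one page table's worth of slots */

	/* A 4 MB superpage (order 10 in 4 KB pages) at a made-up frame number. */
	fill_superpage_ptes(pt, 0x80000, 10);
	printf("slot 0 = %#llx, slot 1023 = %#llx\n",
	       (unsigned long long)pt[0], (unsigned long long)pt[1023]);
	return 0;
}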

> > We have that concept exactly the way you want it, but the dress code
> > seems to be wrong. That can be worked on.
> > Our goal was in the long run 2.7 to explore the Rice approach to see
> > whether it yields benefits or whether we getting down the road of
> > fragmentation reduction overhead that will kill all the
> > benefits we get
> > from reduced TLB overhead. Time would tell.
> >
> > But to go down this route we need the concept of a superpage
> > in the VM,
> > not just at TLB time or a hack that throws these things over
> > the fence.
>
> As others have already said that you may want to have the support of
> smaller superpages in this way.  Where VM is embeded with some knowledge of
> different page sizes that it can support.  Demoting and permoting pages
> from one size to another (efficiently)will be very critical in the design.
> In my opinion supporting the largest TLB on archs (like 256M or 4G) will
> need more direct appraoch and less intrusion from kernel VM will be
> prefered. Ofcourse, kernel will need to put extra checks etc. to maintain
> some sanity for allowed users.
>

Yipp, that's the goal.  I just think that, in general, coverage of up to decently 
large TLB entry sizes (4MB) can be handled with our approach, and it would 
essentially cover file mappings and general shm segments.

> There has already been lot of discussion on this mailing list about what is
> the right approach.  Whether the new APIs are needed or something like
> madvise would do it, whether kernel needs to allocate large_pages
> transparently to the user or we should expose the underlying HW feature to
> user land.  There are issues that favor one approach over another.  But the
> bottom line is: 1) We should not break anything semantically for regular
> system calls that happen to be using large TLBs and 2) The performance
> advantage of this HW feature (on most of the archs I hope) is just too much
> to let go without notice.  I hope we get to consensus for getting this
> support in kernel ASAP.  This will benefit lot of Linux users.  (And yes I
> understand that we need to do things right in kernel so that we don't make
> unforeseen errors.)
>

Absolutely, well phrased.  The "+"s and "-"s of each approach have been 
pointed out by many folks and papers.  Your stuff certainly provides a short 
term solution that works.  After OLS I was hoping to move on from it toward a 
long term solution a la the Rice BSD approach. 

> > > And no, I do not want separate coloring support in the
> >
> > allocator. I think
> >
> > > coloring without superpage support is stupid and worthless (and
> > > complicates the code for no good reason).
> > >
> > > 		Linus
> >
> > That <stupid> seems premature. You are mixing the concept of
> > superpage from a TLB miss reduction perspective
> > with the concept of superpage for page coloring.
>
> I have seen couple of HPC apps that try to fit (configure) in their data
> sets on the L3 caches size (Like on IA-64 4M).  I think these are the apps
> that really get hit hardest by lack of proper page coloring support in
> Linux kernel.  The performance variation of these workloads from run to run
> could be as much as 60%  And with the page coloring patch, these apps seems
> to be giving consistent higher throuput (The real bad part is that once the
> throughput of these workloads drop, it stays down thereafter :( )  But
> seems like DavidM has enough real world data that prohibits the use of this
> approach in kernel for real world scenarios.  The good part of large TLBs
> is that, TLBs larger than CPU cache size will automatically get you perfect
> page coloring .........for free.
>
> rohit

Yipp.... 
let's see how it evolves. 
As DavidM so elegantly stated:  talk=BS  walk=code    :-)

Cheers....

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-06  5:08       ` David S. Miller
@ 2002-08-06  5:32         ` David Mosberger
  0 siblings, 0 replies; 110+ messages in thread
From: David Mosberger @ 2002-08-06  5:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: davidm, rohit.seth, frankeh, torvalds, gh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Mon, 05 Aug 2002 22:08:36 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  >    From: David Mosberger <davidm@napali.hpl.hp.com> Date:
  > Mon, 5 Aug 2002 22:19:24 -0700

  >    Sounds great if you have the hardware that can do it.  Not
  > too many CPUs I know of support it.

  DaveM> Of course, and the fact that nobody has put it into silicon
  DaveM> may be a suggestion of how useful the feature really is :-)

My thought exactly! ;-)

	--david

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-06  4:58   ` David S. Miller
@ 2002-08-06  5:19     ` David Mosberger
  2002-08-06  5:08       ` David S. Miller
  0 siblings, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-06  5:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: davidm, rohit.seth, frankeh, torvalds, gh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Mon, 05 Aug 2002 21:58:17 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  >>    From: David Mosberger <davidm@napali.hpl.hp.com> Date:
  >> Mon, 5 Aug 2002 22:01:16 -0700

  >>    In my opinion, this is perhaps the strongest argument
  >> *for* a separate "giant page" syscall interface.  It will be
  >> very hard (perhaps impossible) to optimize superpages to work
  >> efficiently when the ratio of superpage/basepage grows huge
  >> (as, by definition, the kernel would manage them as a set of
  >> basepages).

  DaveM> Actually, this is one of the reasons there was a lot of
  DaveM> research into using sub-page clustering for large mappings in
  DaveM> the TLB.  Basically how this worked is that for a superpage,
  DaveM> you could stick multiple sub-mappings into the entry such
  DaveM> that you didn't need a fully physically contiguous superpage.

Sounds great if you have the hardware that can do it.  Not too many
CPUs I know of support it.

	--david

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-06  5:19     ` David Mosberger
@ 2002-08-06  5:08       ` David S. Miller
  2002-08-06  5:32         ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-06  5:08 UTC (permalink / raw)
  To: davidm, davidm
  Cc: rohit.seth, frankeh, torvalds, gh, Martin.Bligh, wli, linux-kernel

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Mon, 5 Aug 2002 22:19:24 -0700
   
   Sounds great if you have the hardware that can do it.  Not too many
   CPUs I know of support it.

Of course, and the fact that nobody has put it into silicon may
be a suggestion of how useful the feature really is :-)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: large page patch (fwd) (fwd)
  2002-08-05 23:30 Seth, Rohit
@ 2002-08-06  5:01 ` David Mosberger
  2002-08-06  4:58   ` David S. Miller
  2002-08-06 19:11 ` Hubertus Franke
  1 sibling, 1 reply; 110+ messages in thread
From: David Mosberger @ 2002-08-06  5:01 UTC (permalink / raw)
  To: Seth, Rohit
  Cc: 'frankeh@watson.ibm.com',
	Linus Torvalds, David S. Miller, davidm, gh, Martin.Bligh, wli,
	linux-kernel

>>>>> On Mon, 5 Aug 2002 16:30:54 -0700 , "Seth, Rohit" <rohit.seth@intel.com> said:

  Rohit> I'm afraid you may be wasting a lot of extra memory by
  Rohit> replicaitng these PTEs(Take an example of one 4G large TLB
  Rohit> size entry and assume there are few hunderd processes using
  Rohit> that same physical page.)

In my opinion, this is perhaps the strongest argument *for* a separate
"giant page" syscall interface.  It will be very hard (perhaps
impossible) to optimize superpages to work efficiently when the ratio
of superpage/basepage grows huge (as, by definition, the kernel would
manage them as a set of basepages).  For example, even if we used a
base page-size of 64KB, a 4GB giant page (as supported by Itanium 2)
would correspond to 65536 base pages.  A superpage of this size would
almost certainly still do a lot better than 65536 base pages, but
compared to a single giant page, it probably stands no chance
performance-wise.
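
For a rough sense of scale, here is a small back-of-the-envelope program for
the replication cost being discussed; the 4 GB region, PTE size and "a few
hundred" process count are illustrative numbers only, not measurements.

/* Rough arithmetic behind the argument above: page-table bytes burned when a
 * single giant page is mapped by replicating base-page PTEs in each process.
 * Numbers are illustrative only. */
#include <stdio.h>

int main(void)
{
	unsigned long long giant = 4ULL << 30;	/* one 4 GB giant page     */
	unsigned long long pte_bytes = 8;	/* 64-bit PTE              */
	unsigned long long procs = 300;		/* "a few hundred" mappers */
	unsigned long long base_sizes[] = { 4ULL << 10, 64ULL << 10 };
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long long ptes  = giant / base_sizes[i];
		unsigned long long total = ptes * pte_bytes * procs;
		printf("%3llu KB base pages: %llu PTEs per mapping, "
		       "%llu MB of page tables for %llu processes\n",
		       base_sizes[i] >> 10, ptes, total >> 20, procs);
	}
	return 0;
}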

	--david

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-06  5:01 ` David Mosberger
@ 2002-08-06  4:58   ` David S. Miller
  2002-08-06  5:19     ` David Mosberger
  0 siblings, 1 reply; 110+ messages in thread
From: David S. Miller @ 2002-08-06  4:58 UTC (permalink / raw)
  To: davidm, davidm
  Cc: rohit.seth, frankeh, torvalds, gh, Martin.Bligh, wli, linux-kernel

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Mon, 5 Aug 2002 22:01:16 -0700
   
   In my opinion, this is perhaps the strongest argument *for* a separate
   "giant page" syscall interface.  It will be very hard (perhaps
   impossible) to optimize superpages to work efficiently when the ratio
   of superpage/basepage grows huge (as, by definition, the kernel would
   manage them as a set of basepages).

Actually, this is one of the reasons there was a lot of research into
using sub-page clustering for large mappings in the TLB.  Basically
how this worked is that for a superpage, you could stick multiple
sub-mappings into the entry such that you didn't need a fully
physically contiguous superpage.

It's talked about in one of the Talluri papers.
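
From memory, and much simplified, a complete-subblock entry in that style
looks roughly like the sketch below: one tag covers a superpage-sized region,
but each base page keeps its own frame number, so the backing memory need not
be physically contiguous.  The struct, sizes and lookup here are illustrative
only, not taken from the Talluri papers or any real TLB.

/* Illustrative model of a sub-block TLB entry: one virtual tag for a
 * superpage-sized region, a valid bit and a separate physical frame per
 * base page, so the backing memory need not be contiguous. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define SUBPAGES	16			/* base pages per entry (example) */

struct subblock_tlb_entry {
	unsigned long	vtag;			/* region-aligned virtual address */
	uint32_t	valid;			/* one bit per sub-page           */
	unsigned long	pfn[SUBPAGES];		/* per-sub-page frame numbers     */
};

/* Translate va through one entry; returns 0 on miss. */
static unsigned long lookup(const struct subblock_tlb_entry *e, unsigned long va)
{
	unsigned long region = va & ~((SUBPAGES << PAGE_SHIFT) - 1UL);
	unsigned long idx = (va >> PAGE_SHIFT) & (SUBPAGES - 1);

	if (region != e->vtag || !(e->valid & (1u << idx)))
		return 0;
	return (e->pfn[idx] << PAGE_SHIFT) | (va & ((1UL << PAGE_SHIFT) - 1));
}

int main(void)
{
	struct subblock_tlb_entry e = { .vtag = 0x400000, .valid = 0x3 };

	e.pfn[0] = 0x1234;			/* sub-pages map to scattered frames */
	e.pfn[1] = 0x8765;
	printf("%#lx -> %#lx\n", 0x401008UL, lookup(&e, 0x401008UL));
	return 0;
}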

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: large page patch (fwd) (fwd)
@ 2002-08-05 23:30 Seth, Rohit
  2002-08-06  5:01 ` David Mosberger
  2002-08-06 19:11 ` Hubertus Franke
  0 siblings, 2 replies; 110+ messages in thread
From: Seth, Rohit @ 2002-08-05 23:30 UTC (permalink / raw)
  To: 'frankeh@watson.ibm.com', Linus Torvalds
  Cc: David S. Miller, davidm, davidm, gh, Martin.Bligh, wli, linux-kernel



> -----Original Message-----
> From: Hubertus Franke [mailto:frankeh@watson.ibm.com]
> Sent: Sunday, August 04, 2002 12:30 PM
> To: Linus Torvalds
> Cc: David S. Miller; davidm@hpl.hp.com; davidm@napali.hpl.hp.com;
> gh@us.ibm.com; Martin.Bligh@us.ibm.com; wli@holomorphy.com;
> linux-kernel@vger.kernel.org
> Subject: Re: large page patch (fwd) (fwd)
> 
> Well, in what you described above there is no concept of superpages
> the way it is defined for the purpose of <tracking> and <TLB overhead 
> reduction>. 
> If you don't know about super pages at the VM level, then you need to
> deal with them at TLB fault level to actually create the <large TLB> 
> entry. That what the INTC patch will do, namely throughing all the 
> complexity over the fence for the page fault.
Our patch does the preallocation of large pages at the time of request.
There is really nothing special, like replicating PTEs (which you mentioned
below in your design), happening there.  In any case, even for IA-64, where the
TLBs are also sw controlled (we also have a Hardware Page Walker that can walk
any 3rd level pt and insert the PTE into the TLB), there are almost no changes
(to be precise, one additional asm instruction in the beginning of the handler
for shifting extra bits) in our implementation that pollute the low level TLB
fault handlers with knowledge of the large page size while traversing the
3-level page table.  (Though there are a couple of other asm instructions that
are added in this low-level routine to set a helping register with the proper
page_size while inserting bigger TLBs.)  On IA-32 obviously things fall into
place automagically as the page tables are set up as per the arch.

> In your case not keeping track of the super pages in the 
> VM layer and PT layer requires to discover the large page at soft TLB 
> time by scanning PT proximity for contigous pages if we are 
> talking now 
> about the read_ahead ....
> In our case, we store the same physical address of the super page 
> in the PTEs spanning the superpage together with the page order.
> At software TLB time we simply extra the single PTE from the PT based
> on the faulting address and move it into the TLB. This 
> ofcourse works only
> for software TLBs (PwrPC, MIPS, IA64). For HW TLB (x86) the 
> PT structure
> by definition overlaps the large page size support.
> The HW TLB case can be extended to not store the same PA in 
> all the PTEs,
> but conceptually carry the superpage concept for the purpose 
> described above.
> 
I'm afraid you may be wasting a lot of extra memory by replicating these
PTEs (take the example of one 4G large TLB entry and assume there are a few
hundred processes using that same physical page).

> We have that concept exactly the way you want it, but the dress code 
> seems to be wrong. That can be worked on.
> Our goal was in the long run 2.7 to explore the Rice approach to see
> whether it yields benefits or whether we getting down the road of 
> fragmentation reduction overhead that will kill all the 
> benefits we get
> from reduced TLB overhead. Time would tell.
> 
> But to go down this route we need the concept of a superpage 
> in the VM,
> not just at TLB time or a hack that throws these things over 
> the fence. 
> 
As others have already said, you may want to have support for smaller
superpages in this way, where the VM is embedded with some knowledge of the
different page sizes it can support.  Demoting and promoting pages from
one size to another (efficiently) will be very critical in the design.  In my
opinion, supporting the largest TLBs on archs (like 256M or 4G) will need a more
direct approach, and less intrusion from the kernel VM will be preferred.
Of course, the kernel will need to put in extra checks etc. to maintain some
sanity for allowed users. 

There has already been a lot of discussion on this mailing list about what is
the right approach: whether new APIs are needed or something like madvise
would do it, whether the kernel needs to allocate large pages transparently
for the user or we should expose the underlying HW feature to user land.
There are issues that favor one approach over another.  But the bottom line
is: 1) we should not break anything semantically for regular system calls
that happen to be using large TLBs, and 2) the performance advantage of this
HW feature (on most of the archs, I hope) is just too much to let go without
notice.  I hope we get to a consensus for getting this support into the kernel
ASAP.  This will benefit a lot of Linux users.  (And yes, I understand that we
need to do things right in the kernel so that we don't make unforeseen
errors.)
> 
> > And no, I do not want separate coloring support in the 
> allocator. I think
> > coloring without superpage support is stupid and worthless (and
> > complicates the code for no good reason).
> >
> > 		Linus
> 
> That <stupid> seems premature. You are mixing the concept of 
> superpage from a TLB miss reduction perspective 
> with the concept of superpage for page coloring. 
> 
>
I have seen a couple of HPC apps that try to fit (configure) their data sets
to the L3 cache size (like 4M on IA-64).  I think these are the apps that
really get hit hardest by the lack of proper page coloring support in the
Linux kernel.  The performance variation of these workloads from run to run
can be as much as 60%, and with the page coloring patch these apps seem to
give consistently higher throughput.  (The really bad part is that once the
throughput of these workloads drops, it stays down thereafter :( )  But it
seems like DavidM has enough real world data that prohibits the use of this
approach in the kernel for real world scenarios.  The good part of large TLBs
is that TLB pages larger than the CPU cache size automatically get you perfect
page coloring ......... for free. 
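
To see why, here is a toy calculation (cache geometry and sizes are invented
for the example): within one physically contiguous large page the cache index
is a pure function of the offset, so the frames of a 4 MB page spread evenly
over every colour of a 4 MB cache, which is exactly what a coloring allocator
tries to arrange.

/* Toy demonstration: the 1024 4 KB frames of one contiguous 4 MB page hit
 * every colour of a 4 MB, 4-way cache the same number of times, i.e. perfect
 * colouring without the allocator doing anything.  Sizes are examples only. */
#include <stdio.h>

#define PAGE_4K		4096UL
#define LARGE_PAGE	(4UL << 20)			/* 4 MB large page */
#define CACHE_SIZE	(4UL << 20)			/* 4 MB cache      */
#define CACHE_WAYS	4UL
#define COLORS		(CACHE_SIZE / CACHE_WAYS / PAGE_4K)	/* 256 colours */

int main(void)
{
	unsigned long base = 0x40000000UL;		/* aligned physical base */
	unsigned long hits[COLORS] = { 0 };
	unsigned long off, c, min = ~0UL, max = 0;

	for (off = 0; off < LARGE_PAGE; off += PAGE_4K)
		hits[((base + off) / PAGE_4K) % COLORS]++;

	for (c = 0; c < COLORS; c++) {
		if (hits[c] < min) min = hits[c];
		if (hits[c] > max) max = hits[c];
	}
	printf("%lu colours, each hit between %lu and %lu times\n",
	       (unsigned long)COLORS, min, max);
	return 0;
}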

rohit

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
  2002-08-04 20:20     ` Andi Kleen
@ 2002-08-04 23:51       ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2002-08-04 23:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Hubertus Franke, David S. Miller, davidm, davidm,
	gh, Martin.Bligh, wli, linux-kernel, torvalds

Andi Kleen <ak@suse.de> writes:

> Andrew Morton <akpm@zip.com.au> writes:
> 
> > If we instead clear out 4 or 8 pages, we trash a ton of cache and
> > the chances of userspace _using_ pages 1-7 in the short-term are
> > lower.   We could clear the pages with 7,6,5,4,3,2,1,0 ordering,
> > but the cache implications of faultahead are still there.
> 
> What you could do on modern x86 and probably most other architectures as 
> well is to clear the faulted page in cache and clear the other pages
> with a non temporal write. The non temporal write will go straight
> to main memory and not pollute any caches. 

Plus a non temporal write is 3x faster than a write that lands in
the cache on x86 (tested on Athlons, P4, & P3).
 
> When the process accesses it later it has to fetch the zeroes from
> main memory. This is probably still faster than a page fault at least
> for the first few accesses. It could be more costly when walking the full
> page (then the added up cache miss costs could exceed the page fault cost), 
> but then hopefully the CPU will help by doing hardware prefetch.
> 
> It could help or not help, may be worth a try at least :-)

Certainly.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: large page patch (fwd) (fwd)
       [not found]   ` <3D4D7F24.10AC4BDB@zip.com.au.suse.lists.linux.kernel>
@ 2002-08-04 20:20     ` Andi Kleen
  2002-08-04 23:51       ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Andi Kleen @ 2002-08-04 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hubertus Franke, David S. Miller, davidm, davidm, gh,
	Martin.Bligh, wli, linux-kernel, torvalds

Andrew Morton <akpm@zip.com.au> writes:

> If we instead clear out 4 or 8 pages, we trash a ton of cache and
> the chances of userspace _using_ pages 1-7 in the short-term are
> lower.   We could clear the pages with 7,6,5,4,3,2,1,0 ordering,
> but the cache implications of faultahead are still there.

What you could do on modern x86 and probably most other architectures as 
well is to clear the faulted page in cache and clear the other pages
with a non temporal write. The non temporal write will go straight
to main memory and not pollute any caches. 

When the process accesses it later it has to fetch the zeroes from
main memory. This is probably still faster than a page fault at least
for the first few accesses. It could be more costly when walking the full
page (then the added up cache miss costs could exceed the page fault cost), 
but then hopefully the CPU will help by doing hardware prefetch.

It could help or not help, may be worth a try at least :-)
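
For the curious, a minimal user-space sketch of the idea (assumes SSE2, a
16-byte-aligned page-sized buffer, and illustrative loop unrolling; real
kernel code would do this inside its clear_page() implementation instead):

/* Clear one page with non-temporal stores so the zeroes bypass the cache. */
#include <emmintrin.h>
#include <stddef.h>

#define PAGE_SIZE 4096

static void clear_page_nontemporal(void *page)
{
	__m128i zero = _mm_setzero_si128();
	char *p = page;
	size_t i;

	for (i = 0; i < PAGE_SIZE; i += 4 * sizeof(__m128i)) {
		_mm_stream_si128((__m128i *)(p + i),      zero);
		_mm_stream_si128((__m128i *)(p + i + 16), zero);
		_mm_stream_si128((__m128i *)(p + i + 32), zero);
		_mm_stream_si128((__m128i *)(p + i + 48), zero);
	}
	_mm_sfence();	/* make the streaming stores globally visible */
}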

-Andi


^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2002-08-22 12:06 UTC | newest]

Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <E17ahdi-0001RC-00@w-gerrit2>
2002-08-02 19:34 ` large page patch (fwd) (fwd) Linus Torvalds
2002-08-03  3:19   ` David Mosberger
2002-08-03  3:32     ` Linus Torvalds
2002-08-03  4:17       ` David Mosberger
2002-08-03  4:26         ` Linus Torvalds
2002-08-03  4:39           ` David Mosberger
2002-08-03  5:20             ` David S. Miller
2002-08-03 17:35               ` Linus Torvalds
2002-08-03 19:30                 ` David Mosberger
2002-08-03 19:43                   ` Linus Torvalds
2002-08-03 21:18                     ` David Mosberger
2002-08-03 21:54                       ` Hubertus Franke
2002-08-04  0:35                         ` David S. Miller
2002-08-04  2:25                           ` David Mosberger
2002-08-04 17:19                             ` Hubertus Franke
2002-08-09 15:20                               ` Daniel Phillips
2002-08-09 15:56                                 ` Linus Torvalds
2002-08-09 16:15                                   ` Daniel Phillips
2002-08-09 16:31                                     ` Rik van Riel
2002-08-09 18:08                                       ` Daniel Phillips
2002-08-09 16:51                                     ` Linus Torvalds
2002-08-09 17:11                                       ` Daniel Phillips
2002-08-09 16:27                                   ` Rik van Riel
2002-08-09 16:52                                     ` Linus Torvalds
2002-08-09 17:40                                       ` yodaiken
2002-08-09 19:15                                         ` Rik van Riel
2002-08-09 21:20                                           ` Linus Torvalds
2002-08-09 21:19                                         ` Marcin Dalecki
2002-08-09 17:46                                       ` Bill Rugolsky Jr.
2002-08-12  9:23                                     ` Helge Hafting
2002-08-13  3:15                                       ` Bill Davidsen
2002-08-13  3:31                                         ` Rik van Riel
2002-08-13  7:28                                         ` Helge Hafting
2002-08-09 21:38                                   ` Andrew Morton
2002-08-10 18:20                                     ` Eric W. Biederman
2002-08-10 18:59                                       ` Daniel Phillips
2002-08-10 19:55                                       ` Rik van Riel
2002-08-10 19:54                                         ` Eric W. Biederman
2002-08-09 18:32                                 ` Hubertus Franke
2002-08-09 18:43                                   ` Daniel Phillips
2002-08-09 19:17                                     ` Hubertus Franke
2002-08-11 20:30                                 ` Alan Cox
2002-08-11 22:33                                   ` Daniel Phillips
2002-08-11 22:55                                     ` Linus Torvalds
2002-08-11 22:56                                       ` Linus Torvalds
2002-08-11 23:36                                         ` William Lee Irwin III
2002-08-12  0:46                                         ` Alan Cox
2002-08-11 23:42                                           ` Rik van Riel
2002-08-11 23:50                                             ` Larry McVoy
2002-08-12  8:22                                               ` Daniel Phillips
2002-08-13  8:40                                                 ` Rob Landley
2002-08-13 15:06                                                   ` Alan Cox
2002-08-13 11:36                                                     ` Rob Landley
2002-08-13 16:51                                                       ` Linus Torvalds
2002-08-13 12:53                                                         ` Rob Landley
2002-08-13 17:14                                                         ` Ruth Ivimey-Cook
2002-08-13 17:29                                                         ` Rik van Riel
2002-08-13 13:18                                                           ` Rob Landley
2002-08-13 18:32                                                             ` Linus Torvalds
2002-08-13 13:50                                                               ` Rob Landley
2002-08-13 17:45                                                           ` Alexander Viro
2002-08-13 17:55                                                           ` Linus Torvalds
2002-08-13 17:59                                                             ` Rik van Riel
2002-08-13 13:35                                                               ` Rob Landley
2002-08-13 19:12                                                             ` Daniel Phillips
2002-08-22 12:03                                                           ` bill davidsen
     [not found]                                                       ` <Pine.LNX.4.44.0208130942130.7411-100000@home.transmeta.com >
2002-08-13 18:46                                                         ` large page patch (fwd) Mike Galbraith
2002-08-11 23:44                                           ` large page patch (fwd) (fwd) Daniel Phillips
2002-08-13  8:51                                             ` Rob Landley
2002-08-13 16:47                                               ` Daniel Phillips
2002-08-13 13:09                                                 ` Rob Landley
2002-08-11 23:15                                       ` Larry McVoy
2002-08-12  1:26                                         ` Linus Torvalds
2002-08-12  5:05                                           ` Larry McVoy
2002-08-12 10:31                                           ` Alan Cox
2002-08-04  0:28                 ` David S. Miller
2002-08-04 17:31                   ` Hubertus Franke
2002-08-04 18:38                     ` Linus Torvalds
2002-08-04 19:23                       ` Andrew Morton
2002-08-04 19:28                         ` Linus Torvalds
2002-08-05  5:42                           ` David S. Miller
2002-08-04 19:30                       ` Hubertus Franke
2002-08-04 20:23                         ` William Lee Irwin III
2002-08-05 16:59                         ` David Mosberger
2002-08-05 17:21                           ` Hubertus Franke
2002-08-05 21:10                             ` Jamie Lokier
2002-08-04 19:41                       ` Rik van Riel
2002-08-05  5:40                     ` David S. Miller
2002-08-03 18:41             ` Hubertus Franke
2002-08-03 19:39               ` Linus Torvalds
2002-08-04  0:32                 ` David S. Miller
2002-08-03 19:41               ` David Mosberger
2002-08-03 20:53                 ` Hubertus Franke
2002-08-03 21:26                   ` David Mosberger
2002-08-03 21:50                     ` Hubertus Franke
2002-08-04  0:34                   ` David S. Miller
2002-08-04  0:31                 ` David S. Miller
2002-08-04 17:25                   ` Hubertus Franke
     [not found] <200208041331.24895.frankeh@watson.ibm.com.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0208041131380.10314-100000@home.transmeta.com.suse.lists.linux.kernel>
     [not found]   ` <3D4D7F24.10AC4BDB@zip.com.au.suse.lists.linux.kernel>
2002-08-04 20:20     ` Andi Kleen
2002-08-04 23:51       ` Eric W. Biederman
2002-08-05 23:30 Seth, Rohit
2002-08-06  5:01 ` David Mosberger
2002-08-06  4:58   ` David S. Miller
2002-08-06  5:19     ` David Mosberger
2002-08-06  5:08       ` David S. Miller
2002-08-06  5:32         ` David Mosberger
2002-08-06 19:11 ` Hubertus Franke
2002-08-06 20:38 Luck, Tony
2002-08-06 21:03 ` Hubertus Franke
2002-08-09 17:51 Seth, Rohit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).