linux-kernel.vger.kernel.org archive mirror
* [RFC] Arch option to touch newly allocated pages
@ 2002-03-03 21:12 Jeff Dike
  2002-03-03 22:01 ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-03 21:12 UTC (permalink / raw)
  To: linux-kernel

What I'd like is for the arch to have __alloc_pages touch all pages that it 
has allocated and is about to return.

The reason for this is that for UML, those pages are backed by host memory,
which may or may not be available when they are finally touched at some
arbitrary place in the kernel.  I hit this by tmpfs running out of room
because my UMLs have their memory backed by tmpfs mounted on /tmp.  So, I
want to be able to dirty those pages before they are seen by any other code.

My first guess at what I want in the code is for all the places that 
__alloc_pages says this:

			if (page)
				return page;

to change to this:

			if (page)
				return arch_validate(page);

arch_validate would be defined as basically empty somewhere in an
include/linux/*.h unless the arch has defined one already.  And I may want
to add order to the arg list if it can't be inferred from the page alignment;
the sketch below assumes it is passed in, which would make the call above
arch_validate(page, order).

My arch_validate would look something like this:

struct page *arch_validate(struct page *page, int order)
{
	unsigned long zero = 0;
	unsigned long addr = (unsigned long) page_address(page);
	int i;

	set_fs(USER_DS);
	for(i = 0; i < (1 << order); i++){
		if(copy_to_user((void *) (addr + i * PAGE_SIZE), &zero,
				sizeof(zero))){
			set_fs(KERNEL_DS);
			free_pages(addr, order);
			return(NULL);
		}
	}
	set_fs(KERNEL_DS);
	return(page);
}

The use of set_fs/copy_to_user is somewhat hokey, but that's exactly the
effect that I want.  Is there a better way of doing that?

So, is this a reasonable thing to do, and is the above the right way of
getting it?

				Jeff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-03 21:12 [RFC] Arch option to touch newly allocated pages Jeff Dike
@ 2002-03-03 22:01 ` Alan Cox
  2002-03-03 23:27   ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-03 22:01 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

> The reason for this is that for UML, those pages are backed by host memory,
> which may or may not be available when they are finally touched at some
> arbitrary place in the kernel.  I hit this by tmpfs running out of room
> because my UMLs have their memory backed by tmpfs mounted on /tmp.  So, I
> want to be able to dirty those pages before they are seen by any other code.

No - you think you want to dirty the pages - you want to account the address
space. What you want to do is run 2.4.18ac3 and do

	echo "2" > /proc/sys/vm/overcommit_memory

which on a good day will give you overcommit protection. Your map requests
will fail without the pages being dirtied and the extra swap that would
cause. It knows about tmpfs too but not ramfs, ramdisk or ptrace yet

Alan


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-03 22:01 ` Alan Cox
@ 2002-03-03 23:27   ` Jeff Dike
  2002-03-03 23:48     ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-03 23:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> No - you think you want to dirty the pages - you want to account the
> address space. What you want to do is run 2.4.18ac3 and do
> 	echo "2" > /proc/sys/vm/overcommit_memory
> which on a good day will give you overcommit protection. Your map
> requests will fail without the pages being dirtied and the extra swap
> that would cause.

That doesn't sound right to me.

I don't have individual little map requests going on here.  I have a single
large map happening at boot time which creates the UML "physical" memory
area.  

So, say I have a 128M UML which is only ever going to use 32M of that.  If 
there isn't 128M of address space, but there is 32M, this UML will never
get off the ground, even though it really deserved to.

About the swap allocation, I'd bet essentially all the time when a page
is allocated, its dirtiness is imminent anyway.  So, I'm not adding anything
to swap.  It'll be there a usec later anyway.  What I want is for the dirtying
to happen in a controlled place where something sane can be done if the page
isn't really there.

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-03 23:27   ` Jeff Dike
@ 2002-03-03 23:48     ` Alan Cox
  2002-03-04  3:16       ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-03 23:48 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> I don't have individual little map requests going on here.  I have a single
> large map happening at boot time which creates the UML "physical" memory
> area.  

Doesn't matter

> So, say I have a 128M UML which is only ever going to use 32M of that.  If 
> there isn't 128M of address space, but there is 32M, this UML will never
> get off the ground, even though it really deserved to.

Well, that's up to you on how you implement it. mmap will tell you the truth
in overcommit mode 2 or 3. Nothing will get killed off when you try and
mmap too much or dirty pages you have.

> About the swap allocation, I'd bet essentially all the time when a page
> is allocated, its dirtiness is imminent anyway.  So, I'm not adding anything

Nothing of the sort. Sitting in a gnome desktop I'm showing a 41200Kb worst
case swap requirement, but it appears under half of that is used.

> to swap.  It'll be there a usec later anyway.  What I want is for the dirtying
> to happen in a controlled place where something sane can be done if the page
> isn't really there.

Like randomly killing another process off ? If you want to dirty the pages
pray and catch the sigbus then see memset(3). If you want to be told "sorry
you can't have that" and write a simple loop to pick a good memory size,
you need the address space accounting.


Alan
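
The "simple loop to pick a good memory size" could look roughly like the
sketch below.  It is an illustration only, not UML code: map_largest() is a
hypothetical helper, and it uses an anonymous mapping for brevity where UML
would map its tmpfs-backed file.  The loop is only meaningful when the host
does strict address space accounting, since otherwise mmap will usually
succeed whether or not the memory can be provided later.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to map `want` bytes and halve the request on
 * each failure until it drops below `floor`.  Returns the mapping and
 * stores its size in *got, or returns NULL (with *got == 0) if nothing
 * of at least `floor` bytes could be mapped.
 */
static void *map_largest(size_t want, size_t floor, size_t *got)
{
	size_t size;

	assert(floor > 0 && want >= floor);
	for (size = want; size >= floor; size /= 2) {
		void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p != MAP_FAILED) {
			*got = size;
			return p;
		}
	}
	*got = 0;
	return NULL;
}
```

A caller would ask for the configured size (say 128M) with a floor it can
still boot in, then use whatever size comes back as its "physical" memory.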


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-03 23:48     ` Alan Cox
@ 2002-03-04  3:16       ` Jeff Dike
  2002-03-04  3:35         ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04  3:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> Like randomly killing another process off ? If you want to dirty the
> pages pray and catch the sigbus then see memset(3). If you want to be
> told "sorry you can't have that" and write a simple loop to pick a
> good memory size, you need the address space accounting.

OK, this sounds right if the machine is short of memory.  Random
hacks to do something reasonable if a SIGBUS manages to get through aren't
the way to go when random process deaths are what happen if it doesn't.

However, the host wasn't under a global memory shortage.  The UML hit the 
tmpfs size limit.

Does address space accounting enforce tmpfs limits (and other limits, like
RSS, when it happens)?  Or is it enforcing a global limit?

When the host isn't in a memory shortage and UML is running under a sub-limit
(as with tmpfs), either of those gives me worse behavior than I get by being
able to trap the SIGBUS.  It will arrive reliably without accompanying process
deaths.  The first case means that the UML won't get off the ground even
though it would be able to deal semi-gracefully with tmpfs running out of room.
The second means that the mmap will succeed and I'm back to SIGBUS anyway.

> Nothing of the sort. Sitting in a gnome desktop I'm showing a 41200Kb
> worst case swap requirement, but it appears under half of that is
> used. 

This I don't get.  I'm assuming that the vast majority of the time when a
set of pages is returned by __alloc_pages, they all are going to be written
pretty soon.  This being the case, how can it possibly affect anything to
touch them at the end of __alloc_pages?

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04  3:16       ` Jeff Dike
@ 2002-03-04  3:35         ` Alan Cox
  2002-03-04  5:04           ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-04  3:35 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> Does address space accounting enforce tmpfs limits (and other limits, like
> RSS, when it happens)?  Or is it enforcing a global limit?

It ensures that the total number of anonymous and/or tmpfs (eg anon shared)
pages that are mappable will fit in swap (or in mode 2 swap + 0.5*ram). You
never get a SIGBUS. Writes to tmpfs for new blocks will fail if that would
place the system in a potential overcommit situation.
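
As a worked example of that arithmetic, here is a minimal sketch.  The
function names and the half_ram switch are made up for illustration, and the
formula is taken only from the description above, not from the accounting
code itself.

```c
#include <assert.h>

/*
 * Limit, in pages, on the total of anonymous and tmpfs pages that may be
 * mappable: swap alone, or swap plus half of RAM when half_ram is set.
 */
static unsigned long commit_limit_pages(unsigned long swap_pages,
					unsigned long ram_pages, int half_ram)
{
	return half_ram ? swap_pages + ram_pages / 2 : swap_pages;
}

/* Would committing `request` more pages still fit under `limit`? */
static int may_commit(unsigned long committed, unsigned long request,
		      unsigned long limit)
{
	assert(committed <= limit);	/* the invariant being maintained */
	return request <= limit - committed;
}
```

With 4K pages, 64M of swap (16384 pages) and 256M of RAM (65536 pages), the
limit is 16384 pages in the swap-only case and 49152 pages in the other.  A
128M mapping (32768 pages) is refused in the first case and accepted in the
second, whether or not any of it is ever touched.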

> > Nothing of the sort. Sitting in a gnome desktop I'm showing a 41200Kb
> > worst case swap requirement, but it appears under half of that is
> > used. 
> 
> This I don't get.  I'm assuming that the vast majority of the time when a
> set of pages is returned by __alloc_pages, they all are going to be written
> pretty soon.  This being the case, how can it possibly affect anything to
> touch them at the end of __alloc_pages?

It isn't the alloc pages that is the problem.

You mmap - no pages are allocated. You use them, pages get allocated. If
you look at the actual maps you'll find a lot of people allocate an area
of address space but don't use it all. Without the address overcommit
management nothing guarantees that when you touch those pages you won't
fault. Furthermore unless you are very careful you may fault again on
the stack push for the SIGBUS and if that faults - SIGKILL->OOM time


Alan
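
For an ordinary user-space process, being "very careful" about the stack push
might look like the sketch below: the SIGBUS handler runs on an alternate
stack that has been written to in advance.  This is an assumption-laden
illustration, not what UML does; install_sigbus_altstack() and on_sigbus()
are hypothetical names, and touching the stack does not pin it, so the host
could still page it out later.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report the failure and exit; returning would just re-run the faulting write. */
static void on_sigbus(int sig)
{
	static const char msg[] = "SIGBUS: backing page unavailable\n";
	ssize_t r;

	(void) sig;
	r = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void) r;
	_exit(1);
}

/* Install on_sigbus() on a pre-touched alternate stack.  Returns 0 or -1. */
static int install_sigbus_altstack(void)
{
	struct sigaction sa;
	stack_t ss;
	size_t size = SIGSTKSZ;

	assert(size > 0);
	ss.ss_sp = malloc(size);
	if (ss.ss_sp == NULL)
		return -1;
	memset(ss.ss_sp, 0, size);	/* fault the stack in now, not in the handler */
	ss.ss_size = size;
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) < 0) {
		free(ss.ss_sp);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sa.sa_flags = SA_ONSTACK;	/* deliver SIGBUS on the alternate stack */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```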


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04  3:35         ` Alan Cox
@ 2002-03-04  5:04           ` Jeff Dike
  2002-03-04 15:09             ` Alan Cox
  2002-03-04 17:46             ` H. Peter Anvin
  0 siblings, 2 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-04  5:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> You never get a SIGBUS. Writes to tmpfs for new blocks will fail if
> that would place the system in a potential overcommit situation.

How will writes to tmpfs fail if we're not in an overcommit situation, but
tmpfs is full?  Unless tmpfs is changed, it looks to me like you get a SIGBUS.

> It isn't the alloc pages that is the problem.

We are somehow failing to communicate...

> You mmap - no pages are allocated. 

I understand this.

> You use them, pages get allocated.

This too.

> If you look at the actual maps you'll find a lot of people allocate an
> area of address space but don't use it all. 

Yes.

> Without the address
> overcommit management nothing guarantees that when you touch those
> pages you won't fault. 

Even with address overcommit management, I can fault if I touch pages when
tmpfs is full but the system is not near overcommit.

> Furthermore unless you are very careful you may
> fault again on the stack push for the SIGBUS and if that faults -
> SIGKILL->OOM time

We are talking about UML kernel stacks.  If they have been allocated the way
I'm proposing with the UML __alloc_pages touching each page on the way out,
they are allocated on the host, and therefore can't fault.

This seems to me to be sufficiently careful.

One of us is missing something, who is it?

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04  5:04           ` Jeff Dike
@ 2002-03-04 15:09             ` Alan Cox
  2002-03-04 17:42               ` Jeff Dike
  2002-03-04 17:46             ` H. Peter Anvin
  1 sibling, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-04 15:09 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> Even with address overcommit management, I can fault if I touch pages when
> tmpfs is full but the system is not near overcommit.

That is what mmap defines for a file-based mapping, yes. That's a case where
there isn't much else you can do.


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 15:09             ` Alan Cox
@ 2002-03-04 17:42               ` Jeff Dike
  2002-03-04 18:29                 ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04 17:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> That is what mmap defines for a file-based mapping, yes. That's a case
> where there isn't much else you can do 

Except the whole point of me starting this thread is that there is something
sane that UML can do *if* it can trap those bus errors in a controlled way.

If UML can detect pages which tmpfs can't back as they leave the allocator,
then it can prevent the rest of the UML kernel from getting randomly SIGBUSed
as it touches those pages.

To recap in case it got lost in the confusion, I want __alloc_pages to call
an arch hook before it returns memory, turning every instance of

	if (page)
		return page;

into

	if (page)
		return arch_validate(page);

Unless the arch defines its own arch_validate(), a generic header would
define it as

	static inline struct page *arch_validate(struct page *page){ return page; }

or the equivalent macro.

On the other hand, UML would define it to touch each page in the allocation,
trapping SIGBUS there.  If any do SIGBUS, then my original proposal was to 
free the block back to the allocator and return NULL.  This would cause a
flurry of allocation failures to things that weren't willing to sleep, and
if that causes trouble, then the caller needed fixing anyway.

A more interesting idea is to hang on to the block and maybe unmap it.  
Unmapping would free any backed pages in the block back to tmpfs, giving it 
(and the other UMLs, if any) some breathing room.  Even if the entire block
was unbacked, the UML would lose it as being allocatable and would eventually
be restricted to handing out pages that it had managed to touch before tmpfs
ran out.

This is way more sane than the current get-a-SIGBUS-someplace-random-and-panic
situation I have now.

Given that we are talking about tmpfs running out of space, the host still
has plenty of free memory, and UML kernel stacks can receive the SIGBUS
(because they've been allocated with this mechanism), is this still 
objectionable?

				Jeff
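
The "touch each page, trapping SIGBUS there" idea can be tried out in plain
user space with something like the sketch below.  It is not the UML
implementation; touch_pages() and touch_sigbus() are hypothetical names, and
it uses sigsetjmp/siglongjmp where UML has its own fault catcher.  Note that
it overwrites the first word of every page, so it is only suitable for
freshly allocated memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf touch_env;

/* Jump back into touch_pages() instead of retrying the faulting write. */
static void touch_sigbus(int sig)
{
	(void) sig;
	siglongjmp(touch_env, 1);
}

/*
 * Write one word to each of `npages` pages starting at `addr`.
 * Returns 0 if every page could be dirtied, -1 if a write raised SIGBUS
 * (or the handler could not be installed).
 */
static int touch_pages(void *addr, size_t npages)
{
	struct sigaction sa, old;
	size_t page = (size_t) sysconf(_SC_PAGESIZE);
	char *base = addr;
	size_t i;
	int ret;

	assert(addr != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = touch_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return -1;

	if (sigsetjmp(touch_env, 1) == 0) {
		for (i = 0; i < npages; i++)
			*(volatile unsigned long *) (base + i * page) = 0;
		ret = 0;
	} else {
		ret = -1;	/* the host could not back one of the pages */
	}

	sigaction(SIGBUS, &old, NULL);	/* restore the previous handler */
	return ret;
}
```

An allocator hook built on this would hand the block out when touch_pages()
returns 0 and otherwise either free it or set it aside, as described above.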



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04  5:04           ` Jeff Dike
  2002-03-04 15:09             ` Alan Cox
@ 2002-03-04 17:46             ` H. Peter Anvin
  2002-03-04 18:34               ` Jeff Dike
  1 sibling, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-04 17:46 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200203040504.AAA05343@ccure.karaya.com>
By author:    Jeff Dike <jdike@karaya.com>
In newsgroup: linux.dev.kernel
> 
> Even with address overcommit management, I can fault if I touch pages when
> tmpfs is full but the system is not near overcommit.
> 
> > Furthermore unless you are very careful you may
> > fault again on the stack push for the SIGBUS and if that faults -
> > SIGKILL->OOM time
> 
> We are talking about UML kernel stacks.  If they have been allocated the way
> I'm proposing with the UML __alloc_pages touching each page on the way out,
> they are allocated on the host, and therefore can't fault.
> 
> This seems to me to be sufficiently careful.
> 
> One of us is missing something, who is it?
> 

I think it's you -- you seem to suffer from the "my application is the
only one that counts" syndrome.  If you want the pages dirtied, then
dirty them using memset() or similar.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 17:42               ` Jeff Dike
@ 2002-03-04 18:29                 ` Alan Cox
  2002-03-04 18:36                   ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-04 18:29 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> If UML can detect pages which tmpfs can't back as they leave the allocator,
> then it can prevent the rest of the UML kernel from getting randomly SIGBUSed
> as it touches those pages.

Yes I follow this.  I don't understand how it is related to your intended
solution.

> To recap in case it got lost in the confusion, I want __alloc_pages to call
> an arch hook before it returns memory, turning every instance of

alloc_pages is only called at the time the backing page is created - by
then it doesn't matter - it's too late. You'd need to hack up the same code
areas that are used for mlock MCL_FUTURE, not alloc_pages.

> Given that we are talking about tmpfs running out of space, the host still
> has plenty of free memory, and UML kernel stacks can receive the SIGBUS
> (because they've been allocated with this mechanism), is this still 
> objectionable?

With the vm no overcommit code the tmpfs cannot run out of space filling
in pages, only when you make a tmpfs file larger. The code guarantees there
are swap pages available to back between offset 0 and the file size.  A
write extending a tmpfs file may fail reporting the disk full.

The code guarantees (modulo bugs of course!) that the total number of
pages that could be created by touching addresses that have already been
mapped including accounting for tmpfs on the basis above never exceeds the
number of pages available.

The bugs at the moment being 
	1. ptrace isn't accounted for its special weirdnesses
	2. MAP_NORESERVE isn't forcibly accounted in these modes as required



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 18:34               ` Jeff Dike
@ 2002-03-04 18:33                 ` H. Peter Anvin
  2002-03-04 20:36                   ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-04 18:33 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

Jeff Dike wrote:

> hpa@zytor.com said:
> 
>>I think it's you -- you seem to suffer from the "my application is the
>>only one that counts" syndrome.  If you want the pages dirtied, then
>>dirty them using memset() or similar. 
>>
> 
> I think you and Alan think I want the host kernel to do the dirtying.  Not so,
> I want no changes on the host.  I want a hook that UML can use to make sure
> that all pages that it allocates are backed.
> 
> And memset or something similar is exactly what I have in mind.
> 


So why, then, phrase this as a feature request???

	-hpa




* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 17:46             ` H. Peter Anvin
@ 2002-03-04 18:34               ` Jeff Dike
  2002-03-04 18:33                 ` H. Peter Anvin
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04 18:34 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

hpa@zytor.com said:
> I think it's you -- you seem to suffer from the "my application is the
> only one that counts" syndrome.  If you want the pages dirtied, then
> dirty them using memset() or similar. 

I think you and Alan think I want the host kernel to do the dirtying.  Not so,
I want no changes on the host.  I want a hook that UML can use to make sure
that all pages that it allocates are backed.

And memset or something similar is exactly what I have in mind.

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 18:29                 ` Alan Cox
@ 2002-03-04 18:36                   ` Jeff Dike
  2002-03-04 18:49                     ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04 18:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> alloc_pages is only called at the time the backing page is created -
> by then it doesn't matter - it's too late.

*My* (i.e. the one inside UML) alloc_pages, not the host's would do the
dirtying.  That's the whole point.  The UML alloc_pages would make sure
that the pages it hands out are backed on the host before they are handed
out to the rest of UML.

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 18:36                   ` Jeff Dike
@ 2002-03-04 18:49                     ` Alan Cox
  2002-03-04 20:46                       ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-04 18:49 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> > alloc_pages is only called at the time the backing page is created -
> > by then it doesn't matter - it's too late.
> 
> *My* (i.e. the one inside UML) alloc_pages, not the host's would do the
> dirtying.  That's the whole point.  The UML alloc_pages would make sure
> that the pages it hands out are backed on the host before they are handed
> out to the rest of UML.

Ok got you - so it's merely grossly inefficient and downright rude to
other users of the system?


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 18:33                 ` H. Peter Anvin
@ 2002-03-04 20:36                   ` Jeff Dike
  2002-03-04 22:51                     ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04 20:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

hpa@zytor.com said:
> So why, then, phrase this as a feature request??? 

Because it requires a hook in the generic kernel allocator that UML can
use to make sure that all allocated pages are backed on the host.

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 18:49                     ` Alan Cox
@ 2002-03-04 20:46                       ` Jeff Dike
  2002-03-04 22:49                         ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-04 20:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

alan@lxorguk.ukuu.org.uk said:
> Ok got you - 

Good, if that's not being sarcastic...

> so it's merely grossly inefficient and downright rude to
> other users of the system? 

OK, when something calls alloc_pages and gets back some pages, it's almost
always going to modify them immediately, right?

If this is true, then what I'm proposing would force the host to find backing
memory for those pages a tiny bit earlier than it would have had to otherwise.

This is the only possibility for inefficiency and rudeness that I can see.
If I'm totally missing what you are referring to, please be a little bit more
specific.

				Jeff



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 20:46                       ` Jeff Dike
@ 2002-03-04 22:49                         ` Alan Cox
  0 siblings, 0 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-04 22:49 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, linux-kernel

> OK, when something calls alloc_pages and gets back some pages, it's almost
> always going to modify them immediately, right?

Yes. Which is why we don't allocate them when you map an object or create
a shmem fs file

> If this is true, then what I'm proposing would force the host to find backing
> memory for those pages a tiny bit earlier than it would have had to otherwise.

In the normal case about half of the pages that are mapped are never
allocated. In other words no alloc_pages was ever done for them or will ever
be needed. 
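
The gap between mapped and allocated pages is easy to observe from user
space.  The sketch below is only an illustration (resident_after_touch() is
a hypothetical helper): it maps a number of anonymous pages, writes to some
of them, and asks the kernel via mincore() how many are resident.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `total` anonymous pages, write to the first `touched` of them, and
 * return how many pages the kernel reports as resident, or -1 on error.
 */
static long resident_after_touch(size_t total, size_t touched)
{
	size_t page = (size_t) sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	long resident = 0;
	char *map;
	size_t i;

	assert(total > 0 && touched <= total);
	map = mmap(NULL, total * page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	vec = malloc(total);
	if (vec == NULL) {
		munmap(map, total * page);
		return -1;
	}

	for (i = 0; i < touched; i++)
		map[i * page] = 1;	/* the first write is what allocates the page */

	if (mincore(map, total * page, vec) < 0) {
		resident = -1;
	} else {
		for (i = 0; i < total; i++)
			resident += vec[i] & 1;	/* low bit set: page is resident */
	}

	free(vec);
	munmap(map, total * page);
	return resident;
}
```

Calling resident_after_touch(16, 4) on an idle machine would be expected to
report far fewer than 16 resident pages, though the exact count depends on
the kernel and its configuration.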



* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 20:36                   ` Jeff Dike
@ 2002-03-04 22:51                     ` Alan Cox
  2002-03-05  4:15                       ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-04 22:51 UTC (permalink / raw)
  To: Jeff Dike; +Cc: H. Peter Anvin, linux-kernel

> hpa@zytor.com said:
> > So why, then, phrase this as a feature request??? 
> 
> Because it requires a hook in the generic kernel allocator that UML can
> use to make sure that all allocated pages are backed on the host.

At the point you actually allocate pages they are being allocated. No hook
is needed. You seem to misunderstand the way the allocation works - we
allocate address space not memory in things like mmap. We allocate pages
on demand when referenced. The page allocator is only called after a page
is referenced


* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-04 22:51                     ` Alan Cox
@ 2002-03-05  4:15                       ` Jeff Dike
  2002-03-05  4:28                         ` Benjamin LaHaise
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-05  4:15 UTC (permalink / raw)
  To: Alan Cox; +Cc: H. Peter Anvin, linux-kernel

alan@lxorguk.ukuu.org.uk said:
> You seem to misunderstand the way the allocation works - we allocate
> address space not memory in things like mmap. We allocate pages on
> demand when referenced. The page allocator is only called after a page
> is referenced

I understand perfectly well how it works.

You still don't understand what I'm talking about.  To make this a bit more
concrete, the patch below implements what I want (plus a couple of bug fixes
needed to make it work).

If you want to run it, apply the 2.4.18-2 UML patch (available at 
http://prdownloads.sourceforge.net/user-mode-linux/uml-patch-2.4.18-2.bz2) to 
a stock 2.4.18 pool.  Copy the pool, apply the patch below to one of them,
and build both.

Mount a 64M tmpfs on /tmp, boot up two 64M UMLs without the patch, run a -j 2
kernel build in each and watch them hang (see http://user-mode-linux.sf.net
for lots of docs, filesystem images, etc if you haven't run UML before).  If 
you have gdb running on them, you will see that they're stuck at some random 
place in the kernel taking an infinite stream of SIGBUSes on a page that tmpfs 
can't back.  If you apply the relay_signal piece of the patch to this pool, 
you will get panics instead of hangs.

Now do the same with two 64M UMLs with the patch.  You will see the build die
like this, but the UMLs stay up and they're fairly healthy:

gcc -D__KERNEL__ -I/kernel/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=i686    -c -o dma.o dma.c
cpp: output pipe has been closed
gcc: Internal compiler error: program cc1 got fatal signal 11
make[2]: *** [dma.o] Error 1

Note the following:
	the host is not short of memory, so address space accounting and the
possibility of random process deaths do not come into play
	you did not build or reboot the host kernel - all this is strictly 
inside UML
	the code added to mm.h is a no-op for every arch but UML

So, does this make things at all clearer?  Without the patch I get random
UML deaths when tmpfs can't back a page.  With it, tmpfs is forced to back
newly allocated pages when they're allocated, and the allocation returns NULL
if it can't.  The result being I get no UML deaths and fairly reasonable 
behavior.

				Jeff


diff -Naur um/arch/um/kernel/exec_kern.c back/arch/um/kernel/exec_kern.c
--- um/arch/um/kernel/exec_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/exec_kern.c	Mon Mar  4 17:20:52 2002
@@ -38,6 +38,12 @@
 	int new_pid;
 
 	stack = alloc_stack();
+	if(stack == 0){
+		printk(KERN_ERR 
+		       "flush_thread : failed to allocate temporary stack\n");
+		do_exit(SIGKILL);
+	}
+		
 	new_pid = start_fork_tramp((void *) current->thread.kernel_stack,
 				   stack, 0, exec_tramp);
 	if(new_pid < 0){
diff -Naur um/arch/um/kernel/mem.c back/arch/um/kernel/mem.c
--- um/arch/um/kernel/mem.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/mem.c	Mon Mar  4 16:04:01 2002
@@ -212,6 +212,32 @@
 "    just be swapped out.\n        Example: mem=64M\n\n"
 );
 
+struct page *arch_validate(struct page *page, int order)
+{
+	unsigned long addr, zero = 0;
+	int i;
+
+	addr = (unsigned long) page_address(page);
+	for(i = 0; i < (1 << order); i++){
+		current->thread.fault_addr = (void *) addr;
+		if(__do_copy_to_user((void *) addr, &zero, sizeof(zero),
+				     &current->thread.fault_addr,
+				     &current->thread.fault_catcher))
+			return(NULL);
+		addr += PAGE_SIZE;
+	}
+	return(page);
+}
+
+extern void relay_signal(int sig, void *sc, int usermode);
+
+void bus_handler(int sig, void *sc, int usermode)
+{
+	if(current->thread.fault_catcher != NULL)
+		do_longjmp(current->thread.fault_catcher);
+	else relay_signal(sig, sc, usermode);
+}
+
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
  * Emacs will notice this stuff at the end of the file and automatically
diff -Naur um/arch/um/kernel/process_kern.c back/arch/um/kernel/process_kern.c
--- um/arch/um/kernel/process_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/process_kern.c	Mon Mar  4 17:19:00 2002
@@ -141,7 +141,7 @@
 	unsigned long page;
 
 	if((page = __get_free_page(GFP_KERNEL)) == 0)
-		panic("Couldn't allocate new stack");
+		return(0);
 	stack_protections(page);
 	return(page);
 }
@@ -318,6 +318,11 @@
 		panic("copy_thread : pipe failed");
 	if(current->thread.forking){
 		stack = alloc_stack();
+		if(stack == 0){
+			printk(KERN_ERR "copy_thread : failed to allocate "
+			       "temporary stack\n");
+			return(-ENOMEM);
+		}
 		clone_vm = (p->mm == current->mm);
 		p->thread.temp_stack = stack;
 		new_pid = start_fork_tramp((void *) p->thread.kernel_stack,
diff -Naur um/arch/um/kernel/trap_kern.c back/arch/um/kernel/trap_kern.c
--- um/arch/um/kernel/trap_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/trap_kern.c	Mon Mar  4 17:22:26 2002
@@ -30,6 +30,7 @@
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct siginfo si;
+	void *catcher;
 	pgd_t *pgd;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -40,6 +41,7 @@
 		return(0);
 	}
 	if(mm == NULL) panic("Segfault with no mm");
+	catcher = current->thread.fault_catcher;
 	si.si_code = SEGV_MAPERR;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -84,10 +86,10 @@
 	up_read(&mm->mmap_sem);
 	return(0);
  bad:
-	if (current->thread.fault_catcher != NULL) {
+	if(catcher != NULL) {
 		current->thread.fault_addr = (void *) address;
 		up_read(&mm->mmap_sem);
-		do_longjmp(current->thread.fault_catcher);
+		do_longjmp(catcher);
 	} 
 	else if(current->thread.fault_addr != NULL){
 		panic("fault_addr set but no fault catcher");
@@ -120,6 +122,7 @@
 
 void relay_signal(int sig, void *sc, int usermode)
 {
+	if(!usermode) panic("Kernel mode signal %d", sig);
 	force_sig(sig, current);
 }
 
diff -Naur um/arch/um/kernel/trap_user.c back/arch/um/kernel/trap_user.c
--- um/arch/um/kernel/trap_user.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/trap_user.c	Mon Mar  4 15:45:58 2002
@@ -420,11 +420,13 @@
 
 extern int timer_ready, timer_on;
 
+extern void bus_handler(int sig, void *sc, int usermode);
+
 static void (*handlers[])(int, void *, int) = {
 	[ SIGTRAP ] relay_signal,
 	[ SIGFPE ] relay_signal,
 	[ SIGILL ] relay_signal,
-	[ SIGBUS ] relay_signal,
+	[ SIGBUS ] bus_handler,
 	[ SIGSEGV] segv_handler,
 	[ SIGIO ] sigio_handler,
 	[ SIGVTALRM ] timer_handler,
diff -Naur um/include/asm-um/page.h back/include/asm-um/page.h
--- um/include/asm-um/page.h	Mon Mar  4 17:27:34 2002
+++ back/include/asm-um/page.h	Mon Mar  4 15:45:46 2002
@@ -42,4 +42,7 @@
 #define virt_to_page(kaddr)	(mem_map + (__pa(kaddr) >> PAGE_SHIFT))
 #define VALID_PAGE(page)	((page - mem_map) < max_mapnr)
 
+extern struct page *arch_validate(struct page *page, int order);
+#define HAVE_ARCH_VALIDATE
+
 #endif
diff -Naur um/include/linux/mm.h back/include/linux/mm.h
--- um/include/linux/mm.h	Mon Mar  4 16:16:44 2002
+++ back/include/linux/mm.h	Mon Mar  4 16:43:26 2002
@@ -358,6 +358,13 @@
 extern struct page * FASTCALL(__alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist));
 extern struct page * alloc_pages_node(int nid, unsigned int gfp_mask, unsigned int order);
 
+#ifndef HAVE_ARCH_VALIDATE
+static inline struct page *arch_validate(struct page *page, int order)
+{
+        return(page);
+}
+#endif
+
 static inline struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)
 {
 	/*
@@ -365,7 +372,7 @@
 	 */
 	if (order >= MAX_ORDER)
 		return NULL;
-	return _alloc_pages(gfp_mask, order);
+	return arch_validate(_alloc_pages(gfp_mask, order), order);
 }
 
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05  4:15                       ` Jeff Dike
@ 2002-03-05  4:28                         ` Benjamin LaHaise
  2002-03-05  4:40                           ` Jeff Dike
  2002-03-05 14:43                           ` Jeff Dike
  0 siblings, 2 replies; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-05  4:28 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, H. Peter Anvin, linux-kernel

On Mon, Mar 04, 2002 at 11:15:56PM -0500, Jeff Dike wrote:
> So, does this make things at all clearer?  Without the patch I get random
> UML deaths when tmpfs can't back a page.  With it, tmpfs is forced to back
> newly allocated pages when they're allocated, and the allocation returns NULL
> if it can't.  The result being I get no UML deaths and fairly reasonable 
> behavior.

From your explanation of things, you only need to do the memsets once at 
startup of UML where the ram is allocated -> a uml booted with 64MB of 
ram would write into every page of the backing store file before even 
running the kernel.  Doesn't that accomplish the same thing?
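
The startup approach described above could be sketched in userspace C roughly
as follows.  This is only an illustration under stated assumptions, not UML
code: the name pretouch_backing and its parameters are hypothetical, and the
backing file is assumed to be at least len bytes long.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of the backing file fd and write one
 * byte into every page, so the host has to allocate backing store now
 * rather than at first use.  Returns the mapping, or NULL if mmap fails.
 * If the host cannot back a page, the write raises SIGBUS instead of
 * returning an error, so a real caller would also need a signal handler.
 */
void *pretouch_backing(int fd, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile char *mem;
	size_t off;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		return NULL;

	/* One write per page is enough to force the allocation. */
	for (off = 0; off < len; off += (size_t) page)
		mem[off] = 0;

	return (void *) mem;
}
```

A UML booted with 64MB would call something like this once over its whole
memory file before starting the kernel.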

		-ben

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05  4:28                         ` Benjamin LaHaise
@ 2002-03-05  4:40                           ` Jeff Dike
  2002-03-05  5:34                             ` H. Peter Anvin
  2002-03-05 14:43                           ` Jeff Dike
  1 sibling, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-05  4:40 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Alan Cox, H. Peter Anvin, linux-kernel

bcrl@redhat.com said:
> From your explanation of things, you only need to do the memsets once
> at  startup of UML where the ram is allocated -> a uml booted with
> 64MB of  ram would write into every page of the backing store file
> before even  running the kernel.  Doesn't that accomplish the same
> thing?

Sort of, but it's very heavy-handed.  The UML will force memory to be
allocated on the host long before it will ever be needed, and it may never
be needed.  This patch doesn't waste memory like that.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05  4:40                           ` Jeff Dike
@ 2002-03-05  5:34                             ` H. Peter Anvin
  2002-03-05 14:43                               ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-05  5:34 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Benjamin LaHaise, Alan Cox, linux-kernel

Jeff Dike wrote:

> bcrl@redhat.com said:
> 
>>From your explanation of things, you only need to do the memsets once
>>at  startup of UML where the ram is allocated -> a uml booted with
>>64MB of  ram would write into every page of the backing store file
>>before even  running the kernel.  Doesn't that accomplish the same
>>thing?
>>
> 
> Sort of, but it's very heavy-handed.  The UML will force memory to be
> allocated on the host long before it will ever be needed, and it may never
> be needed.  This patch doesn't waste memory like that.
> 


This is not necessarily a bad thing, however.  If the user hadn't set up 
enough swap, they're probably better off getting the error message early.

	-hpa




^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05  5:34                             ` H. Peter Anvin
@ 2002-03-05 14:43                               ` Jeff Dike
  2002-03-05 16:37                                 ` H. Peter Anvin
  2002-03-05 16:56                                 ` Wayne Whitney
  0 siblings, 2 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-05 14:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Benjamin LaHaise, Alan Cox, linux-kernel

hpa@zytor.com said:
> This is not necessarily a bad thing, however.  If the user hadn't set
> up  enough swap, they're probably better off getting the error message
> early. 

This is not a situation in which a lack of swap or a lack of RAM is a problem.

The problem is a tmpfs filling up.

You think that UML refusing to run if it can't get every bit of memory it
might ever need is preferable to UML running fine in somewhat less memory?

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05  4:28                         ` Benjamin LaHaise
  2002-03-05  4:40                           ` Jeff Dike
@ 2002-03-05 14:43                           ` Jeff Dike
  2002-03-05 16:57                             ` H. Peter Anvin
  2002-03-05 17:30                             ` Jan Harkes
  1 sibling, 2 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-05 14:43 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Alan Cox, H. Peter Anvin, linux-kernel

bcrl@redhat.com said:
> you only need to do the memsets once at  startup of UML where the ram
> is allocated -> a uml booted with 64MB of  ram would write into every
> page of the backing store file before even  running the kernel.
> Doesn't that accomplish the same thing?

The other reason I don't like this is that, at some point, I'd like to
start thinking about userspace cooperating with the kernel on memory
management.  UML looks like a perfect place to start since it's essentially
identical to the host making it easier for the two to bargain over memory.

Having UML react sanely to unbacked pages is a step in that direction; having
UML preemptively grab all the memory it could ever use isn't.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 14:43                               ` Jeff Dike
@ 2002-03-05 16:37                                 ` H. Peter Anvin
  2002-03-05 18:12                                   ` Jeff Dike
  2002-03-05 16:56                                 ` Wayne Whitney
  1 sibling, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-05 16:37 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Benjamin LaHaise, Alan Cox, linux-kernel

Jeff Dike wrote:
> hpa@zytor.com said:
> 
>>This is not necessarily a bad thing, however.  If the user hadn't set
>>up  enough swap, they're probably better off getting the error message
>>early. 
>>
> 
> This is not a situation in which a lack of swap or a lack of RAM is a problem.
> 
> The problem is a tmpfs filling up.
> 
> You think that UML refusing to run if it can't get every bit of memory it
> might ever need is preferable to UML running fine in somewhat less memory?
> 

Actually, yes, esp. since the only case you have been able to bring up is 
one of the sysadmin being a moron.

	-hpa



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 14:43                               ` Jeff Dike
  2002-03-05 16:37                                 ` H. Peter Anvin
@ 2002-03-05 16:56                                 ` Wayne Whitney
  1 sibling, 0 replies; 96+ messages in thread
From: Wayne Whitney @ 2002-03-05 16:56 UTC (permalink / raw)
  To: H. Peter Anvin, Jeff Dike; +Cc: linux-kernel

H. Peter Anvin wrote:

> Jeff Dike wrote:
>
> > You think that UML refusing to run if it can't get every bit of memory it
> > might ever need is preferable to UML running fine in somewhat less memory?
> 
> Actually, yes, esp. since the only case you have been able to bring up is 
> one of the sysadmin being a moron.

I could easily imagine it being useful to run multiple UMLs on one
machine (to simulate a network, say), and that one's application
causes each UML to occasionally spike in its memory requirements.
Then it would be disappointing for the number of UMLs one could run to
be determined by this maximum memory requirement, rather than by the
average memory requirement (minus some leeway for a few spiking UMLs).

The hook Jeff asks for seems harmless enough.  If there is some
disagreement about how UML interacts with the host kernel on memory
allocation, the two different modes could be a configuration option of
UML.  The "touch it all at startup" option could be the default, as it
does make a lot of sense for the single UML case.

Cheers,
Wayne

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 14:43                           ` Jeff Dike
@ 2002-03-05 16:57                             ` H. Peter Anvin
  2002-03-05 18:14                               ` Jeff Dike
  2002-03-05 17:30                             ` Jan Harkes
  1 sibling, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-05 16:57 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200203051443.JAA02119@ccure.karaya.com>
By author:    Jeff Dike <jdike@karaya.com>
In newsgroup: linux.dev.kernel
> 
> The other reason I don't like this is that, at some point, I'd like to
> start thinking about userspace cooperating with the kernel on memory
> management.  UML looks like a perfect place to start since it's essentially
> identical to the host making it easier for the two to bargain over memory.
> 
> Having UML react sanely to unbacked pages is a step in that direction, having
> UML preemptively grab all the memory it could ever use isn't.
> 

Until you can come up with a sane application for it, this is just
featuritis.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 14:43                           ` Jeff Dike
  2002-03-05 16:57                             ` H. Peter Anvin
@ 2002-03-05 17:30                             ` Jan Harkes
  1 sibling, 0 replies; 96+ messages in thread
From: Jan Harkes @ 2002-03-05 17:30 UTC (permalink / raw)
  To: linux-kernel

On Tue, Mar 05, 2002 at 09:43:39AM -0500, Jeff Dike wrote:
> bcrl@redhat.com said:
> > you only need to do the memsets once at  startup of UML where the ram
> > is allocated -> a uml booted with 64MB of  ram would write into every
> > page of the backing store file before even  running the kernel.
> > Doesn't that accomplish the same thing?
> 
> The other reason I don't like this is that, at some point, I'd like to
> start thinking about userspace cooperating with the kernel on memory
> management.  UML looks like a perfect place to start since it's essentially
> identical to the host making it easier for the two to bargain over memory.

I could use the same thing in Coda, we have large private memory
mappings that are backed by a file which isn't always up-to-date. But we
can make it so by applying the logged modifications. If there is some
'memory pressure' signal we could apply the log and remap the memory to
reduce swap usage.

On the other hand, applying the logged modifications generates a lot of
write activity which could push the system over the edge, so the current
method of having a large amount of swap available is probably more
reliable. Otherwise we'll get the whole OOM killer debate again (the
pre-OOM signaller?).

Jan


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 16:37                                 ` H. Peter Anvin
@ 2002-03-05 18:12                                   ` Jeff Dike
  2002-03-05 18:30                                     ` Benjamin LaHaise
                                                       ` (3 more replies)
  0 siblings, 4 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-05 18:12 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Benjamin LaHaise, Alan Cox, linux-kernel

hpa@zytor.com said:
> Actually, yes, esp. since the only case you have been able to bring up
> is  one of the sysadmin being a moron. 

Really?  And you're unconcerned about the impact on the rest of the system
of a UML grabbing (say) 128M of memory when it starts up?  Especially if it
may never use it?

And I don't see anything wrong with starting a bunch of UMLs with a total
maximum memory exceeding the available tmpfs as long as they don't all need
all that memory at once.  And, if they do, the patch I just posted will let
them deal fairly sanely with the situation.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 16:57                             ` H. Peter Anvin
@ 2002-03-05 18:14                               ` Jeff Dike
  2002-03-05 18:45                                 ` H. Peter Anvin
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-05 18:14 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

hpa@zytor.com said:
> Until you can come up with a sane application for it, this is just
> featuritis. 

Having the system better manage its memory is "featuritis"?

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:12                                   ` Jeff Dike
@ 2002-03-05 18:30                                     ` Benjamin LaHaise
  2002-03-06 14:59                                       ` Daniel Phillips
  2002-03-05 18:46                                     ` H. Peter Anvin
                                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-05 18:30 UTC (permalink / raw)
  To: Jeff Dike; +Cc: H. Peter Anvin, Alan Cox, linux-kernel

On Tue, Mar 05, 2002 at 01:12:19PM -0500, Jeff Dike wrote:
> Really?  And you're unconcerned about the impact on the rest of the system
> of a UML grabbing (say) 128M of memory when it starts up?  Especially if it
> may never use it?

Honestly, I think that most people want to know if the system they've set up 
is overcommitted at as early a point as possible: a UML failing at startup 
with out of memory is better than random segvs at some later point when the 
system is under load.  Refer to the principle of least surprise.  And if the 
user truly wants to disable that, well, you can give them a command line 
option to shoot themselves in the foot with.

		-ben
-- 
"A man with a bass just walked in,
 and he's putting it down
 on the floor."

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:14                               ` Jeff Dike
@ 2002-03-05 18:45                                 ` H. Peter Anvin
  0 siblings, 0 replies; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-05 18:45 UTC (permalink / raw)
  To: Jeff Dike; +Cc: linux-kernel

Jeff Dike wrote:

> hpa@zytor.com said:
> 
>>Until you can come up with a sane application for it, this is just
>>featuritis. 
>>
> 
> Having the system better manage its memory is "featuritis"?
> 


s/better/insanely/

Your proposed application is, quite frankly, bullshit.

	-hpa



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:12                                   ` Jeff Dike
  2002-03-05 18:30                                     ` Benjamin LaHaise
@ 2002-03-05 18:46                                     ` H. Peter Anvin
  2002-03-06  1:30                                     ` Alan Cox
  2002-03-06 10:49                                     ` David Woodhouse
  3 siblings, 0 replies; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-05 18:46 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Benjamin LaHaise, Alan Cox, linux-kernel

Jeff Dike wrote:

> 
> Really?  And you're unconcerned about the impact on the rest of the system
> of a UML grabbing (say) 128M of memory when it starts up?  Especially if it
> may never use it?

 >

It doesn't grab memory, it grabs backing store.  The kernel will swap it 
out as necessary.

> 
> And I don't see anything wrong with starting a bunch of UMLs with a total
> maximum memory exceeding the available tmpfs as long as they don't all need
> all that memory at once.  And, if they do, the patch I just posted will let
> them deal fairly sanely with the situation.


Bullshit.  It means you have moved your system into an insane corner 
case, and you would have been better off denying access in the first place.

	-hpa



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:12                                   ` Jeff Dike
  2002-03-05 18:30                                     ` Benjamin LaHaise
  2002-03-05 18:46                                     ` H. Peter Anvin
@ 2002-03-06  1:30                                     ` Alan Cox
  2002-03-06 10:49                                     ` David Woodhouse
  3 siblings, 0 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-06  1:30 UTC (permalink / raw)
  To: Jeff Dike; +Cc: H. Peter Anvin, Benjamin LaHaise, Alan Cox, linux-kernel

> maximum memory exceeding the available tmpfs as long as they don't all need
> all that memory at once.  And, if they do, the patch I just posted will let
> them deal fairly sanely with the situation.

And the address space management stuff in the -ac tree will do all that and
more without force allocating pages and regardless of what other apps do
including without allowing your rude app to kill them.

You are using an axe to batter down a door. Worse than that I fitted a
perfectly good door handle.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:12                                   ` Jeff Dike
                                                       ` (2 preceding siblings ...)
  2002-03-06  1:30                                     ` Alan Cox
@ 2002-03-06 10:49                                     ` David Woodhouse
  2002-03-06 14:26                                       ` Jeff Dike
  2002-03-06 16:50                                       ` Alan Cox
  3 siblings, 2 replies; 96+ messages in thread
From: David Woodhouse @ 2002-03-06 10:49 UTC (permalink / raw)
  To: Jeff Dike; +Cc: H. Peter Anvin, Benjamin LaHaise, Alan Cox, linux-kernel


jdike@karaya.com said:
>  And I don't see anything wrong with starting a bunch of UMLs with a
> total maximum memory exceeding the available tmpfs as long as they
> don't all need all that memory at once.  And, if they do, the patch I
> just posted will let them deal fairly sanely with the situation.

Going off at a slight tangent...

You say 'at once'. Does UML somehow give pages back to the host when they're 
freed, so the pages that are no longer used by UML can be discarded by the 
host instead of getting swapped?

--
dwmw2



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 10:49                                     ` David Woodhouse
@ 2002-03-06 14:26                                       ` Jeff Dike
  2002-03-06 16:50                                       ` Alan Cox
  1 sibling, 0 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-06 14:26 UTC (permalink / raw)
  To: David Woodhouse; +Cc: H. Peter Anvin, Benjamin LaHaise, Alan Cox, linux-kernel

dwmw2@infradead.org said:
> Does UML somehow give pages back to the host when
> they're  freed, so the pages that are no longer used by UML can be
> discarded by the  host instead of getting swapped?

No, but it could.  Given another hook (in free_pages this time) I could unmap
pages as they're freed, allowing them to be discarded on the host.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-05 18:30                                     ` Benjamin LaHaise
@ 2002-03-06 14:59                                       ` Daniel Phillips
  2002-03-06 15:24                                         ` Benjamin LaHaise
  2002-03-06 16:03                                         ` Jesse Pollard
  0 siblings, 2 replies; 96+ messages in thread
From: Daniel Phillips @ 2002-03-06 14:59 UTC (permalink / raw)
  To: Benjamin LaHaise, Jeff Dike; +Cc: H. Peter Anvin, Alan Cox, linux-kernel

On March 5, 2002 07:30 pm, Benjamin LaHaise wrote:
> On Tue, Mar 05, 2002 at 01:12:19PM -0500, Jeff Dike wrote:
> > Really?  And you're unconcerned about the impact on the rest of the system
> > of a UML grabbing (say) 128M of memory when it starts up?  Especially if it
> > may never use it?
> 
> Honestly, I think that most people want to know if the system they've set up 
> is overcommitted at as early a point as possible: a UML failing at startup 
> with out of memory is better than random segvs at some later point when the 
> system is under load.  Refer to the principle of least surprise.  And if the 
> user truly wants to disable that, well, you can give them a command line 
> option to shoot themselves in the foot with.

Suppose you have 512 MB memory and an equal amount of swap.  You start 8
umls with 64 MB each.  With your and Peter's suggestion, the system always
goes into swap.  Whereas if the memory is only allocated on demand it
probably doesn't.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 14:59                                       ` Daniel Phillips
@ 2002-03-06 15:24                                         ` Benjamin LaHaise
  2002-03-06 15:24                                           ` Daniel Phillips
  2002-03-06 16:03                                         ` Jesse Pollard
  1 sibling, 1 reply; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-06 15:24 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> umls with 64 MB each.  With your and Peter's suggestion, the system always
> goes into swap.  Whereas if the memory is only allocated on demand it
> probably doesn't.

As I said previously, going into swap is preferable over randomly killing 
new tasks under heavy load.

		-ben
-- 
"A man with a bass just walked in,
 and he's putting it down
 on the floor."

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 15:24                                         ` Benjamin LaHaise
@ 2002-03-06 15:24                                           ` Daniel Phillips
  2002-03-06 16:36                                             ` Benjamin LaHaise
  0 siblings, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-06 15:24 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On March 6, 2002 04:24 pm, Benjamin LaHaise wrote:
> On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> > Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> > umls with 64 MB each.  With your and Peter's suggestion, the system always
> > goes into swap.  Whereas if the memory is only allocated on demand it
> > probably doesn't.
> 
> As I said previously, going into swap is preferable over randomly killing 
> new tasks under heavy load.

Huh?  In the example I gave, you will never oom but with your suggestion, you
will always needlessly go into swap.  I'm surprised that you and Peter are
arguing in favor of wasting resources.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 14:59                                       ` Daniel Phillips
  2002-03-06 15:24                                         ` Benjamin LaHaise
@ 2002-03-06 16:03                                         ` Jesse Pollard
  2002-03-06 17:08                                           ` Jeff Dike
  1 sibling, 1 reply; 96+ messages in thread
From: Jesse Pollard @ 2002-03-06 16:03 UTC (permalink / raw)
  To: linux-kernel

Daniel Phillips <phillips@bonn-fries.net>:
> On March 5, 2002 07:30 pm, Benjamin LaHaise wrote:
> > On Tue, Mar 05, 2002 at 01:12:19PM -0500, Jeff Dike wrote:
> > > Really?  And you're unconcerned about the impact on the rest of the system
> > > of a UML grabbing (say) 128M of memory when it starts up?  Especially if it
> > > may never use it?
> > 
> > Honestly, I think that most people want to know if the system they've set up 
> > is overcommitted at as early a point as possible: a UML failing at startup 
> > with out of memory is better than random segvs at some later point when the 
> > system is under load.  Refer to the principle of least surprise.  And if the 
> > user truly wants to disable that, well, you can give them a command line 
> > option to shoot themselves in the foot with.
> 
> Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> umls with 64 MB each.  With your and Peter's suggestion, the system always
> goes into swap.  Whereas if the memory is only allocated on demand it
> probably doesn't.

Not unless the VM is really bad... All that is called for is that the
virtual space be available. Each UML gets 64 MB, but the rest is guaranteed
available via swap. Nothing has to swap until all processes have expanded
to use all available ram. Currently the only way to ensure that the memory
IS available is to modify every page at startup. Yes it will swap the modified
pages.

But it should only do so once, until the pages are really needed.

Otherwise the umls run until the system goes OOM - then somebody gets killed.
Much nicer to have it die at the beginning instead of after 4-5 hours of
operation when it needs just "one more page" only to find out that the system
lied when it said it was available.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 15:24                                           ` Daniel Phillips
@ 2002-03-06 16:36                                             ` Benjamin LaHaise
  2002-03-06 23:14                                               ` Daniel Phillips
  0 siblings, 1 reply; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-06 16:36 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On Wed, Mar 06, 2002 at 04:24:17PM +0100, Daniel Phillips wrote:
> On March 6, 2002 04:24 pm, Benjamin LaHaise wrote:
> > On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> > > Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> > > umls with 64 MB each.  With your and Peter's suggestion, the system always
> > > goes into swap.  Whereas if the memory is only allocated on demand it
> > > probably doesn't.
> > 
> > As I said previously, going into swap is preferable over randomly killing 
> > new tasks under heavy load.
> 
> Huh?  In the example I gave, you will never oom but with your suggestion, you
> will always needlessly go into swap.  I'm surprised that you and Peter are
> arguing in favor of wasting resources.

I'm arguing in favour of predictable behaviour.  Stability and reliability 
are more important than a bit of swap space.

		-ben
-- 
"A man with a bass just walked in,
 and he's putting it down
 on the floor."

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 10:49                                     ` David Woodhouse
  2002-03-06 14:26                                       ` Jeff Dike
@ 2002-03-06 16:50                                       ` Alan Cox
  2002-03-06 20:25                                         ` Jeff Dike
  2002-03-06 22:21                                         ` Pavel Machek
  1 sibling, 2 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-06 16:50 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Jeff Dike, H. Peter Anvin, Benjamin LaHaise, Alan Cox, linux-kernel

> You say 'at once'. Does UML somehow give pages back to the host when they're 
> freed, so the pages that are no longer used by UML can be discarded by the 
> host instead of getting swapped?

Doesn't seem to but it looks like madvise might be enough to make that
happen. That BTW is an issue for more than UML - it has a bearing on
running lots of Linux instances on any supervisor/virtualising system
like S/390

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 16:03                                         ` Jesse Pollard
@ 2002-03-06 17:08                                           ` Jeff Dike
  2002-03-06 17:33                                             ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-06 17:08 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: linux-kernel

pollard@tomcat.admin.navo.hpc.mil said:
> Currently the only way to ensure that the memory IS available is to
> modify every page at startup. Yes it will swap the modified pages.

Currently, yes.

But Alan says his address space accounting will prevent mmaps from
succeeding if populating them would OOM the system, which gives you what
you want and which sounds like the right thing.  The 8 64M UMLs will run
without needing to touch all their pages at bootup and without fear of being
killed later.  If the 9th UML would be in danger of random death, then it
will never get off the ground.
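
The idea of refusing a mapping up front can be illustrated with a toy model.
This is not the -ac accounting code; the struct, the function names and the
512 MB limit used below are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of strict commit accounting; every name here is hypothetical. */
struct commit_acct {
	size_t limit_mb;	/* total the host is willing to promise */
	size_t committed_mb;	/* sum of the reservations already granted */
};

/*
 * Grant a reservation only if it still fits under the limit.  Because
 * committed_mb never exceeds limit_mb, the subtraction cannot underflow.
 */
int commit_reserve(struct commit_acct *acct, size_t mb)
{
	if (mb > acct->limit_mb - acct->committed_mb)
		return -ENOMEM;
	acct->committed_mb += mb;
	return 0;
}

/* Give a reservation back, e.g. when a UML exits. */
void commit_release(struct commit_acct *acct, size_t mb)
{
	acct->committed_mb -= (mb < acct->committed_mb) ? mb : acct->committed_mb;
}
```

With an assumed limit of 512 MB, eight 64 MB reservations succeed and the
ninth gets -ENOMEM at startup, without any page having been touched.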

Note that this doesn't help when the UMLs are under a smaller limit than 
RAM + .5 * swap or whatever as happens when they are mmapping from tmpfs.
That's the situation that I'm concerned about.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 17:08                                           ` Jeff Dike
@ 2002-03-06 17:33                                             ` Alan Cox
  2002-03-07  0:28                                               ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-06 17:33 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Jesse Pollard, linux-kernel

> Note that this doesn't help when the UMLs are under a smaller limit than 
> RAM + .5 * swap or whatever as happens when they are mmapping from tmpfs.
> That's the situation that I'm concerned about.

Making tmpfs enforce the policy in those modes both checking the global
overcommit and also enforcing a "must be able to fill in the pages between
start and end of file" for the tmpfs file size itself is not hard from
inspection. If it's needed I can add that in the next update to the address
accounting.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 16:50                                       ` Alan Cox
@ 2002-03-06 20:25                                         ` Jeff Dike
  2002-03-06 20:54                                           ` Alan Cox
                                                             ` (2 more replies)
  2002-03-06 22:21                                         ` Pavel Machek
  1 sibling, 3 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-06 20:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Woodhouse, H. Peter Anvin, Benjamin LaHaise, linux-kernel

alan@lxorguk.ukuu.org.uk said:
> Doesn't seem to but it looks like madvise might be enough to make that
> happen.

Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has the
equivalent of MADV_DONTNEED) would need a hook in free_pages to make that
happen.
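
As a rough userspace illustration of what such a hook would ask the host to
do, here is a minimal sketch.  The names release_to_host and demo_release are
hypothetical, and the demo uses a private anonymous mapping, which is an
assumption made to keep it self-contained; UML's memory is a mapped tmpfs
file, where the file's data would still exist after the call.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a "pages were freed" hook: tell the host that
 * the range is no longer needed.  Returns 0 on success, -1 on error.
 */
int release_to_host(void *addr, size_t len)
{
	return madvise(addr, len, MADV_DONTNEED);
}

/*
 * Demo: fill one page, release it, then read it again.
 * Returns the first byte seen after the release, or -1 on failure.
 */
int demo_release(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	unsigned char *p;
	int first;

	assert(page > 0);
	len = (size_t)page;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len);		/* the page is now dirty */
	if (release_to_host(p, len) < 0) {
		munmap(p, len);
		return -1;
	}

	first = p[0];			/* faults in a fresh page */
	munmap(p, len);
	return first;
}
```

On Linux the private anonymous page reads back as zero after the call, so
the old contents really are gone.  For a shared file mapping the next touch
brings the file's data back instead, so this call alone would not return
tmpfs pages to the host.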

> That BTW is an issue for more than UML - it has a bearing on running
> lots of Linux instances on any supervisor/virtualising system like S/390

On a side note, the "unused memory is wasted memory" behavior that UML and 
Linux/s390 inherit is also less than optimal for the host.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 20:25                                         ` Jeff Dike
@ 2002-03-06 20:54                                           ` Alan Cox
  2002-03-06 21:27                                             ` Malcolm Beattie
  2002-03-06 21:27                                           ` David Woodhouse
  2002-03-07  0:04                                           ` Richard Gooch
  2 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-06 20:54 UTC (permalink / raw)
  To: Jeff Dike
  Cc: Alan Cox, David Woodhouse, H. Peter Anvin, Benjamin LaHaise,
	linux-kernel

> Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has the
> equivalent of MADV_DONTNEED) would need a hook in free_pages to make that
> happen.

VM allows you to give it back a page and if you use it again you get a
clean copy. What it seems to lack is the more ideal "here have this page
and if I reuse it trap if you did throw it out" semantic.

> > That BTW is an issue for more than UML - it has a bearing on running
> > lots of Linux instances on any supervisor/virtualising system like S/390
> 
> On a side note, the "unused memory is wasted memory" behavior that UML and 
> Linux/s390 inherit is also less than optimal for the host.

Yes. I believe IBM folks are studying that

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 20:54                                           ` Alan Cox
@ 2002-03-06 21:27                                             ` Malcolm Beattie
  2002-03-06 23:26                                               ` Jeff Dike
  0 siblings, 1 reply; 96+ messages in thread
From: Malcolm Beattie @ 2002-03-06 21:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Dike, David Woodhouse, H. Peter Anvin, Benjamin LaHaise,
	linux-kernel

Alan Cox writes:
> > Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has the
> > equivalent of MADV_DONTNEED) would need a hook in free_pages to make that
> > happen.
> 
> VM allows you to give it back a page and if you use it again you get a
> clean copy.

Yep, clean as in a page of zeroes when you touch it. (DIAGNOSE X'10' as
documented in the "CP Programming Services" manual, to be precise).

>             What it seems to lack is the more ideal "here have this page
> and if I reuse it trap if you did throw it out" semantic.

We're looking at ways of having fancier memory management information
pass between Linux and CP (it's safer to say CP (the "kernel" part of
VM/ESA and z/VM) than VM, given the ambiguous and confusing dual
meaning of "VM" otherwise :-).

> > > That BTW is an issue for more than UML - it has a bearing on running
> > > lots of Linux instances on any supervisor/virtualising system like S/390
> > 
> > On a side note, the "unused memory is wasted memory" behavior that UML and 
> > Linux/s390 inherit is also less than optimal for the host.
> 
> Yes. I believe IBM folks are studying that

Indeed. A "quick hack" that turns out to have rather useful, fun
properties is to have a little device driver (can be a module) which
stores "negative pages" in the page cache by allocating page cache
pages for the device's inode and then invoking the CP "release page"
call mentioned above. Linux thinks the page is "useful" and so keeps
it around until memory pressure kicks it out whereas the underlying
CP knows it's a hole making the resident size and working set of the
Linux image reduce. Add in a bit of feedback to get Linux re-reading
the "device" into cache proportionally to how much CP wants to kick
*out* resident pages from the image. Fun... However, closer
integration with the main mm system is the "proper" way to do it
(but depends on stuff like the latency, overheads and information
shared with CP so is a little more than an afternoon hack.)

--Malcolm

-- 
Malcolm Beattie <mbeattie@clueful.co.uk>
Linux Technical Consultant
IBM EMEA Enterprise Server Group...
...from home, speaking only for myself

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 20:25                                         ` Jeff Dike
  2002-03-06 20:54                                           ` Alan Cox
@ 2002-03-06 21:27                                           ` David Woodhouse
  2002-03-06 22:25                                             ` Joseph Malicki
  2002-03-07  0:28                                             ` Jeff Dike
  2002-03-07  0:04                                           ` Richard Gooch
  2 siblings, 2 replies; 96+ messages in thread
From: David Woodhouse @ 2002-03-06 21:27 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, H. Peter Anvin, Benjamin LaHaise, linux-kernel


jdike@karaya.com said:
>  Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has
> the equivalent of MADV_DONTNEED) would need a hook in free_pages to
> make that happen. 

       MADV_DONTNEED
              Do  not expect access in the near future.  (For the
              time being, the application is  finished  with  the
              given range, so the kernel can free resources asso­
              ciated with it.)

It's not clear from that that the host kernel is actually permitted to
discard the data.

alan@lxorguk.ukuu.org.uk said:
>  VM allows you to give it back a page and if you use it again you get
> a clean copy. What it seems to lack is the more ideal "here have this
> page and if I reuse it trap if you did throw it out" semantic. 

I've wittered on occasion about other situations where such semantics might
be useful -- essentially 'drop these pages if you need to as if they were
clean, and tell me when I next touch them so I can recreate their data'. 

UML might want that kind of thing for its (clean) page cache pages or 
something, but for pages allocated for kernel stack and task struct we 
really want the opposite -- we want to make sure they're present when we 
allocate them, and explicitly discard them when we're done.

--
dwmw2



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 16:50                                       ` Alan Cox
  2002-03-06 20:25                                         ` Jeff Dike
@ 2002-03-06 22:21                                         ` Pavel Machek
  2002-03-07 11:30                                           ` Dave Jones
  2002-03-07 18:21                                           ` H. Peter Anvin
  1 sibling, 2 replies; 96+ messages in thread
From: Pavel Machek @ 2002-03-06 22:21 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Woodhouse, Jeff Dike, H. Peter Anvin, Benjamin LaHaise,
	linux-kernel

Hi!

> > You say 'at once'. Does UML somehow give pages back to the host when they're 
> > freed, so the pages that are no longer used by UML can be discarded by the 
> > host instead of getting swapped?
> 
> Doesn't seem to but it looks like madvise might be enough to make that
> happen. That BTW is an issue for more than UML - it has a bearing on
> running lots of Linux instances on any supervisor/virtualising system
> like S/390

I just imagined hardware which supports freeing memory -- just do not
refresh it any more to conserve power ;-))).

Granted, it would probably only make sense in big chunks, like 2MB or
so... It might make sense for a PDA...
									Pavel

-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 21:27                                           ` David Woodhouse
@ 2002-03-06 22:25                                             ` Joseph Malicki
  2002-03-07  0:28                                             ` Jeff Dike
  1 sibling, 0 replies; 96+ messages in thread
From: Joseph Malicki @ 2002-03-06 22:25 UTC (permalink / raw)
  To: David Woodhouse, Jeff Dike
  Cc: Alan Cox, H. Peter Anvin, Benjamin LaHaise, linux-kernel

>
> jdike@karaya.com said:
> >  Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has
> > the equivalent of MADV_DONTNEED) would need a hook in free_pages to
> > make that happen.
>
>        MADV_DONTNEED
>               Do  not expect access in the near future.  (For the
>               time being, the application is  finished  with  the
>               given range, so the kernel can free resources asso­
>               ciated with it.)
>
> It's not clear from that that the host kernel is actually permitted to
> discard the data.

Solaris has MADV_FREE to say that the data can be discarded...

-joe


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 16:36                                             ` Benjamin LaHaise
@ 2002-03-06 23:14                                               ` Daniel Phillips
  2002-03-06 23:20                                                 ` Benjamin LaHaise
  0 siblings, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-06 23:14 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On March 6, 2002 05:36 pm, Benjamin LaHaise wrote:
> On Wed, Mar 06, 2002 at 04:24:17PM +0100, Daniel Phillips wrote:
> > On March 6, 2002 04:24 pm, Benjamin LaHaise wrote:
> > > On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> > > > Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> > > > umls with 64 MB each.  With your and Peter's suggestion, the system always
> > > > goes into swap.  Whereas if the memory is only allocated on demand it
> > > > probably doesn't.
> > > 
> > > As I said previously, going into swap is preferable over randomly killing 
> > > new tasks under heavy load.
> > 
> > Huh?  In the example I gave, you will never oom but with your suggestion, you
> > will always needlessly go into swap.  I'm surprised that you and Peter are
> > arguing in favor of wasting resources.
> 
> I'm arguing in favour of predictable behaviour.  Stability and reliability 
> are more important than a bit of swap space.

That's the same argument that says memory overcommit should not be allowed.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 23:14                                               ` Daniel Phillips
@ 2002-03-06 23:20                                                 ` Benjamin LaHaise
  2002-03-06 23:26                                                   ` Daniel Phillips
  2002-03-07  1:27                                                   ` Jeff Dike
  0 siblings, 2 replies; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-06 23:20 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On Thu, Mar 07, 2002 at 12:14:15AM +0100, Daniel Phillips wrote:
> On March 6, 2002 05:36 pm, Benjamin LaHaise wrote:
> > On Wed, Mar 06, 2002 at 04:24:17PM +0100, Daniel Phillips wrote:
> > > On March 6, 2002 04:24 pm, Benjamin LaHaise wrote:
> > > > On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> > > > > Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> > > > > umls with 64 MB each.  With your and Peter's suggestion, the system always
> > > > > goes into swap.  Whereas if the memory is only allocated on demand it
> > > > > probably doesn't.
> > > > 
> > > > As I said previously, going into swap is preferable over randomly killing 
> > > > new tasks under heavy load.
> > > 
> > > Huh?  In the example I gave, you will never oom but with your suggestion, you
> > > will always needlessly go into swap.  I'm surprised that you and Peter are
> > > arguing in favor of wasting resources.
> > 
> > I'm arguing in favour of predictable behaviour.  Stability and reliability 
> > are more important than a bit of swap space.
> 
> That's the same argument that says memory overcommit should not be allowed.

Go back in the thread: I suggested making it an option that the user has to 
turn on to allow his foot to be shot.  Remember: the common case in the kernel 
is to be using all memory.

		-ben
-- 
"A man with a bass just walked in,
 and he's putting it down
 on the floor."

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 21:27                                             ` Malcolm Beattie
@ 2002-03-06 23:26                                               ` Jeff Dike
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-06 23:26 UTC (permalink / raw)
  To: Malcolm Beattie
  Cc: Alan Cox, David Woodhouse, H. Peter Anvin, Benjamin LaHaise,
	linux-kernel

mbeattie@clueful.co.uk said:
> A "quick hack" that turns out to have rather useful, fun properties is
> to have a little device driver (can be a module) which stores
> "negative pages" in the page cache by allocating page cache pages for
> the device's inode and then invoking the CP "release page" call
> mentioned above. 

Yeah, I was thinking about something like that.  It's unclear how it should
figure out how much memory to grab, though.  You'd have to get some idea
how desperate the host is for memory and balance that off against how 
desperate the VM is.

And you want to avoid doing things that just aggravate the host's situation,
i.e. if it is swapping its brains out, you want the VM to just drop some
clean pages and you definitely don't want it swapping dirty ones and add
to the host's IO load.

>  However, closer
> integration with the main mm system is the "proper" way to do it (but
> depends on stuff like the latency, overheads and information shared
> with CP so is a little more than an afternoon hack.)

Yup.

Is any of your (you or IBM in general) thinking on this written down publicly
anywhere?

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 23:20                                                 ` Benjamin LaHaise
@ 2002-03-06 23:26                                                   ` Daniel Phillips
  2002-03-06 23:33                                                     ` H. Peter Anvin
  2002-03-07  1:27                                                   ` Jeff Dike
  1 sibling, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-06 23:26 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Jeff Dike, H. Peter Anvin, Alan Cox, linux-kernel

On March 7, 2002 12:20 am, Benjamin LaHaise wrote:
> On Thu, Mar 07, 2002 at 12:14:15AM +0100, Daniel Phillips wrote:
> > On March 6, 2002 05:36 pm, Benjamin LaHaise wrote:
> > > On Wed, Mar 06, 2002 at 04:24:17PM +0100, Daniel Phillips wrote:
> > > > On March 6, 2002 04:24 pm, Benjamin LaHaise wrote:
> > > > > On Wed, Mar 06, 2002 at 03:59:22PM +0100, Daniel Phillips wrote:
> > > > > > Suppose you have 512 MB memory and an equal amount of swap.  You start 8
> > > > > > umls with 64 MB each.  With your and Peter's suggestion, the system always
> > > > > > goes into swap.  Whereas if the memory is only allocated on demand it
> > > > > > probably doesn't.
> > > > > 
> > > > > As I said previously, going into swap is preferable over randomly killing 
> > > > > new tasks under heavy load.
> > > > 
> > > > Huh?  In the example I gave, you will never oom but with your suggestion, you
> > > > will always needlessly go into swap.  I'm surprised that you and Peter are
> > > > arguing in favor of wasting resources.
> > > 
> > > I'm arguing in favour of predictable behaviour.  Stability and reliability 
> > > are more important than a bit of swap space.
> > 
> > That's the same argument that says memory overcommit should not be allowed.
> 
> Go back in the thread: I suggested making it an option that the user has to 
> turn on to allow his foot to be shot.  Remember: the common case in the kernel 
> is to be using all memory.

OK, now suppose the user has turned on that option (I think it should be on by
default, like memory overcommit).  How is Jeff going to support it?  That's his
whole point as I understand it.

Instead of providing constructive suggestions on how to solve the problem so that
memory overcommit works properly in this case, I see people telling Jeff there is
no problem.  I think Jeff has a little more of a clue than that.
 
-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 23:26                                                   ` Daniel Phillips
@ 2002-03-06 23:33                                                     ` H. Peter Anvin
  2002-03-07  0:08                                                       ` Daniel Phillips
  0 siblings, 1 reply; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-06 23:33 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Benjamin LaHaise, Jeff Dike, Alan Cox, linux-kernel

Daniel Phillips wrote:

> 
> Instead of providing constructive suggestions on how to solve the problem so that
> memory overcommit works properly in this case, I see people telling Jeff there is
> no problem.  I think Jeff has a little more of a clue than that.
>  


Jeff has clue, but you, Daniel, quite frankly could take a cue.  You seem
to be jumping into arguments just for the sake of them, but without ever
contributing anything useful.

Please do us all a favour and shut up for once.

	-hpa



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 20:25                                         ` Jeff Dike
  2002-03-06 20:54                                           ` Alan Cox
  2002-03-06 21:27                                           ` David Woodhouse
@ 2002-03-07  0:04                                           ` Richard Gooch
  2 siblings, 0 replies; 96+ messages in thread
From: Richard Gooch @ 2002-03-07  0:04 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Jeff Dike, Alan Cox, H. Peter Anvin, Benjamin LaHaise, linux-kernel

David Woodhouse writes:
> 
> jdike@karaya.com said:
> >  Yeah, MADV_DONTNEED looks right.  UML and Linux/s390 (assuming VM has
> > the equivalent of MADV_DONTNEED) would need a hook in free_pages to
> > make that happen. 
> 
>        MADV_DONTNEED
>               Do  not expect access in the near future.  (For the
>               time being, the application is  finished  with  the
>               given range, so the kernel can free resources asso­
>               ciated with it.)
> 
> It's not clear from that that the host kernel is actually permitted to
> discard the data.
> 
> alan@lxorguk.ukuu.org.uk said:
> >  VM allows you to give it back a page and if you use it again you get
> > a clean copy. What it seems to lack is the more ideal "here have this
> > page and if I reuse it trap if you did throw it out" semantic. 
> 
> I've wittered on occasion about other situations where such
> semantics might be useful -- essentially 'drop these pages if you
> need to as if they were clean, and tell me when I next touch them so
> I can recreate their data'.

Indeed. I'd love such a feature. It's got applications in
numerical/scientific code, not just UML.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 23:33                                                     ` H. Peter Anvin
@ 2002-03-07  0:08                                                       ` Daniel Phillips
  0 siblings, 0 replies; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07  0:08 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On March 7, 2002 12:33 am, H. Peter Anvin wrote:
> Daniel Phillips wrote:
> 
> > Instead of providing constructive suggestions on how to solve the problem so that
> > memory overcommit works properly in this case, I see people telling Jeff there is
> > no problem.  I think Jeff has a little more of a clue than that.
> 
> Jeff has clue, but you, Daniel, quite frankly could take a cue.  You seem
> to be jumping into arguments just for the sake of them, but without ever
> contributing anything useful.

The useful contribution is to stop you and Ben from beating up on Jeff.  Thank you,
I think I've accomplished that purpose.  Feel free to attack me for that if you feel
the need.

(objectionable comment removed)

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 17:33                                             ` Alan Cox
@ 2002-03-07  0:28                                               ` Jeff Dike
  2002-03-07  0:44                                                 ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-07  0:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jesse Pollard, linux-kernel

alan@lxorguk.ukuu.org.uk said:
> and also enforcing a "must be able to fill in the pages between start
> and end of file" for the tmpfs file size itself is not hard from
> inspection.

So if I mapped a single page from file offset 65M on a 64M tmpfs, that would 
fail?

I'd prefer maps to fail when they make the total maps exceed the tmpfs limit.

Then I can map in smaller chunks, PAGE_SIZE if necessary.  That has the 
disadvantage that the vmas in the host would be even uglier than they are
now because we don't have vma merging any more.
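
To make the difference between the two rules concrete, here is a small model
of both.  Everything in it (struct tmpfs_model and the two functions) is
hypothetical and only mirrors the wording in this thread; it is not how tmpfs
or the address space accounting is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MB (1024UL * 1024UL)

/* Hypothetical model of one tmpfs instance with a size limit. */
struct tmpfs_model {
	size_t limit;	/* configured tmpfs size, in bytes */
	size_t mapped;	/* bytes charged by successful maps so far */
};

/*
 * Rule A ("must be able to fill in the pages between start and end of
 * file"): the file extent implied by the map has to fit under the limit.
 */
bool map_ok_by_extent(const struct tmpfs_model *fs, size_t offset, size_t len)
{
	return offset <= fs->limit && len <= fs->limit - offset;
}

/*
 * Rule B ("fail when the total maps exceed the tmpfs limit"): only the
 * bytes actually mapped are charged.  This toy version does not notice
 * overlapping maps and would charge them twice.
 */
bool map_charge_total(struct tmpfs_model *fs, size_t len)
{
	assert(fs->mapped <= fs->limit);
	if (len > fs->limit - fs->mapped)
		return false;
	fs->mapped += len;
	return true;
}
```

With a 64M limit, map_ok_by_extent rejects a single page at offset 65M,
while map_charge_total accepts it and refuses only once the sum of all maps
would pass 64M.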

UML would still need that page_alloc hook, except it would map the allocated
pages instead of touching them.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 21:27                                           ` David Woodhouse
  2002-03-06 22:25                                             ` Joseph Malicki
@ 2002-03-07  0:28                                             ` Jeff Dike
  2002-03-07  0:44                                               ` Alan Cox
  1 sibling, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-07  0:28 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Alan Cox, H. Peter Anvin, Benjamin LaHaise, linux-kernel

dwmw2@infradead.org said:
>        MADV_DONTNEED
>               Do  not expect access in the near future.  (For the
>               time being, the application is  finished  with  the
>               given range, so the kernel can free resources asso­
>               ciated with it.)
> It's not clear from that that the host kernel is actually permitted to
> discard the data. 

Hmmm, you have better man pages than me.  I don't have an madvise man page
on either of my boxes (RH 6.2 and 7.2 :-)

From that description, you're right.  The code is very clear on what happens,
as is the comment above sys_madvise:

 *  MADV_DONTNEED - the application is finished with the given range,
 *		so the kernel can free resources associated with it.

> UML might want that kind of thing for its (clean) page cache pages or
> something, but for pages allocated for kernel stack and task struct we
>  really want the opposite -- we want to make sure they're present when
> we  allocate them, and explicitly discard them when we're done. 

Yeah, that's a decent idea.  If you were going to make it fancier, you could
cover the case that the UML's clean pages are all busy but it has lots of
old dirty pages lying around.  But then you'd need some way for the host to
tell the UML that I/O would be a really bad idea and it should just dump
clean pages.

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07  0:28                                               ` Jeff Dike
@ 2002-03-07  0:44                                                 ` Alan Cox
  0 siblings, 0 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-07  0:44 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Alan Cox, Jesse Pollard, linux-kernel

> I'd prefer maps to fail when they make the total maps exceed the tmpfs limit.

That makes more sense and can be done yes. Probably it wants to be a tmpfs
option

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07  0:28                                             ` Jeff Dike
@ 2002-03-07  0:44                                               ` Alan Cox
  0 siblings, 0 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-07  0:44 UTC (permalink / raw)
  To: Jeff Dike
  Cc: David Woodhouse, Alan Cox, H. Peter Anvin, Benjamin LaHaise,
	linux-kernel

> Hmmm, you have better man pages than me.  I don't have an madvise man page
> on either of my boxes (RH 6.2 and 7.2 :-)

Curious. I have one on my 7.2 box 8)


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 23:20                                                 ` Benjamin LaHaise
  2002-03-06 23:26                                                   ` Daniel Phillips
@ 2002-03-07  1:27                                                   ` Jeff Dike
  2002-03-07  1:52                                                     ` Benjamin LaHaise
  2002-03-07 13:49                                                     ` Alan Cox
  1 sibling, 2 replies; 96+ messages in thread
From: Jeff Dike @ 2002-03-07  1:27 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Daniel Phillips, H. Peter Anvin, Alan Cox, linux-kernel

bcrl@redhat.com said:
> Go back in the thread: I suggested making it an option that the user
> has to  turn on to allow his foot to be shot.

OK, this seems to be the relevant quote (and you seem to be referring to the
kernel build segfaults - correct me if I'm wrong):

bcrl@redhat.com said:
> a UML failing at startup  with out of memory is better than random
> segvs at some later point when the  system is under load.

I showed the kernel build segfaulting as an improvement over UML hanging, 
which is the alternative behavior.

The segfaults were caused by me implementing the simplest possible response
to alloc_pages returning unbacked pages, which is to return NULL to the 
caller.  This is actually wrong because in this failure case, it effectively
changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
compilations.

A slightly fancier recovery would loop calling alloc_pages until it got a set
of already-backed pages (with some possible sleeping in alloc_pages in there).
That would preserve the blocking semantics of GFP_USER, GFP_KERNEL, et al,
and would have allowed the UML userspace (the kernel build) to continue working
as it should.

So, a slightly improved version of the patch (which I can write up if you're
interested in seeing it) would have allowed UML and its userspace to continue
running fine (albeit in less memory than it expected) in the presence of an
overcommitted tmpfs.
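
The retry idea described above could be modelled in plain C along these
lines.  This is a sketch and not the patch itself: struct page_ops and the
fake_* helpers are hypothetical stand-ins for alloc_pages and the "is this
page backed" test, and setting unbacked pages aside rather than freeing them
is an assumption made so that the loop does not get the same page back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the allocator and the touch test. */
struct page_ops {
	void *(*alloc)(void *ctx);		/* may block; NULL when it gives up */
	int (*is_backed)(void *ctx, void *pg);	/* nonzero if the host backed it */
	void (*set_aside)(void *ctx, void *pg);	/* withhold an unbacked page */
};

/* Keep allocating until a backed page turns up or the allocator fails. */
void *alloc_backed(const struct page_ops *ops, void *ctx)
{
	void *pg;

	assert(ops && ops->alloc && ops->is_backed && ops->set_aside);
	while ((pg = ops->alloc(ctx)) != NULL) {
		if (ops->is_backed(ctx, pg))
			return pg;
		ops->set_aside(ctx, pg);
	}
	return NULL;	/* the allocator itself returned NULL */
}

/* Toy pool used only to exercise the loop: page i is backed iff backed[i]. */
struct fake_pool {
	int backed[4];
	char pages[4];
	int next;	/* index of the next page to hand out */
	int set_aside;	/* number of unbacked pages withheld */
};

static void *fake_alloc(void *ctx)
{
	struct fake_pool *p = ctx;

	return p->next < 4 ? &p->pages[p->next++] : NULL;
}

static int fake_is_backed(void *ctx, void *pg)
{
	struct fake_pool *p = ctx;

	return p->backed[(char *)pg - p->pages];
}

static void fake_set_aside(void *ctx, void *pg)
{
	struct fake_pool *p = ctx;

	(void)pg;
	p->set_aside++;
}

const struct page_ops fake_ops = { fake_alloc, fake_is_backed, fake_set_aside };
```

The caller only sees NULL when the allocator itself fails, so an unbacked
page no longer turns a blocking allocation into an immediate failure.  The
price is that withheld pages shrink the memory the guest can actually use.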

				Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07  1:27                                                   ` Jeff Dike
@ 2002-03-07  1:52                                                     ` Benjamin LaHaise
  2002-03-08 19:17                                                       ` Jeff Dike
  2002-03-07 13:49                                                     ` Alan Cox
  1 sibling, 1 reply; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-07  1:52 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Daniel Phillips, H. Peter Anvin, Alan Cox, linux-kernel

On Wed, Mar 06, 2002 at 08:27:51PM -0500, Jeff Dike wrote:
> I showed the kernel build segfaulting as an improvement over UML hanging, 
> which is the alternative behavior.

Versus fully allocating the backing store, which would neither hang nor 
cause segfaults.  This is the behaviour that one expects by default, and 
should be the first line of defense before going to the overcommit model.  
Get that aspect of reliability in place, then add the overcommit support.  
What is better: having uml fail before attempting to boot with an "unable 
to allocate backing store" message, or a random oops during early kernel 
init?  As I see it, supporting the safe mode of operation first makes more 
sense before adding yet another arch hook.

		-ben
-- 
"A man with a bass just walked in,
 and he's putting it down
 on the floor."

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 22:21                                         ` Pavel Machek
@ 2002-03-07 11:30                                           ` Dave Jones
  2002-03-07 18:21                                           ` H. Peter Anvin
  1 sibling, 0 replies; 96+ messages in thread
From: Dave Jones @ 2002-03-07 11:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, David Woodhouse, Jeff Dike, H. Peter Anvin,
	Benjamin LaHaise, linux-kernel

On Wed, Mar 06, 2002 at 11:21:50PM +0100, Pavel Machek wrote:
 > I just imagined hardware which supports freeing memory -- just do not
 > refresh it any more to conserve power ;-))).
 > Granted, it would probably only make sense in big chunks, like 2MB or
 > so... It might make sense for a PDA...

 ISTR reading about one handheld that did something like this (possibly psion)
 The hardware has the ability to migrate data from one memory bank to
 another and power down the least used one.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 13:49                                                     ` Alan Cox
@ 2002-03-07 13:36                                                       ` Daniel Phillips
  2002-03-07 14:04                                                         ` yodaiken
  0 siblings, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 13:36 UTC (permalink / raw)
  To: Alan Cox, Jeff Dike
  Cc: Benjamin LaHaise, H. Peter Anvin, Alan Cox, linux-kernel

On March 7, 2002 02:49 pm, Alan Cox wrote:
> Jeff Dike Apparently wrote
> > caller.  This is actually wrong because in this failure case, it effectively
> > changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> > allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> > compilations.
> 
> GFP_KERNEL will sometimes return NULL.

Sad but true.  IMHO we are on track to fix that in this kernel cycle, with
better locked/dirty accounting and rmap to forcibly unmap pages when necessary.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07  1:27                                                   ` Jeff Dike
  2002-03-07  1:52                                                     ` Benjamin LaHaise
@ 2002-03-07 13:49                                                     ` Alan Cox
  2002-03-07 13:36                                                       ` Daniel Phillips
  1 sibling, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-07 13:49 UTC (permalink / raw)
  To: Jeff Dike
  Cc: Benjamin LaHaise, Daniel Phillips, H. Peter Anvin, Alan Cox,
	linux-kernel

> caller.  This is actually wrong because in this failure case, it effectively
> changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> compilations.

GFP_KERNEL will sometimes return NULL.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 13:36                                                       ` Daniel Phillips
@ 2002-03-07 14:04                                                         ` yodaiken
  2002-03-07 14:21                                                           ` Daniel Phillips
  0 siblings, 1 reply; 96+ messages in thread
From: yodaiken @ 2002-03-07 14:04 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin, linux-kernel

On Thu, Mar 07, 2002 at 02:36:08PM +0100, Daniel Phillips wrote:
> On March 7, 2002 02:49 pm, Alan Cox wrote:
> > Jeff Dike Apparently wrote
> > > caller.  This is actually wrong because in this failure case, it effectively
> > > changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> > > allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> > > compilations.
> > 
> > GFP_KERNEL will sometimes return NULL.
> 
> Sad but true.  IMHO we are on track to fix that in this kernel cycle, with
> better locked/dirty accounting and rmap to forcibly unmap pages when necessary.

Why is that a fix? And how can it work?


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:04                                                         ` yodaiken
@ 2002-03-07 14:21                                                           ` Daniel Phillips
  2002-03-07 14:38                                                             ` yodaiken
  2002-03-07 14:43                                                             ` Alan Cox
  0 siblings, 2 replies; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 14:21 UTC (permalink / raw)
  To: yodaiken
  Cc: Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin, linux-kernel

On March 7, 2002 03:04 pm, yodaiken@fsmlabs.com wrote:
> On Thu, Mar 07, 2002 at 02:36:08PM +0100, Daniel Phillips wrote:
> > On March 7, 2002 02:49 pm, Alan Cox wrote:
> > > Jeff Dike Apparently wrote
> > > > caller.  This is actually wrong because in this failure case, it effectively
> > > > changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> > > > allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> > > > compilations.
> > > 
> > > GFP_KERNEL will sometimes return NULL.
> > 
> > Sad but true.  IMHO we are on track to fix that in this kernel cycle, with
> > better locked/dirty accounting and rmap to forcibly unmap pages when necessary.
> 
> Why is that a fix? And how can it work?

Since there is always at least one freeable page in the system (or we're oom) then
we just have to find it and we know we can forcibly unmap it.  We do need to know
the total of pinned pages, I should have said locked/dirty/pinned.

Since GFP_KERNEL includes __GFP_WAIT, we are even allowed to wait for dirty page
writeout.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:21                                                           ` Daniel Phillips
@ 2002-03-07 14:38                                                             ` yodaiken
  2002-03-07 15:31                                                               ` Daniel Phillips
  2002-03-07 14:43                                                             ` Alan Cox
  1 sibling, 1 reply; 96+ messages in thread
From: yodaiken @ 2002-03-07 14:38 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On Thu, Mar 07, 2002 at 03:21:24PM +0100, Daniel Phillips wrote:
> On March 7, 2002 03:04 pm, yodaiken@fsmlabs.com wrote:
> > On Thu, Mar 07, 2002 at 02:36:08PM +0100, Daniel Phillips wrote:
> > > On March 7, 2002 02:49 pm, Alan Cox wrote:
> > > > Jeff Dike Apparently wrote
> > > > > caller.  This is actually wrong because in this failure case, it effectively
> > > > > changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> > > > > allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> > > > > compilations.
> > > > 
> > > > GFP_KERNEL will sometimes return NULL.
> > > 
> > > Sad but true.  IMHO we are on track to fix that in this kernel cycle, with
> > > better locked/dirty accounting and rmap to forcibly unmap pages when necessary.
> > 
> > Why is that a fix? And how can it work?
> 
> Since there is always at least one freeable page in the system (or we're oom) then
> we just have to find it and we know we can forcibly unmap it.  We do need to know
> the total of pinned pages, I should have said locked/dirty/pinned.


What if we are oom?
What if we are on our way to deadlock?
What if the caller of kmalloc will make less good use of the page
than the current owner of the page?

page_t *x,*p;
for(i = 0; i < SOME_MADE_UP_NUMBER_THAT_SEEMS_GOOD;i++)
	if( p = kmalloc(..)){
		copyfromuser(x++,p);
		dispatch_to_output(p);
	}
	else {//do the rest later
		...
	}




	
> 
> Since GFP_KERNEL includes __GFP_WAIT, we are even allowed to wait for dirty page
> writeout.
> 
> -- 
> Daniel

-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:21                                                           ` Daniel Phillips
  2002-03-07 14:38                                                             ` yodaiken
@ 2002-03-07 14:43                                                             ` Alan Cox
  2002-03-07 15:32                                                               ` Daniel Phillips
  2002-03-07 15:34                                                               ` Daniel Phillips
  1 sibling, 2 replies; 96+ messages in thread
From: Alan Cox @ 2002-03-07 14:43 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

> Since there is always at least one freeable page in the system (or we're oom) then
> we just have to find it and we know we can forcibly unmap it.  We do need to know
> the total of pinned pages, I should have said locked/dirty/pinned.

What if I did a 4 page allocation ?

And if we are OOM - we want to return NULL

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:38                                                             ` yodaiken
@ 2002-03-07 15:31                                                               ` Daniel Phillips
  2002-03-07 16:50                                                                 ` yodaiken
  0 siblings, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 15:31 UTC (permalink / raw)
  To: yodaiken
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On March 7, 2002 03:38 pm, yodaiken@fsmlabs.com wrote:
> On Thu, Mar 07, 2002 at 03:21:24PM +0100, Daniel Phillips wrote:
> > On March 7, 2002 03:04 pm, yodaiken@fsmlabs.com wrote:
> > > On Thu, Mar 07, 2002 at 02:36:08PM +0100, Daniel Phillips wrote:
> > > > On March 7, 2002 02:49 pm, Alan Cox wrote:
> > > > > Jeff Dike Apparently wrote
> > > > > > caller.  This is actually wrong because in this failure case, it effectively
> > > > > > changes the semantics of GFP_USER, GFP_KERNEL, and the other blocking GFP_* 
> > > > > > allocations to GFP_ATOMIC.  And that's what forced UML to segfault the 
> > > > > > compilations.
> > > > > 
> > > > > GFP_KERNEL will sometimes return NULL.
> > > > 
> > > > Sad but true.  IMHO we are on track to fix that in this kernel cycle, with
> > > > better locked/dirty accounting and rmap to forcibly unmap pages when necessary.
> > > 
> > > Why is that a fix? And how can it work?
> > 
> > Since there is always at least one freeable page in the system (or we're oom) then
> > we just have to find it and we know we can forcibly unmap it.  We do need to know
> > the total of pinned pages, I should have said locked/dirty/pinned.
> 
> 
> What if we are oom?

This problem didn't get any worse, we still have to deal with it.  We can wait, so
we deal with it in the standard way (i.e., we puke, have to do something about that.)

> What if we are on our way to deadlock?

huh??

> What if the caller of kmalloc will make less good use of the page
> than the current owner of the page?

That's life, that's what lrus are for.

> page_t *x,*p;
> for(i = 0; i < SOME_MADE_UP_NUMBER_THAT_SEEMS_GOOD;i++)
> 	if( p = kmalloc(..)){
> 		copyfromuser(x++,p);
>         	dispatch_to_output(p);
> 	    }
> 	else {//do the rest later
>             ...
>           }

Please put your thinking cap on and come up with a less borked interface
for doing that ;-)

You won't find one if you don't look for it.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:43                                                             ` Alan Cox
@ 2002-03-07 15:32                                                               ` Daniel Phillips
  2002-03-07 16:19                                                                 ` Alan Cox
  2002-03-07 15:34                                                               ` Daniel Phillips
  1 sibling, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 15:32 UTC (permalink / raw)
  To: Alan Cox
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On March 7, 2002 03:43 pm, Alan Cox wrote:
> > Since there is always at least one freeable page in the system (or we're oom) then
> > we just have to find it and we know we can forcibly unmap it.  We do need to know
> > the total of pinned pages, I should have said locked/dirty/pinned.
> 
> What if I did a 4 page allocation ?

Higher order allocation - imho we can fix that too, eventually, however it's a lot
more work.  First we have to have reliable physical defragmentation.

> And if we are OOM - we want to return NULL

What good does that do?

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 14:43                                                             ` Alan Cox
  2002-03-07 15:32                                                               ` Daniel Phillips
@ 2002-03-07 15:34                                                               ` Daniel Phillips
  2002-03-07 19:18                                                                 ` Andrew Morton
  1 sibling, 1 reply; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 15:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On March 7, 2002 03:43 pm, Alan Cox wrote:
> And if we are OOM - we want to return NULL

Oh, right, it lets an allocator that didn't 100% need the page use a
fallback strategy, but for that we probably want a different interface
anyway, such as a GFP flag that says 'fail if this looks hard to get'.
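
A rough userspace sketch of what such a flag could mean, under stated
assumptions: TOY_MAY_FAIL, struct toy_pool and toy_alloc_page() are all
made-up names, and the "pool" is just a pair of counters standing in for
easily available pages and pages that would need reclaim work first.  It
is not a real kernel interface.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAY_FAIL 0x1u	/* hypothetical: "fail if this looks hard to get" */

struct toy_pool {
	size_t free_pages;		/* pages available immediately */
	size_t reclaimable_pages;	/* pages needing eviction or writeout first */
	size_t reclaim_count;		/* how many times the slow path ran */
};

/* Returns 1 if a page was obtained, 0 if the allocation failed. */
int toy_alloc_page(struct toy_pool *pool, unsigned int flags)
{
	assert(pool != NULL);

	if (pool->free_pages > 0) {		/* fast path: nothing to reclaim */
		pool->free_pages--;
		return 1;
	}
	if (flags & TOY_MAY_FAIL)		/* caller has a fallback: give up early */
		return 0;
	if (pool->reclaimable_pages > 0) {	/* slow path: do the hard work */
		pool->reclaimable_pages--;
		pool->reclaim_count++;
		return 1;
	}
	return 0;				/* truly out of memory */
}
```

A caller with a fallback passes TOY_MAY_FAIL and never triggers reclaim; a
caller without one passes 0 and only sees failure when nothing is left.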

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 15:32                                                               ` Daniel Phillips
@ 2002-03-07 16:19                                                                 ` Alan Cox
  2002-03-07 17:54                                                                   ` Daniel Phillips
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-07 16:19 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

> Higher order allocation - imho we can fix that too, eventually, however it's a lot
> more work.  First we have to have reliable physical defragmentation.
> 
> > And if we are OOM - we want to return NULL
> 
> What good does that do?

It allows us to continue. It avoids the deadlocks. It lets the caller
make an intelligent decision.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 15:31                                                               ` Daniel Phillips
@ 2002-03-07 16:50                                                                 ` yodaiken
  2002-03-07 18:07                                                                   ` Daniel Phillips
  0 siblings, 1 reply; 96+ messages in thread
From: yodaiken @ 2002-03-07 16:50 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On Thu, Mar 07, 2002 at 04:31:10PM +0100, Daniel Phillips wrote:
> > > > Why is that a fix? And how can it work?
> > > 
> > > Since there is always at least one freeable page in the system (or we're oom) then
> > > we just have to find it and we know we can forcibly unmap it.  We do need to know
> > > the total of pinned pages, I should have said locked/dirty/pinned.
> > 
> > 
> > What if we are oom?
> 
> This problem didn't get any worse, we still have to deal with it.  We can wait, so
> we deal with it in the standard way (i.e., we puke, have to do something about that.)

So it can return NULL? 

> 
> > What if we are on our way to deadlock?
> 
> huh??

Process A needs 4 pages, Process B needs 4 pages, each grabs 3.
One easy, traditional unix algorithm for dealing with this is
	for(i=0; i < 4; i++)if (!(p[i]=kmalloc(...)))
                                free all that we have so far

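
The all-or-nothing pattern above could be spelled out roughly as follows.
This is a userspace illustration only: malloc() stands in for kmalloc(),
and alloc_all_or_nothing() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Allocate `count` buffers of `size` bytes each into `bufs`.
 * On any failure, free everything obtained so far and return -1, so the
 * caller holds nothing while it waits to retry.  Returns 0 on success.
 */
int alloc_all_or_nothing(void **bufs, size_t count, size_t size)
{
	size_t i;

	assert(bufs != NULL);

	for (i = 0; i < count; i++) {
		bufs[i] = malloc(size);
		if (bufs[i] == NULL) {
			/* Roll back: release the buffers grabbed before this one. */
			while (i > 0) {
				free(bufs[--i]);
				bufs[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

The rollback only works because the failed allocation returns NULL to the
caller; if the allocator blocked instead, two such callers could each sit
on three pages forever.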

> > What if the caller of kmalloc will make less good use of the page
> > than the current owner of the page?
> 
> That's life, that's what lrus are for.

Really? I thought LRUs were to approximate working sets. Obviously
if a program is kmallocing its working set is changing but that
does not tell us anything about whether it is a correct decision to
rip a page from the working set of another process.

> 
> > page_t *x,*p;
> > for(i = 0; i < SOME_MADE_UP_NUMBER_THAT_SEEMS_GOOD;i++)
> > 	if( p = kmalloc(..)){
> > 		copyfromuser(x++,p);
> >         	dispatch_to_output(p);
> > 	    }
> > 	else {//do the rest later
> >             ...
> >           }
> 
> Please put your thinking cap on and come up with a less borked interface
> for doing that ;-)
> 
> You won't find one if you don't look for it.

I'm too dumb to come up with a solution here, but you are the one
changing the interface, so surely you have a couple of "less borked"
solutions in mind - right?







-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 16:19                                                                 ` Alan Cox
@ 2002-03-07 17:54                                                                   ` Daniel Phillips
  0 siblings, 0 replies; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 17:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On March 7, 2002 05:19 pm, Alan Cox wrote:
> > Higher order allocation - imho we can fix that too, eventually, however it's a lot
> > more work.  First we have to have reliable physical defragmentation.
> > 
> > > And if we are OOM - we want to return NULL
> > 
> > What good does that do?
> 
> It allows us to continue. It avoids the deadlocks.

Could you describe the deadlock, please?

> It lets the caller make an intelligent decision.

I maintain it's the wrong interface, we're mixing two concepts together there:

  - VM can't find blocks that are freeable, so fails and dumps the problem
    on the caller, which has to busy wait.  This sucks.

  - The VM is under heavy load and the caller doesn't really need the memory
    that badly because it has a fallback, the VM somehow knows this, so fails
    the allocation and everybody is happy.

These should be separated, and we should fix the former.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 16:50                                                                 ` yodaiken
@ 2002-03-07 18:07                                                                   ` Daniel Phillips
  2002-03-07 18:15                                                                     ` yodaiken
  2002-03-07 19:22                                                                     ` Alan Cox
  0 siblings, 2 replies; 96+ messages in thread
From: Daniel Phillips @ 2002-03-07 18:07 UTC (permalink / raw)
  To: yodaiken
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On March 7, 2002 05:50 pm, yodaiken@fsmlabs.com wrote:
> On Thu, Mar 07, 2002 at 04:31:10PM +0100, Daniel Phillips wrote:
> > > > > Why is that a fix? And how can it work?
> > > > 
> > > > Since there is always at least one freeable page in the system (or we're oom) then
> > > > we just have to find it and we know we can forcibly unmap it.  We do need to know
> > > > the total of pinned pages, I should have said locked/dirty/pinned.
> > > 
> > > What if we are oom?
> > 
> > This problem didn't get any worse, we still have to deal with it.  We can wait, so
> > we deal with it in the standard way (i.e., we puke, have to do something about that.)
> 
> So it can return NULL? 

Returning null here won't help if the caller doesn't have a fallback, or if the fallback
is unacceptable, such as losing a filesystem transaction.

> > > What if we are on our way to deadlock?
> > 
> > huh??
> 
> Process A needs 4 pages, Process B needs 4 pages, each grabs 3.

This is no new deadlock.  Supposing each has successfully grabbed 3, what
good does it do if the process is too clueless to release the pages it's
already grabbed, because the 4th page alloc fails?  (The first 3 may have
been alloced in a completely different part of the program.)  And if the
process does know how to do this, it should tell the VM so, and *then* the VM
should feel free to fail the allocation.

> One easy, traditional unix algorithm for dealing with this is
> 	for(i=0; i < 4; i++)if (!(p[i]=kmalloc(...)))
>                                 free all that we have so far

Just OR in GFP_ok_to_fail there.

> > > What if the caller of kmalloc will make less good use of the page
> > > than the current owner of the page?
> > 
> > That's life, that's what lrus are for.
> 
> Really? I thought LRUs were to approximate working sets. Obviously
> if a program is kmallocing its working set is changing but that
> does not tell us anything about whether it is a correct decision to
> rip a page from the working set of another process.

We're getting way far from the original question here.  Our lru has no
concept of working set, it's completely global.  That's not so great and
it's another problem to tackle.  Sometime.

> > You won't find one if you don't look for it.
> 
> I'm too dumb to come up with a solution here, but you are the one
> changing the interface, so surely you have a couple of "less borked"
> solutions in mind - right?

Yes.  Well, I'm not alone here, ping Marcelo on that if you like.  This is
known borkness that's been deferred while more pressing borkness is dealt
with.

-- 
Daniel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 18:07                                                                   ` Daniel Phillips
@ 2002-03-07 18:15                                                                     ` yodaiken
  2002-03-07 19:22                                                                     ` Alan Cox
  1 sibling, 0 replies; 96+ messages in thread
From: yodaiken @ 2002-03-07 18:15 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

On Thu, Mar 07, 2002 at 07:07:23PM +0100, Daniel Phillips wrote:
> > Really? I thought LRUs were to approximate working sets. Obviously
> > if a program is kmallocing its working set is changing but that
> > does not tell us anything about whether it is a correct decision to
> > rip a page from the working set of another process.
> 
> We're getting way far from the original question here.  Our lru has no
> concept of working set, it's completely global.  That's not so great and
> it's another problem to tackle.  Sometime.

Global lru is an approximation of per-task working set. That's why it
works. But it's not perfect.

> 
> > > You won't find one if you don't look for it.
> > 
> > I'm too dumb to come up with a solution here, but you are the one
> > changing the interface, so surely you have a couple of "less borked"
> > solutions in mind - right?
> 
> Yes.  Well, I'm not alone here, ping Marcelo on that if you like.  This is
> known borkness that's been deferred while more pressing borkness is dealt
> with.

So you and Marcelo are planning on making changes to the semantics
of primitive memory allocation modules in the production kernel?

Can that be true? I hope not.



-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-06 22:21                                         ` Pavel Machek
  2002-03-07 11:30                                           ` Dave Jones
@ 2002-03-07 18:21                                           ` H. Peter Anvin
  1 sibling, 0 replies; 96+ messages in thread
From: H. Peter Anvin @ 2002-03-07 18:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, David Woodhouse, Jeff Dike, Benjamin LaHaise, linux-kernel

Pavel Machek wrote:

> Hi!
> 
> 
>>>You say 'at once'. Does UML somehow give pages back to the host when they're 
>>>freed, so the pages that are no longer used by UML can be discarded by the 
>>>host instead of getting swapped?
>>>
>>Doesn't seem to but it looks like madvise might be enough to make that
>>happen. That BTW is an issue for more than UML - it has a bearing on
>>running lots of Linux instances on any supervisor/virtualising system
>>like S/390
>>
> 
> I just imagined hardware which supports freeing memory -- just do not
> refresh it any more to conserve power ;-))).
> 

> Granted, it would probably only make sense in big chunks, like 2MB or
> so... It might make sense for a PDA...
> 									Pavel


Unlikely.  Also, if you're using ECC, then that really screws with you.

However, if it is an issue for more than UML (I still consider the 
particular UML case "in case you have a UML on a tmpfs set up by an 
idiot admin" completely bogus) then it's another issue.  The S/390 issue 
is real.

	-hpa



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 15:34                                                               ` Daniel Phillips
@ 2002-03-07 19:18                                                                 ` Andrew Morton
  2002-03-07 20:10                                                                   ` Rik van Riel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Morton @ 2002-03-07 19:18 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

Daniel Phillips wrote:
> 
> a GFP flag that says 'fail if this looks hard to get'.

Something like that would provide a solution to the
readahead thrashing problem.

-

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 18:07                                                                   ` Daniel Phillips
  2002-03-07 18:15                                                                     ` yodaiken
@ 2002-03-07 19:22                                                                     ` Alan Cox
  2002-03-07 22:43                                                                       ` David Woodhouse
  1 sibling, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-07 19:22 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Alan Cox, Jeff Dike, Benjamin LaHaise, H. Peter Anvin,
	linux-kernel

> > So it can return NULL? 
> 
> Returning null here won't help if the caller doesn't have a fallback, or if the fallback
> is unacceptable, such as losing a filesystem transaction.

Not having a fallback is unacceptable. That's the real problem. You can't
go around pandering to sloppy coders who can't work a memory allocator.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 19:18                                                                 ` Andrew Morton
@ 2002-03-07 20:10                                                                   ` Rik van Riel
  2002-03-07 20:56                                                                     ` Andrew Morton
  0 siblings, 1 reply; 96+ messages in thread
From: Rik van Riel @ 2002-03-07 20:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Phillips, Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

On Thu, 7 Mar 2002, Andrew Morton wrote:
> Daniel Phillips wrote:
> >
> > a GFP flag that says 'fail if this looks hard to get'.
>
> Something like that would provide a solution to the
> readahead thrashing problem.

Nope.  Readahead pages are clean and very easy to evict, so
it's still trivial to evict all the pages from another readahead
window because everybody's readahead window is too large.

regards,

Rik
-- 
<insert bitkeeper endorsement here>

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 20:10                                                                   ` Rik van Riel
@ 2002-03-07 20:56                                                                     ` Andrew Morton
  2002-03-07 21:23                                                                       ` Rik van Riel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Morton @ 2002-03-07 20:56 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Daniel Phillips, Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

Rik van Riel wrote:
> 
> On Thu, 7 Mar 2002, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > >
> > > a GFP flag that says 'fail if this looks hard to get'.
> >
> > Something like that would provide a solution to the
> > readahead thrashing problem.
> 
> Nope.  Readahead pages are clean and very easy to evict, so
> it's still trivial to evict all the pages from another readahead
> window because everybody's readahead window is too large.
> 

I was thinking an explicit GFP_READAHEAD and PG_readahead.
Where a GFP_READAHEAD allocation would fail if it can't 
find any non-readahead pages.  And it would fail if it
had to perform I/O.

That's not nice - it'd result in large LRU walks.  But it'd
be better than the 10x slowdown which readahead thrashing
causes.
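
To make the idea concrete, here is a toy sketch of the victim search such
an allocation would imply.  Everything in it is assumed for illustration:
struct toy_page, its two fields, and find_readahead_victim() are invented
names, and the LRU is modelled as a plain array ordered coldest first.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	int readahead;	/* stands in for the proposed PG_readahead bit */
	int dirty;	/* evicting a dirty page would require I/O */
};

/*
 * Walk `lru` from coldest (index 0) to hottest and return the index of
 * the first page a readahead allocation may evict: one that is neither
 * a readahead page nor dirty.  Returns -1 if there is none, in which
 * case the readahead allocation fails.
 */
long find_readahead_victim(const struct toy_page *lru, size_t n)
{
	size_t i;

	assert(lru != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!lru[i].readahead && !lru[i].dirty)
			return (long)i;
	}
	return -1;
}
```

The linear scan is exactly the "large LRU walk" cost mentioned above: with
many readahead or dirty pages at the cold end, each allocation may have to
skip over all of them.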

Any clever ideas?

-

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 20:56                                                                     ` Andrew Morton
@ 2002-03-07 21:23                                                                       ` Rik van Riel
  2002-03-07 22:02                                                                         ` Andrew Morton
  0 siblings, 1 reply; 96+ messages in thread
From: Rik van Riel @ 2002-03-07 21:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Phillips, Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

On Thu, 7 Mar 2002, Andrew Morton wrote:

> > Nope.  Readahead pages are clean and very easy to evict, so
> > it's still trivial to evict all the pages from another readahead
> > window because everybody's readahead window is too large.

> Any clever ideas?

1) keep track of which pages we are reading ahead
   ... the readahead code already does this

2) at read() or fault time, see if the page
   (a) is resident
   (b) is in the current readahead window,
       ie. already read ahead

3) if the page is in the current readahead window
   but NOT resident, the page was read in and
   evicted before we got around to using it, so
   readahead window thrashing is going on
   ... in that case, collapse the size of the
   readahead window TCP-style

4) slowly grow the readahead window when there is
   enough memory available, in order to minimise the
   number of disk seeks

5) the shrinking in (3) and growing in (4) mean that
   the readahead size of all streaming IO in the system
   gets automatically balanced against each other and
   against other memory demand in the system

regards,

Rik
-- 
<insert bitkeeper endorsement here>

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 21:23                                                                       ` Rik van Riel
@ 2002-03-07 22:02                                                                         ` Andrew Morton
  2002-03-07 22:10                                                                           ` Rik van Riel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Morton @ 2002-03-07 22:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Daniel Phillips, Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

Rik van Riel wrote:
> 
> On Thu, 7 Mar 2002, Andrew Morton wrote:
> 
> > > Nope.  Readahead pages are clean and very easy to evict, so
> > > it's still trivial to evict all the pages from another readahead
> > > window because everybody's readahead window is too large.
> 
> > Any clever ideas?
> 
> 1) keep track of which pages we are reading ahead
>    ... the readahead code already does this
> 
> 2) at read() or fault time, see if the page
>    (a) is resident
>    (b) is in the current readahead window,
>        ie. already read ahead
> 
> 3) if the page is in the current readahead window
>    but NOT resident, the page was read in and
>    evicted before we got around to using it, so
>    readahead window thrashing is going on
>    ... in that case, collapse the size of the
>    readahead window TCP-style

I have all that.  See handle_ra_thrashing() in 
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.6-pre2/dallocbase-10-readahead.patch

> 4) slowly grow the readahead window when there is
>    enough memory available, in order to minimise the
>    number of disk seeks
> 
> 5) the shrinking in (3) and growing in (4) mean that
>    the readahead size of all streaming IO in the system
>    gets automatically balanced against each other and
>    against other memory demand in the system

Doesn't work.

Ah, this is hard to describe.

umm.

a) Suppose that we're getting readahead thrashing.  readahead
   pages are getting dropped.  So we keep seeking to each
   file to get new data, so we do a ton of seeking.

b) Suppose that we nicely detect thrashing and reduce the readahead
   window.  Well, we *still* need to seek to each file to read
   some blocks.

See?  They're equivalent.  In case a) we're doing more (pointless)
I/O, but the cost of that is vanishingly small because it's just
one request.
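
A back-of-the-envelope model of why a) and b) cost the same number of
seeks, written as plain C.  It rests on assumptions that are not in
this mail: memory is split evenly between the files, every file has
the same length, and one seek fetches one window.  usable_window()
and total_seeks() are hypothetical helpers.

```c
#include <assert.h>

/*
 * Pages of readahead per file that survive until they are used.
 * Whatever window the file asks for, it cannot keep more than its
 * even share of memory; the rest is evicted before it is read.
 */
static unsigned long usable_window(unsigned long window,
				   unsigned long mem_pages,
				   unsigned long nfiles)
{
	unsigned long fair_share;

	assert(nfiles > 0 && window > 0 && mem_pages >= nfiles);
	fair_share = mem_pages / nfiles;
	return window < fair_share ? window : fair_share;
}

/* One seek per usable window, for each of 'nfiles' files of 'file_pages'. */
static unsigned long total_seeks(unsigned long file_pages,
				 unsigned long window,
				 unsigned long mem_pages,
				 unsigned long nfiles)
{
	unsigned long usable = usable_window(window, mem_pages, nfiles);
	unsigned long per_file = (file_pages + usable - 1) / usable;

	return per_file * nfiles;
}
```

With 1000 pages of memory and 100 files of 1000 pages each, a thrashing
32-page window and a collapsed 10-page window both leave 10 usable
pages per file, so under this model both cases need the same number of
seeks.  Only the amount of wasted I/O differs.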

So what *is* a solution?  Well, there's only so much memory available.
In either case a) or case b) we're "fairly" distributing that memory
between all files.  And that's the problem.  *All* the files have too
small a readahead window.  Which points one at: we need to stop being
fair. We need to give some files a good readahead window and others
not.   The "soft pinning" which I propose with GFP_READAHEAD and
PG_readahead might have that effect, I think.

I'll try it, see how it feels.

-

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:02                                                                         ` Andrew Morton
@ 2002-03-07 22:10                                                                           ` Rik van Riel
  2002-03-07 22:23                                                                             ` Andrew Morton
  0 siblings, 1 reply; 96+ messages in thread
From: Rik van Riel @ 2002-03-07 22:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Phillips, Alan Cox, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

On Thu, 7 Mar 2002, Andrew Morton wrote:

> > 5) the shrinking in (3) and growing in (4) mean that
> >    the readahead size of all streaming IO in the system
> >    gets automatically balanced against each other and
> >    against other memory demand in the system
>
> Doesn't work.
>
> Ah, this is hard to describe.
>
> umm.
>
> a) Suppose that we're getting readahead thrashing.  readahead
>    pages are getting dropped.  So we keep seeking to each
>    file to get new data, so we do a ton of seeking.
>
> b) Suppose that we nicely detect thrashing and reduce the readahead
>    window.  Well, we *still* need to seek to each file to read
>    some blocks.
>
> See?  They're equivalent.  In case a) we're doing more (pointless)
> I/O, but the cost of that is vanishingly small because it's just
> one request.
>
> So what *is* a solution?  Well, there's only so much memory available.
> In either case a) or case b) we're "fairly" distributing that memory
> between all files.  And that's the problem.  *All* the files have too
> small a readahead window.  Which points one at: we need to stop being
> fair. We need to give some files a good readahead window and others
> not.   The "soft pinning" which I propose with GFP_READAHEAD and
> PG_readahead might have that effect, I think.

Actually, it could boil down to something more:

use-once reduces the VM to FIFO order, which suffers from
belady's anomaly so it doesn't matter much how much memory
you throw at it

drop-behind will suffer the same problem once the readahead
memory is too large to keep in the system, but at least the
already-used pages won't kick out readahead pages
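
For anyone who hasn't seen Belady's anomaly, here is a minimal
userspace FIFO simulation in C.  It is a textbook illustration rather
than anything taken from the VM; fifo_faults() and belady_refs are
names invented for this sketch.  The reference string is the classic
one for which FIFO takes more faults with 4 frames than with 3.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16

/* Classic reference string that shows the anomaly under FIFO. */
static const int belady_refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };

/* Count page faults for strict FIFO replacement with 'nframes' frames. */
static int fifo_faults(const int *refs, size_t nrefs, int nframes)
{
	int frames[MAX_FRAMES];
	int used = 0;	/* frames filled so far */
	int next = 0;	/* index of the oldest frame, evicted next */
	int faults = 0;
	size_t i;
	int j;

	assert(nframes > 0 && nframes <= MAX_FRAMES);

	for (i = 0; i < nrefs; i++) {
		int hit = 0;

		for (j = 0; j < used; j++) {
			if (frames[j] == refs[i]) {
				hit = 1;
				break;
			}
		}
		if (hit)
			continue;	/* FIFO ignores hits entirely */

		faults++;
		if (used < nframes) {
			frames[used++] = refs[i];
		} else {
			frames[next] = refs[i];
			next = (next + 1) % nframes;
		}
	}
	return faults;
}
```

Running it over belady_refs gives 9 faults with 3 frames and 10 faults
with 4 frames, i.e. adding memory made things worse.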

regards,

Rik
-- 
<insert bitkeeper endorsement here>

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:10                                                                           ` Rik van Riel
@ 2002-03-07 22:23                                                                             ` Andrew Morton
  2002-03-07 22:27                                                                               ` Rik van Riel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Morton @ 2002-03-07 22:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Rik van Riel wrote:
>
> > So what *is* a solution?  Well, there's only so much memory available.
> > In either case a) or case b) we're "fairly" distributing that memory
> > between all files.  And that's the problem.  *All* the files have too
> > small a readahead window.  Which points one at: we need to stop being
> > fair. We need to give some files a good readahead window and others
> > not.   The "soft pinning" which I propose with GFP_READAHEAD and
> > PG_readahead might have that effect, I think.
> 
> Actually, it could boil down to something more:
> 
> use-once reduces the VM to FIFO order, which suffers from
> belady's anomaly so it doesn't matter much how much memory
> you throw at it
> 
> drop-behind will suffer the same problem once the readahead
> memory is too large to keep in the system, but at least the
> already-used pages won't kick out readahead pages

err..  Was there a fix in there somewhere, or are we stuck?

-

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:23                                                                             ` Andrew Morton
@ 2002-03-07 22:27                                                                               ` Rik van Riel
  2002-03-07 22:41                                                                                 ` Andrew Morton
  2002-03-07 22:42                                                                                 ` David Lang
  0 siblings, 2 replies; 96+ messages in thread
From: Rik van Riel @ 2002-03-07 22:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, 7 Mar 2002, Andrew Morton wrote:

> > use-once reduces the VM to FIFO order, which suffers from
> > belady's anomaly so it doesn't matter much how much memory
> > you throw at it
> >
> > drop-behind will suffer the same problem once the readahead
> > memory is too large to keep in the system, but at least the
> > already-used pages won't kick out readahead pages
>
> err..  Was there a fix in there somewhere, or are we stuck?

Imagine how TCP backoff would work if it kept old packets
around and would drop random packets because of too many
old packets in the buffers.

I suspect that the readahead window resizing might work
when we throw away the already-used streaming IO pages
before we start throwing away any pages we're about to
use.

regards,

Rik
-- 
<insert bitkeeper endorsement here>

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:27                                                                               ` Rik van Riel
@ 2002-03-07 22:41                                                                                 ` Andrew Morton
  2002-03-07 22:42                                                                                 ` David Lang
  1 sibling, 0 replies; 96+ messages in thread
From: Andrew Morton @ 2002-03-07 22:41 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Rik van Riel wrote:
> 
> On Thu, 7 Mar 2002, Andrew Morton wrote:
> 
> > > use-once reduces the VM to FIFO order, which suffers from
> > > belady's anomaly so it doesn't matter much how much memory
> > > you throw at it
> > >
> > > drop-behind will suffer the same problem once the readahead
> > > memory is too large to keep in the system, but at least the
> > > already-used pages won't kick out readahead pages
> >
> > err..  Was there a fix in there somewhere, or are we stuck?
> 
> Imagine how TCP backoff would work if it kept old packets
> around and would drop random packets because of too many
> old packets in the buffers.
> 
> I suspect that the readahead window resizing might work
> when we throw away the already-used streaming IO pages
> before we start throwing away any pages we're about to
> use.

ewww..  You seem to be implying that when the readahead
code goes to get a new page, it's reclaiming unused
readahead pages *in preference to* already-used pages.

That would be awful, wouldn't it?

Perhaps an algorithm would be:

a) Call mark_page_accessed once against readahead pages.

b) If thrashing is detected, call mark_page_accessed
   twice against readahead pages, to move them onto the
   active list.

   The intent being to say "this page is important.  Throw
   something else away".

Seems this would delay the onset of the problem significantly?
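
Roughly, as a toy userspace model in C.  struct toy_page and both
functions are invented for illustration; the two-touch promotion is
modelled loosely on how mark_page_accessed behaves in 2.4 as far as I
recall (the first touch sets the referenced bit, the second moves the
page to the active list), so treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two page flags that matter here. */
struct toy_page {
	bool referenced;
	bool active;	/* true once the page is on the active list */
};

/* First touch marks the page referenced, second touch activates it. */
static void toy_mark_page_accessed(struct toy_page *page)
{
	if (!page->active && page->referenced) {
		page->active = true;
		page->referenced = false;
		return;
	}
	page->referenced = true;
}

/*
 * a) touch every readahead page once;
 * b) if thrashing was detected, touch it a second time so that it
 *    lands on the active list and something else is reclaimed first.
 */
static void toy_readahead_touch(struct toy_page *page, bool thrashing)
{
	assert(page != NULL);

	toy_mark_page_accessed(page);
	if (thrashing)
		toy_mark_page_accessed(page);
}
```

Without thrashing a readahead page ends up referenced but inactive;
with thrashing it ends up on the active list.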

-

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:27                                                                               ` Rik van Riel
  2002-03-07 22:41                                                                                 ` Andrew Morton
@ 2002-03-07 22:42                                                                                 ` David Lang
  1 sibling, 0 replies; 96+ messages in thread
From: David Lang @ 2002-03-07 22:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, linux-kernel

In addition, by reducing the amount of readahead you do for each file, you
can stabilize into a mode where you are doing _some_ readahead and not
thrashing, so this will reduce your seeks.

David Lang



On Thu, 7 Mar 2002, Rik van Riel wrote:

> Date: Thu, 7 Mar 2002 19:27:49 -0300 (BRT)
> From: Rik van Riel <riel@conectiva.com.br>
> To: Andrew Morton <akpm@zip.com.au>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [RFC] Arch option to touch newly allocated pages
>
> On Thu, 7 Mar 2002, Andrew Morton wrote:
>
> > > use-once reduces the VM to FIFO order, which suffers from
> > > belady's anomaly so it doesn't matter much how much memory
> > > you throw at it
> > >
> > > drop-behind will suffer the same problem once the readahead
> > > memory is too large to keep in the system, but at least the
> > > already-used pages won't kick out readahead pages
> >
> > err..  Was there a fix in there somewhere, or are we stuck?
>
> Imagine how TCP backoff would work if it kept old packets
> around and would drop random packets because of too many
> old packets in the buffers.
>
> I suspect that the readahead window resizing might work
> when we throw away the already-used streaming IO pages
> before we start throwing away any pages we're about to
> use.
>
> regards,
>
> Rik
> --
> <insert bitkeeper endorsement here>
>
> http://www.surriel.com/		http://distro.conectiva.com/
>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 19:22                                                                     ` Alan Cox
@ 2002-03-07 22:43                                                                       ` David Woodhouse
  2002-03-07 23:09                                                                         ` Alan Cox
  0 siblings, 1 reply; 96+ messages in thread
From: David Woodhouse @ 2002-03-07 22:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Daniel Phillips, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel


alan@lxorguk.ukuu.org.uk said:
>  Not having a fallback is unacceptable. That's the real problem. You
> can't go around pandering to sloppy coders who can't work a memory
> allocator 

OTOH there is perhaps some justification for distinguishing between 'If you 
fail this I'll tell the user -ENOMEM and continue happily on my way' 
allocations and 'If you fail this I lose track of hardware state and all is 
fucked till we reboot' ones.

--
dwmw2



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 23:09                                                                         ` Alan Cox
@ 2002-03-07 22:57                                                                           ` David Woodhouse
  0 siblings, 0 replies; 96+ messages in thread
From: David Woodhouse @ 2002-03-07 22:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Daniel Phillips, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel


alan@lxorguk.ukuu.org.uk said:
>  None at all. If you needed the memory before you committed to an
> operation you should have reserved it before you started. See "sloppy
> coders"

This is true. I must admit I was having trouble trying to think of a real 
case where the latter applied in _sane_ code.

--
dwmw2



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07 22:43                                                                       ` David Woodhouse
@ 2002-03-07 23:09                                                                         ` Alan Cox
  2002-03-07 22:57                                                                           ` David Woodhouse
  0 siblings, 1 reply; 96+ messages in thread
From: Alan Cox @ 2002-03-07 23:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Alan Cox, Daniel Phillips, yodaiken, Jeff Dike, Benjamin LaHaise,
	H. Peter Anvin, linux-kernel

> alan@lxorguk.ukuu.org.uk said:
> >  Not having a fallback is unacceptable. That's the real problem. You
> > can't go around pandering to sloppy coders who can't work a memory
> > allocator 
> 
> OTOH there is perhaps some justification for distinguishing between 'If you 
> fail this I'll tell the user -ENOMEM and continue happily on my way' 
> allocations and 'If you fail this I lose track of hardware state and all is 
> fucked till we reboot' ones.

None at all. If you needed the memory before you committed to an operation
you should have reserved it before you started. See "sloppy coders"
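
A minimal sketch of that reserve-then-commit pattern, in userspace C
with malloc standing in for the kernel allocator.  struct hw_op and
the three functions are hypothetical; the only point is that every
allocation that can fail happens before the point of no return.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical multi-step operation that must not fail halfway. */
struct hw_op {
	void *cmd_buf;
	void *status_buf;
	int committed;
};

/* Phase 1: reserve everything.  Failing here is harmless. */
static int hw_op_reserve(struct hw_op *op, size_t cmd_len, size_t status_len)
{
	op->committed = 0;
	op->cmd_buf = malloc(cmd_len);
	op->status_buf = malloc(status_len);
	if (op->cmd_buf == NULL || op->status_buf == NULL) {
		free(op->cmd_buf);
		free(op->status_buf);
		op->cmd_buf = op->status_buf = NULL;
		return -ENOMEM;	/* caller reports it and carries on */
	}
	return 0;
}

/* Phase 2: point of no return.  Allocates nothing, so no -ENOMEM here. */
static void hw_op_commit(struct hw_op *op)
{
	assert(op->cmd_buf != NULL && op->status_buf != NULL);
	op->committed = 1;
}

/* Give the reserved memory back once the operation is finished. */
static void hw_op_release(struct hw_op *op)
{
	free(op->cmd_buf);
	free(op->status_buf);
	op->cmd_buf = op->status_buf = NULL;
}
```

The caller checks the return value of hw_op_reserve() and only calls
hw_op_commit() on success.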

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-07  1:52                                                     ` Benjamin LaHaise
@ 2002-03-08 19:17                                                       ` Jeff Dike
  2002-03-08 21:22                                                         ` Benjamin LaHaise
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Dike @ 2002-03-08 19:17 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Daniel Phillips, H. Peter Anvin, Alan Cox, linux-kernel

bcrl@redhat.com said:
> Versus fully allocating the backing store, which would neither hang
> nor  cause segfaults.  This is the behaviour that one expects by
> default, and  should be the first line of defense before going to the
> overcommit model.   Get that aspect of reliability in place, then add
> the overcommit support.

OK, the patch below (against UML 2.4.18-2) implements reliable overcommit 
for UML.

The test was the same as before -
	64M tmpfs on /tmp
	two 64M UMLs
	one -j 2 kernel build running in each

tmpfs was exhausted nearly immediately.  Both builds ran to completion.
At the end, the 64M tmpfs was divided roughly 30M/35M between the two UMLs.

The first chunk of the patch (mm.h) is the hook that I started this thread 
talking about.  It's a no-op for all arches except UML (or s390 if they decide
they can use it).

The next two (asm/page.h and mem.c) implement the hook for UML.  I believe
it correctly preserves the failure semantics of alloc_pages.  Please let me
know if I missed something.

It tests for unbacked pages by writing to them and catching the resulting
SIGBUS.  On a host with address space accounting, it would instead map the
page and catch the map failures.
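
The write-and-catch-SIGBUS probe can be tried outside UML with a plain
userspace C sketch like the one below.  It is not the code in the
patch: probe_pages() and map_partly_backed() are names made up here,
it uses sigsetjmp/siglongjmp instead of UML's fault_catcher, and it
fakes "unbacked" pages by truncating a mapped temporary file in /tmp
rather than by exhausting tmpfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf probe_env;

static void probe_sigbus(int sig)
{
	(void) sig;
	siglongjmp(probe_env, 1);	/* unwind out of the faulting store */
}

/*
 * Write one zero word to the start of each page, as the patch does.
 * Returns how many leading pages accepted the write; a result smaller
 * than 'npages' means that page raised SIGBUS (no backing store).
 */
static size_t probe_pages(void *addr, size_t npages, size_t page_size)
{
	struct sigaction sa, old;
	volatile size_t i = 0;	/* volatile: read again after siglongjmp */

	assert(addr != NULL && page_size > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return 0;

	if (sigsetjmp(probe_env, 1) == 0) {
		for (i = 0; i < npages; i++)
			*(volatile unsigned long *)
				((char *) addr + i * page_size) = 0;
	}
	sigaction(SIGBUS, &old, NULL);
	return i;
}

/*
 * Demo helper: map 'npages' pages of a temporary file, then truncate
 * the file to 'backed' pages so the remaining pages fault with SIGBUS.
 */
static void *map_partly_backed(size_t npages, size_t backed, size_t page_size)
{
	char path[] = "/tmp/probe-XXXXXX";
	void *addr;
	int fd = mkstemp(path);

	if (fd < 0)
		return NULL;
	unlink(path);
	if (ftruncate(fd, (off_t) (npages * page_size)) != 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, npages * page_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	if (addr != MAP_FAILED &&
	    ftruncate(fd, (off_t) (backed * page_size)) != 0) {
		munmap(addr, npages * page_size);
		addr = MAP_FAILED;
	}
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Mapping 4 pages with only 2 backed and probing all 4 should report 2
usable pages on Linux.  Note that the probe overwrites the first word
of every page it touches, which is fine for freshly allocated pages
but not for pages already in use.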

The rest of the patch is UML bug fixes which you're only interested in if
you want to boot it up.

One bug - if alloc_pages returns a combination of backed and unbacked pages
for an order > 0 allocation, the backed pages will effectively be leaked.

TBD -
	a corresponding arch hook in free_pages which UML can use for 
	MADV_DONTNEED

	some way of poking at unbacked pages to see if they are now backed
	and can be released back to free_pages

These two items would go some way to allowing multiple UMLs to pass host
memory back and forth as needed when it gets scarce.

				Jeff

diff -Naur um/include/linux/mm.h back/include/linux/mm.h
--- um/include/linux/mm.h	Thu Mar  7 11:56:36 2002
+++ back/include/linux/mm.h	Thu Mar  7 11:57:31 2002
@@ -358,6 +358,13 @@
 extern struct page * FASTCALL(__alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist));
 extern struct page * alloc_pages_node(int nid, unsigned int gfp_mask, unsigned int order);
 
+#ifndef HAVE_ARCH_VALIDATE
+static inline struct page *arch_validate(struct page *page, unsigned int gfp_mask, int order)
+{
+        return(page);
+}
+#endif
+
 static inline struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)
 {
 	/*
@@ -365,7 +372,7 @@
 	 */
 	if (order >= MAX_ORDER)
 		return NULL;
-	return _alloc_pages(gfp_mask, order);
+	return arch_validate(_alloc_pages(gfp_mask, order), gfp_mask, order);
 }
 
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
diff -Naur um/include/asm-um/page.h back/include/asm-um/page.h
--- um/include/asm-um/page.h	Mon Mar  4 17:27:34 2002
+++ back/include/asm-um/page.h	Thu Mar  7 11:57:01 2002
@@ -42,4 +42,7 @@
 #define virt_to_page(kaddr)	(mem_map + (__pa(kaddr) >> PAGE_SHIFT))
 #define VALID_PAGE(page)	((page - mem_map) < max_mapnr)
 
+extern struct page *arch_validate(struct page *page, int mask, int order);
+#define HAVE_ARCH_VALIDATE
+
 #endif
diff -Naur um/arch/um/kernel/mem.c back/arch/um/kernel/mem.c
--- um/arch/um/kernel/mem.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/mem.c	Thu Mar  7 11:57:17 2002
@@ -212,6 +212,39 @@
 "    just be swapped out.\n        Example: mem=64M\n\n"
 );
 
+struct page *arch_validate(struct page *page, int mask, int order)
+{
+	unsigned long addr, zero = 0;
+	int i;
+
+ again:
+	if(page == NULL) return(page);
+	addr = (unsigned long) page_address(page);
+	for(i = 0; i < (1 << order); i++){
+		current->thread.fault_addr = (void *) addr;
+		if(__do_copy_to_user((void *) addr, &zero, 
+				     sizeof(zero),
+				     &current->thread.fault_addr,
+				     &current->thread.fault_catcher)){
+			if(!(mask & __GFP_WAIT)) return(NULL);
+			else break;
+		}
+		addr += PAGE_SIZE;
+	}
+	if(i == (1 << order)) return(page);
+	page = _alloc_pages(mask, order);
+	goto again;
+}
+
+extern void relay_signal(int sig, void *sc, int usermode);
+
+void bus_handler(int sig, void *sc, int usermode)
+{
+	if(current->thread.fault_catcher != NULL)
+		do_longjmp(current->thread.fault_catcher);
+	else relay_signal(sig, sc, usermode);
+}
+
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
  * Emacs will notice this stuff at the end of the file and automatically
diff -Naur um/arch/um/kernel/exec_kern.c back/arch/um/kernel/exec_kern.c
--- um/arch/um/kernel/exec_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/exec_kern.c	Mon Mar  4 18:05:20 2002
@@ -38,6 +38,12 @@
 	int new_pid;
 
 	stack = alloc_stack();
+	if(stack == 0){
+		printk(KERN_ERR 
+		       "flush_thread : failed to allocate temporary stack\n");
+		do_exit(SIGKILL);
+	}
+		
 	new_pid = start_fork_tramp((void *) current->thread.kernel_stack,
 				   stack, 0, exec_tramp);
 	if(new_pid < 0){
diff -Naur um/arch/um/kernel/process_kern.c back/arch/um/kernel/process_kern.c
--- um/arch/um/kernel/process_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/process_kern.c	Mon Mar  4 18:05:20 2002
@@ -141,7 +141,7 @@
 	unsigned long page;
 
 	if((page = __get_free_page(GFP_KERNEL)) == 0)
-		panic("Couldn't allocate new stack");
+		return(0);
 	stack_protections(page);
 	return(page);
 }
@@ -318,6 +318,11 @@
 		panic("copy_thread : pipe failed");
 	if(current->thread.forking){
 		stack = alloc_stack();
+		if(stack == 0){
+			printk(KERN_ERR "copy_thread : failed to allocate "
+			       "temporary stack\n");
+			return(-ENOMEM);
+		}
 		clone_vm = (p->mm == current->mm);
 		p->thread.temp_stack = stack;
 		new_pid = start_fork_tramp((void *) p->thread.kernel_stack,
diff -Naur um/arch/um/kernel/trap_kern.c back/arch/um/kernel/trap_kern.c
--- um/arch/um/kernel/trap_kern.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/trap_kern.c	Mon Mar  4 18:05:20 2002
@@ -30,6 +30,7 @@
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct siginfo si;
+	void *catcher;
 	pgd_t *pgd;
 	pmd_t *pmd;
 	pte_t *pte;
@@ -40,6 +41,7 @@
 		return(0);
 	}
 	if(mm == NULL) panic("Segfault with no mm");
+	catcher = current->thread.fault_catcher;
 	si.si_code = SEGV_MAPERR;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -84,10 +86,10 @@
 	up_read(&mm->mmap_sem);
 	return(0);
  bad:
-	if (current->thread.fault_catcher != NULL) {
+	if(catcher != NULL) {
 		current->thread.fault_addr = (void *) address;
 		up_read(&mm->mmap_sem);
-		do_longjmp(current->thread.fault_catcher);
+		do_longjmp(catcher);
 	} 
 	else if(current->thread.fault_addr != NULL){
 		panic("fault_addr set but no fault catcher");
@@ -120,6 +122,7 @@
 
 void relay_signal(int sig, void *sc, int usermode)
 {
+	if(!usermode) panic("Kernel mode signal %d", sig);
 	force_sig(sig, current);
 }
 
diff -Naur um/arch/um/kernel/trap_user.c back/arch/um/kernel/trap_user.c
--- um/arch/um/kernel/trap_user.c	Mon Mar  4 17:27:34 2002
+++ back/arch/um/kernel/trap_user.c	Mon Mar  4 18:05:20 2002
@@ -420,11 +420,13 @@
 
 extern int timer_ready, timer_on;
 
+extern void bus_handler(int sig, void *sc, int usermode);
+
 static void (*handlers[])(int, void *, int) = {
 	[ SIGTRAP ] relay_signal,
 	[ SIGFPE ] relay_signal,
 	[ SIGILL ] relay_signal,
-	[ SIGBUS ] relay_signal,
+	[ SIGBUS ] bus_handler,
 	[ SIGSEGV] segv_handler,
 	[ SIGIO ] sigio_handler,
 	[ SIGVTALRM ] timer_handler,


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC] Arch option to touch newly allocated pages
  2002-03-08 19:17                                                       ` Jeff Dike
@ 2002-03-08 21:22                                                         ` Benjamin LaHaise
  0 siblings, 0 replies; 96+ messages in thread
From: Benjamin LaHaise @ 2002-03-08 21:22 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Daniel Phillips, H. Peter Anvin, Alan Cox, linux-kernel

On Fri, Mar 08, 2002 at 02:17:53PM -0500, Jeff Dike wrote:
> OK, the patch below (against UML 2.4.18-2) implements reliable overcommit 
> for UML.

Well, I still dislike it, but I guess it'll have to do.  The only nits I see 
about the patch are: could you make the inline function a #define for the 
no-arch_validate case?  Also, the format of if statements is a bit abnormal: 
please add line breaks as appropriate.  Aside from that, go ahead.

		-ben

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2002-03-08 21:23 UTC | newest]

Thread overview: 96+ messages
2002-03-03 21:12 [RFC] Arch option to touch newly allocated pages Jeff Dike
2002-03-03 22:01 ` Alan Cox
2002-03-03 23:27   ` Jeff Dike
2002-03-03 23:48     ` Alan Cox
2002-03-04  3:16       ` Jeff Dike
2002-03-04  3:35         ` Alan Cox
2002-03-04  5:04           ` Jeff Dike
2002-03-04 15:09             ` Alan Cox
2002-03-04 17:42               ` Jeff Dike
2002-03-04 18:29                 ` Alan Cox
2002-03-04 18:36                   ` Jeff Dike
2002-03-04 18:49                     ` Alan Cox
2002-03-04 20:46                       ` Jeff Dike
2002-03-04 22:49                         ` Alan Cox
2002-03-04 17:46             ` H. Peter Anvin
2002-03-04 18:34               ` Jeff Dike
2002-03-04 18:33                 ` H. Peter Anvin
2002-03-04 20:36                   ` Jeff Dike
2002-03-04 22:51                     ` Alan Cox
2002-03-05  4:15                       ` Jeff Dike
2002-03-05  4:28                         ` Benjamin LaHaise
2002-03-05  4:40                           ` Jeff Dike
2002-03-05  5:34                             ` H. Peter Anvin
2002-03-05 14:43                               ` Jeff Dike
2002-03-05 16:37                                 ` H. Peter Anvin
2002-03-05 18:12                                   ` Jeff Dike
2002-03-05 18:30                                     ` Benjamin LaHaise
2002-03-06 14:59                                       ` Daniel Phillips
2002-03-06 15:24                                         ` Benjamin LaHaise
2002-03-06 15:24                                           ` Daniel Phillips
2002-03-06 16:36                                             ` Benjamin LaHaise
2002-03-06 23:14                                               ` Daniel Phillips
2002-03-06 23:20                                                 ` Benjamin LaHaise
2002-03-06 23:26                                                   ` Daniel Phillips
2002-03-06 23:33                                                     ` H. Peter Anvin
2002-03-07  0:08                                                       ` Daniel Phillips
2002-03-07  1:27                                                   ` Jeff Dike
2002-03-07  1:52                                                     ` Benjamin LaHaise
2002-03-08 19:17                                                       ` Jeff Dike
2002-03-08 21:22                                                         ` Benjamin LaHaise
2002-03-07 13:49                                                     ` Alan Cox
2002-03-07 13:36                                                       ` Daniel Phillips
2002-03-07 14:04                                                         ` yodaiken
2002-03-07 14:21                                                           ` Daniel Phillips
2002-03-07 14:38                                                             ` yodaiken
2002-03-07 15:31                                                               ` Daniel Phillips
2002-03-07 16:50                                                                 ` yodaiken
2002-03-07 18:07                                                                   ` Daniel Phillips
2002-03-07 18:15                                                                     ` yodaiken
2002-03-07 19:22                                                                     ` Alan Cox
2002-03-07 22:43                                                                       ` David Woodhouse
2002-03-07 23:09                                                                         ` Alan Cox
2002-03-07 22:57                                                                           ` David Woodhouse
2002-03-07 14:43                                                             ` Alan Cox
2002-03-07 15:32                                                               ` Daniel Phillips
2002-03-07 16:19                                                                 ` Alan Cox
2002-03-07 17:54                                                                   ` Daniel Phillips
2002-03-07 15:34                                                               ` Daniel Phillips
2002-03-07 19:18                                                                 ` Andrew Morton
2002-03-07 20:10                                                                   ` Rik van Riel
2002-03-07 20:56                                                                     ` Andrew Morton
2002-03-07 21:23                                                                       ` Rik van Riel
2002-03-07 22:02                                                                         ` Andrew Morton
2002-03-07 22:10                                                                           ` Rik van Riel
2002-03-07 22:23                                                                             ` Andrew Morton
2002-03-07 22:27                                                                               ` Rik van Riel
2002-03-07 22:41                                                                                 ` Andrew Morton
2002-03-07 22:42                                                                                 ` David Lang
2002-03-06 16:03                                         ` Jesse Pollard
2002-03-06 17:08                                           ` Jeff Dike
2002-03-06 17:33                                             ` Alan Cox
2002-03-07  0:28                                               ` Jeff Dike
2002-03-07  0:44                                                 ` Alan Cox
2002-03-05 18:46                                     ` H. Peter Anvin
2002-03-06  1:30                                     ` Alan Cox
2002-03-06 10:49                                     ` David Woodhouse
2002-03-06 14:26                                       ` Jeff Dike
2002-03-06 16:50                                       ` Alan Cox
2002-03-06 20:25                                         ` Jeff Dike
2002-03-06 20:54                                           ` Alan Cox
2002-03-06 21:27                                             ` Malcolm Beattie
2002-03-06 23:26                                               ` Jeff Dike
2002-03-06 21:27                                           ` David Woodhouse
2002-03-06 22:25                                             ` Joseph Malicki
2002-03-07  0:28                                             ` Jeff Dike
2002-03-07  0:44                                               ` Alan Cox
2002-03-07  0:04                                           ` Richard Gooch
2002-03-06 22:21                                         ` Pavel Machek
2002-03-07 11:30                                           ` Dave Jones
2002-03-07 18:21                                           ` H. Peter Anvin
2002-03-05 16:56                                 ` Wayne Whitney
2002-03-05 14:43                           ` Jeff Dike
2002-03-05 16:57                             ` H. Peter Anvin
2002-03-05 18:14                               ` Jeff Dike
2002-03-05 18:45                                 ` H. Peter Anvin
2002-03-05 17:30                             ` Jan Harkes
