* [RFC] Remove unswappable anonymous pages off the LRU
@ 2007-02-15 21:05 Christoph Lameter
  2007-02-15 22:31 ` Rik van Riel
  2007-02-16  1:13 ` Andrew Morton
  0 siblings, 2 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-15 21:05 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

If we do not have any swap or we have run out of swap then anonymous pages
can no longer be removed from memory. In that case we simply treat them
like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
that all anonymous pages are marked mlocked when they are allocated.

If CONFIG_SWAP is on but no swap is available then anonymous pages are taken
off the LRU when we attempt to reclaim them and find that there is no swap
space left.

I think it is best to account unreclaimable anonymous pages under NR_MLOCK 
because mlock is a way of treating pages that is defined by POSIX. It is 
clear then that these pages are not reclaimed. NONLRU would not 
communicate clearly what is happening to the pages and it would also 
include mlocked pages. The possible confusion that may arise here is that 
pages are mlocked without an mlock() syscall but I think that the sudden 
increase in NR_MLOCK will help people to reconsider what they are doing if 
they switch off swap.

Pages may also be marked as mlocked() if we are running out of swap.

One unresolved issue is how to get anonymous pages back to an unmlocked
state if more swap is added to the system. Pages are checked for the mlocked
state whenever a process terminates. However, anonymous pages of processes
that do not terminate may stay mlocked. The only way to get rid of
those would be to scan all mlocked pages on the system since we have
no list of mlocked pages. That may be too expensive. Maybe the best
is to leave the pages mlocked?

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-git11/include/linux/swap.h
===================================================================
--- linux-2.6.20-git11.orig/include/linux/swap.h	2007-02-15 11:03:27.000000000 -0800
+++ linux-2.6.20-git11/include/linux/swap.h	2007-02-15 11:04:27.000000000 -0800
@@ -362,6 +362,11 @@ static inline swp_entry_t get_swap_page(
 	return entry;
 }
 
+static inline int add_to_swap(struct page *page, gfp_t flags)
+{
+	return -ENOSPC;
+}
+
 /* linux/mm/thrash.c */
 #define put_swap_token(x) do { } while(0)
 #define grab_swap_token()  do { } while(0)
Index: linux-2.6.20-git11/mm/memory.c
===================================================================
--- linux-2.6.20-git11.orig/mm/memory.c	2007-02-15 10:56:49.000000000 -0800
+++ linux-2.6.20-git11/mm/memory.c	2007-02-15 11:09:30.000000000 -0800
@@ -683,7 +683,7 @@ static unsigned long zap_pte_range(struc
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
-			if (PageMlocked(page) && vma->vm_flags & VM_LOCKED)
+			if (PageMlocked(page))
 				lru_cache_add_mlock(page);
 			tlb_remove_page(tlb, page);
 			continue;
@@ -907,17 +907,27 @@ static void add_anon_page(struct vm_area
 				unsigned long address)
 {
 	inc_mm_counter(vma->vm_mm, anon_rss);
-	if (vma->vm_flags & VM_LOCKED) {
-		/*
-		 * Page is new and therefore not on the LRU
-		 * so we can directly mark it as mlocked
-		 */
-		SetPageMlocked(page);
-		ClearPageActive(page);
-		inc_zone_page_state(page, NR_MLOCK);
-	} else
-		lru_cache_add_active(page);
 	page_add_new_anon_rmap(page, vma, address);
+
+#ifdef CONFIG_SWAP
+	/*
+	 * If there is no swap then there is no
+	 * point in adding an anon page to the LRU
+	 * because we can never reclaim the page.
+	 */
+	if (!(vma->vm_flags & VM_LOCKED)) {
+		lru_cache_add_active(page);
+		return;
+	}
+#endif
+
+	/*
+	 * Page is new and therefore not on the LRU
+	 * so we can directly mark it as mlocked
+	 */
+	SetPageMlocked(page);
+	ClearPageActive(page);
+	inc_zone_page_state(page, NR_MLOCK);
 }
 
 /*
Index: linux-2.6.20-git11/mm/swap_state.c
===================================================================
--- linux-2.6.20-git11.orig/mm/swap_state.c	2007-02-15 10:57:47.000000000 -0800
+++ linux-2.6.20-git11/mm/swap_state.c	2007-02-15 10:59:52.000000000 -0800
@@ -153,7 +153,7 @@ int add_to_swap(struct page * page, gfp_
 	for (;;) {
 		entry = get_swap_page();
 		if (!entry.val)
-			return 0;
+			return -ENOSPC;
 
 		/*
 		 * Radix-tree node allocations from PF_MEMALLOC contexts could
@@ -174,7 +174,7 @@ int add_to_swap(struct page * page, gfp_
 			SetPageUptodate(page);
 			SetPageDirty(page);
 			INC_CACHE_INFO(add_total);
-			return 1;
+			return 0;
 		case -EEXIST:
 			/* Raced with "speculative" read_swap_cache_async */
 			INC_CACHE_INFO(exist_race);
@@ -183,7 +183,7 @@ int add_to_swap(struct page * page, gfp_
 		default:
 			/* -ENOMEM radix-tree allocation failure */
 			swap_free(entry);
-			return 0;
+			return -ENOMEM;
 		}
 	}
 }
Index: linux-2.6.20-git11/mm/vmscan.c
===================================================================
--- linux-2.6.20-git11.orig/mm/vmscan.c	2007-02-15 10:59:57.000000000 -0800
+++ linux-2.6.20-git11/mm/vmscan.c	2007-02-15 11:07:57.000000000 -0800
@@ -488,15 +488,24 @@ static unsigned long shrink_page_list(st
 		if (referenced && page_mapping_inuse(page))
 			goto activate_locked;
 
-#ifdef CONFIG_SWAP
-		/*
-		 * Anonymous process memory has backing store?
-		 * Try to allocate it some swap space here.
-		 */
-		if (PageAnon(page) && !PageSwapCache(page))
-			if (!add_to_swap(page, GFP_ATOMIC))
+		if (PageAnon(page) && !PageSwapCache(page)) {
+			/*
+			 * Anonymous process memory has backing store?
+			 * Try to allocate it some swap space here.
+			 */
+			int rc = add_to_swap(page, GFP_ATOMIC);
+
+			if (rc == -ENOMEM)
 				goto activate_locked;
-#endif /* CONFIG_SWAP */
+
+			/*
+			 *  If we are unable to allocate a swap
+			 *  page then the anonymous page can never
+			 *  be reclaimed. In effect it is mlocked.
+			 */
+			if (rc == -ENOSPC)
+				goto mlocked;
+		}
 
 		mapping = page_mapping(page);
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
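
[Editor's sketch, not part of the patch] The patch changes add_to_swap()
from a boolean result to 0 on success or a negative errno, and
shrink_page_list() then branches on that value. The user-space sketch
below only restates that mapping. The enum and the function name are
hypothetical; the patch itself uses the goto labels activate_locked and
mlocked.

```c
#include <errno.h>

/* Hypothetical names: the patch uses goto labels, not an enum. */
enum reclaim_action {
	RECLAIM_CONTINUE,	/* swap slot assigned, carry on with reclaim */
	RECLAIM_ACTIVATE,	/* transient failure, back to the active list */
	RECLAIM_MLOCK,		/* no swap space, take the page off the LRU */
};

/* Mirror how the patched shrink_page_list() reacts to add_to_swap(). */
static enum reclaim_action anon_page_action(int rc)
{
	if (rc == -ENOMEM)
		return RECLAIM_ACTIVATE;	/* radix-tree allocation failed */
	if (rc == -ENOSPC)
		return RECLAIM_MLOCK;		/* get_swap_page() found no slot */
	return RECLAIM_CONTINUE;		/* rc == 0: page is in swap cache */
}
```

With CONFIG_SWAP off, the stub added to swap.h always returns -ENOSPC, so
every anonymous page that reaches this point would take the
RECLAIM_MLOCK path.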

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 21:05 [RFC] Remove unswappable anonymous pages off the LRU Christoph Lameter
@ 2007-02-15 22:31 ` Rik van Riel
  2007-02-15 22:41   ` Christoph Lameter
  2007-02-16  1:13 ` Andrew Morton
  1 sibling, 1 reply; 44+ messages in thread
From: Rik van Riel @ 2007-02-15 22:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh

Christoph Lameter wrote:
> If we do not have any swap or we have run out of swap then anonymous pages
> can no longer be removed from memory. In that case we simply treat them
> like mlocked pages.

Running out of swap is a temporary condition.
You need to have some way for those pages to
make it back onto the LRU list when swap
becomes available.

Better yet, we could implement a better way to
reclaim swap space, or reclaim swap space in a
different part of the code.

For example, we could try to reclaim the swap
space of every page that we scan on the active
list - when swap space starts getting tight.

-- 
All Rights Reversed


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 22:31 ` Rik van Riel
@ 2007-02-15 22:41   ` Christoph Lameter
  2007-02-15 22:50     ` Rik van Riel
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-15 22:41 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh

On Thu, 15 Feb 2007, Rik van Riel wrote:

> Running out of swap is a temporary condition.
> You need to have some way for those pages to
> make it back onto the LRU list when swap
> becomes available.

Yup any ideas how?
 
> Better yet, we could implement a better way to
> reclaim swap space, or reclaim swap space in a
> different part of the code.

Certainly an interesting project.
 
> For example, we could try to reclaim the swap
> space of every page that we scan on the active
> list - when swap space starts getting tight.

Good idea.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 22:41   ` Christoph Lameter
@ 2007-02-15 22:50     ` Rik van Riel
  2007-02-15 22:53       ` Christoph Lameter
                         ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Rik van Riel @ 2007-02-15 22:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh

Christoph Lameter wrote:
> On Thu, 15 Feb 2007, Rik van Riel wrote:
> 
>> Running out of swap is a temporary condition.
>> You need to have some way for those pages to
>> make it back onto the LRU list when swap
>> becomes available.
> 
> Yup any ideas how?

Not really.

>> For example, we could try to reclaim the swap
>> space of every page that we scan on the active
>> list - when swap space starts getting tight.
> 
> Good idea.

I suspect this will be a better approach.  That way
the least used pages can cycle into swap space, and
the more used pages can be in RAM.

The only reason pages are unswappable when we run
out of swap is that we don't free up the swap space
used by pages that are in memory.
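
[Editor's sketch, not from the thread] The idea above, releasing the swap
slot of a page that is resident in RAM once swap gets tight, could look
roughly like this in user space. Every name here is hypothetical, and
the "less than a quarter free" threshold is an arbitrary assumption made
only for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a page that may own a swap slot. */
struct sketch_page {
	bool in_memory;		/* contents are resident in RAM */
	bool has_swap_slot;	/* a swap slot is still reserved for it */
};

/* Assumption: "tight" means less than a quarter of all slots are free. */
static bool swap_is_tight(unsigned long free_slots, unsigned long total_slots)
{
	return total_slots > 0 && free_slots * 4 < total_slots;
}

/*
 * Called for each page seen while scanning the active list.
 * Returns the number of swap slots released (0 or 1).
 */
static unsigned long maybe_free_swap_slot(struct sketch_page *page,
					  unsigned long free_slots,
					  unsigned long total_slots)
{
	if (!swap_is_tight(free_slots, total_slots))
		return 0;
	if (!page->in_memory || !page->has_swap_slot)
		return 0;
	page->has_swap_slot = false;	/* slot becomes available to others */
	return 1;
}
```

The released slots can then be handed to colder pages, which is the
cycling of least used pages into swap described above.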

-- 
All Rights Reversed


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 22:50     ` Rik van Riel
@ 2007-02-15 22:53       ` Christoph Lameter
  2007-02-15 23:19       ` Andrew Morton
  2007-02-15 23:20       ` Lee Schermerhorn
  2 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-15 22:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh

On Thu, 15 Feb 2007, Rik van Riel wrote:

> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> > 
> > > Running out of swap is a temporary condition.
> > > You need to have some way for those pages to
> > > make it back onto the LRU list when swap
> > > becomes available.
> > 
> > Yup any ideas how?
> 
> Not really.

Maybe it's then best to not move the pages off the LRU when there is some
swap available. But even if there is no swap available: The user could
add some later. So there is really no criterion for removing anonymous
pages off the LRU. We would at least need some list of mlocked pages in
order to feed them back to the LRU.
 
> > > For example, we could try to reclaim the swap
> > > space of every page that we scan on the active
> > > list - when swap space starts getting tight.
> > 
> > Good idea.
> 
> I suspect this will be a better approach.  That way
> the least used pages can cycle into swap space, and
> the more used pages can be in RAM.
> 
> The only reason pages are unswappable when we run
> out of swap is that we don't free up the swap space
> used by pages that are in memory.

Well that is another project and not moving pages 
off the LRU.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 22:50     ` Rik van Riel
  2007-02-15 22:53       ` Christoph Lameter
@ 2007-02-15 23:19       ` Andrew Morton
  2007-02-15 23:20       ` Lee Schermerhorn
  2 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-15 23:19 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Martin J. Bligh

On Thu, 15 Feb 2007 17:50:30 -0500
Rik van Riel <riel@redhat.com> wrote:

> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> > 
> >> Running out of swap is a temporary condition.
> >> You need to have some way for those pages to
> >> make it back onto the LRU list when swap
> >> becomes available.
> > 
> > Yup any ideas how?
> 
> Not really.

I guess we could be less ambitious.

Obviously, CONFIG_SWAP=n is a no-brainer.

And perhaps it's OK to treat no-swap-online as CONFIG_SWAP=n.  So any pages
which we _tried_ to swap out before any swap was online get treated as
locked memory.  Well, that's just bad luck.  Perhaps we could do some
stupid little manual thing based on the smaps walker:

	echo 1 > /proc/pid/add-your-anon-pages-back-to-the-lru

ug.

Which leaves us wondering what to do about the temporary out-of-swap
problem.  That'll be hard - we don't want to do a full virtual scan of all
the mm's each time free swap goes from 0kb to 4kb.  I'd suggest that for
now we forget about this case and just put up with the additional scanning.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 22:50     ` Rik van Riel
  2007-02-15 22:53       ` Christoph Lameter
  2007-02-15 23:19       ` Andrew Morton
@ 2007-02-15 23:20       ` Lee Schermerhorn
  2007-02-16  0:15         ` Andrew Morton
  2 siblings, 1 reply; 44+ messages in thread
From: Lee Schermerhorn @ 2007-02-15 23:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Lameter, akpm, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Martin J. Bligh, Larry Woodman

On Thu, 2007-02-15 at 17:50 -0500, Rik van Riel wrote:
> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> > 
> >> Running out of swap is a temporary condition.
> >> You need to have some way for those pages to
> >> make it back onto the LRU list when swap
> >> becomes available.
> > 
> > Yup any ideas how?
> 
> Not really.
> 
> >> For example, we could try to reclaim the swap
> >> space of every page that we scan on the active
> >> list - when swap space starts getting tight.
> > 
> > Good idea.
> 
> I suspect this will be a better approach.  That way
> the least used pages can cycle into swap space, and
> the more used pages can be in RAM.
> 
> The only reason pages are unswappable when we run
> out of swap is that we don't free up the swap space
> used by pages that are in memory.

Many large memory systems [e.g., 64G-128G x86_64] running large database
servers run with little [~2G] to no swap.  Most of physical memory is
allocated to large shared memory areas which are never expected to swap
out [even tho' some db apps may not lock the shmem down :-(].  In these
systems, removing the shared memory pages from reclaim consideration may
alleviate some nasty lockups we've seen when one of these systems gets
pushed into reclaim because, e.g., someone ran a backup that filled the
page cache.   We find all of the cpus walking the LRU list [millions of
pages] to find eligible reclaim candidates.  [Almost] none of the shmem
pages are reclaimable because of insufficient swap, and we don't want
them swapped anyway.

Now one could argue that this is an application error, because it
doesn't lock the shared memory regions that it doesn't want swapped
anyway.  This doesn't help the customers in the short term.  They're
looking for a way to take control outside of the application and make
their needs known to the system.  Needs like, never push out shmem [and
maybe even anon] memory to make room for page cache pages.  This, I
believe, is the motivation behind the "limit the page cache"
patches/requests that we keep seeing.

An idea for handling these:

With the addition of Christoph's patch to move mlock()ed pages out of
the LRU, we could add a mechanism to automagically lock shared memory
regions that either exceed some tunable threshold or that exceed the
available amount of swap.
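
[Editor's sketch, not from the thread] The locking policy proposed above
reduces to a two-part predicate. The function and parameter names are
hypothetical; no such tunable is defined anywhere in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a shared memory segment should be locked automatically:
 * either it exceeds the (hypothetical) tunable threshold, or it is larger
 * than the swap space that is currently available.
 */
static bool should_auto_lock_shm(size_t segment_bytes,
				 size_t threshold_bytes,
				 size_t free_swap_bytes)
{
	return segment_bytes > threshold_bytes ||
	       segment_bytes > free_swap_bytes;
}
```

On a system with no swap at all, free_swap_bytes is 0 and every
non-empty segment would qualify.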

Larry Woodman at Red Hat has been experimenting with patches to move
shmem [and anon?] pages in excess of swap to a separate "wired list".
This has alleviated part of the problems [apparent system hangs].  There
are other issues, some that have been discussed on the mailing lists
recently, with page cache pages messing up the LRU-ness of the active
and inactive lists; vmscan not being proactive enough in keeping
available memory [limits too low for large systems]; etc.  Those issues
are exacerbated by a long active list with a high fraction of
unreclaimable pages.

Lee



* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 23:20       ` Lee Schermerhorn
@ 2007-02-16  0:15         ` Andrew Morton
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  0:15 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Rik van Riel, Christoph Lameter, linux-mm, Nick Piggin,
	Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh,
	Larry Woodman

On Thu, 15 Feb 2007 18:20:58 -0500
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> With the addition of Christoph's patch to move mlock()ed pages out of
> the LRU, we could add a mechanism to automagically lock shared memory
> regions that either exceed some tunable threshold or that exceed the
> available amount of swap.

But we have an out-of-band way of diddling shm segments?  So we could
create

	/usr/bin/ipclock --lock -i 2432

?
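
[Editor's sketch, not from the thread] The core of such a tool would
probably be a single shmctl() call with SHM_LOCK, which is an existing
Linux operation. The ipclock command and the helper name below are
hypothetical, and the call is still subject to the usual privilege and
RLIMIT_MEMLOCK checks.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Lock (lock != 0) or unlock (lock == 0) a System V shared memory
 * segment identified by shmid.
 * Returns 0 on success or a negative errno value on failure.
 */
static int ipclock_set(int shmid, int lock)
{
	if (shmctl(shmid, lock ? SHM_LOCK : SHM_UNLOCK, NULL) == -1)
		return -errno;
	return 0;
}
```

For the example above this would be called as ipclock_set(2432, 1).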


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-15 21:05 [RFC] Remove unswappable anonymous pages off the LRU Christoph Lameter
  2007-02-15 22:31 ` Rik van Riel
@ 2007-02-16  1:13 ` Andrew Morton
  2007-02-16  1:24   ` KAMEZAWA Hiroyuki
                     ` (3 more replies)
  1 sibling, 4 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  1:13 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> If we do not have any swap or we have run out of swap then anonymous pages
> can no longer be removed from memory. In that case we simply treat them
> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> that all anonymous pages are marked mlocked when they are allocated.

It's nice and simple, but I think I'd prefer to wait for the existing mlock
changes to crash a bit less before we do this.

Is it true that PageMlocked() pages are never on the LRU?  If so, perhaps
we could overload the lru.next/prev on these pages to flag an mlocked page.

#define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
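
[Editor's sketch, not from the thread] A user-space illustration of the
macro above: if mlocked pages are never linked on an LRU list, their
lru pointers are free to hold a marker address. The marker object and
all helper names are hypothetical.

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified stand-in for struct page; only the lru linkage is kept. */
struct sketch_page {
	struct list_head lru;
};

/* The address of this private object can never be a real list node. */
static struct list_head mlocked_marker;

/* Only valid after the page has been unlinked from every LRU list. */
static void set_page_mlocked(struct sketch_page *page)
{
	page->lru.next = &mlocked_marker;
	page->lru.prev = &mlocked_marker;
}

/* Reset to an empty, self-linked node so it can rejoin a list later. */
static void clear_page_mlocked(struct sketch_page *page)
{
	page->lru.next = &page->lru;
	page->lru.prev = &page->lru;
}

static bool page_mlocked(const struct sketch_page *page)
{
	return page->lru.next == &mlocked_marker;
}
```

This saves a page flag, at the price that the lru field can no longer be
used to chain mlocked pages together.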


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:13 ` Andrew Morton
@ 2007-02-16  1:24   ` KAMEZAWA Hiroyuki
  2007-02-16  1:40   ` Martin Bligh
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-16  1:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-mm, nickpiggin, a.p.zijlstra, mbligh, riel

On Thu, 15 Feb 2007 17:13:55 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > If we do not have any swap or we have run out of swap then anonymous pages
> > can no longer be removed from memory. In that case we simply treat them
> > like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> > that all anonymous pages are marked mlocked when they are allocated.
> 
> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.
> 
> Is it true that PageMlocked() pages are never on the LRU?  If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.
> 
> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
> 

I think mlocked pages are not reclaimable but movable.
So some structure should link them to a list...


-Kame




* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:13 ` Andrew Morton
  2007-02-16  1:24   ` KAMEZAWA Hiroyuki
@ 2007-02-16  1:40   ` Martin Bligh
  2007-02-16  1:49     ` Andrew Morton
                       ` (2 more replies)
  2007-02-16  2:15   ` Christoph Lameter
  2007-02-16  2:55   ` Christoph Lameter
  3 siblings, 3 replies; 44+ messages in thread
From: Martin Bligh @ 2007-02-16  1:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

Andrew Morton wrote:
> On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
>> If we do not have any swap or we have run out of swap then anonymous pages
>> can no longer be removed from memory. In that case we simply treat them
>> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
>> that all anonymous pages are marked mlocked when they are allocated.
> 
> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.
> 
> Is it true that PageMlocked() pages are never on the LRU?  If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.
> 
> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)

Mine just created a locked list. If you stick them there, there's no
need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

Suspect most of the rest of my patch is crap, but that might be useful?

M.


--- linux-2.6.17/include/linux/mm_inline.h      2006-06-17 
18:49:35.000000000 -0
700
+++ linux-2.6.17-mlock_lru/include/linux/mm_inline.h    2006-07-28 
15:53:15.0000
00000 -0700

@@ -28,6 +27,20 @@ del_page_from_inactive_list(struct zone
  }

  static inline void
+add_page_to_mlocked_list(struct zone *zone, struct page *page)
+{
+       list_add(&page->lru, &zone->mlocked_list);
+       zone->nr_mlocked++;
+}
+
+static inline void
+del_page_from_mlocked_list(struct zone *zone, struct page *page)
+{
+       list_del(&page->lru);
+       zone->nr_mlocked--;
+}
+
+static inline void
  del_page_from_lru(struct zone *zone, struct page *page)
  {
         list_del(&page->lru);
diff -aurpN -X /home/mbligh/.diff.exclude 
linux-2.6.17/include/linux/mmzone.h li
nux-2.6.17-mlock_lru/include/linux/mmzone.h
--- linux-2.6.17/include/linux/mmzone.h 2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-mlock_lru/include/linux/mmzone.h       2006-07-28 
15:49:05.0000
00000 -0700
@@ -156,10 +156,12 @@ struct zone {
         spinlock_t              lru_lock;
         struct list_head        active_list;
         struct list_head        inactive_list;
+       struct list_head        mlocked_list;
         unsigned long           nr_scan_active;
         unsigned long           nr_scan_inactive;
         unsigned long           nr_active;
         unsigned long           nr_inactive;
+       unsigned long           nr_mlocked;
         unsigned long           pages_scanned;     /* since last reclaim */
         int                     all_unreclaimable; /* All pages pinned */
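
[Editor's sketch, not part of the patch] The per-zone list idea can be
tried out in user space with a minimal re-implementation of the kernel
list helpers. The sketch_* types are hypothetical stand-ins for
struct zone and struct page; the counter goes up on add and down on
delete.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, like the kernel's list_add(). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

struct sketch_page {
	struct list_head lru;
};

struct sketch_zone {
	struct list_head mlocked_list;
	unsigned long nr_mlocked;
};

static void add_page_to_mlocked_list(struct sketch_zone *zone,
				     struct sketch_page *page)
{
	list_add(&page->lru, &zone->mlocked_list);
	zone->nr_mlocked++;
}

static void del_page_from_mlocked_list(struct sketch_zone *zone,
				       struct sketch_page *page)
{
	list_del(&page->lru);
	zone->nr_mlocked--;
}
```

Because the pages stay reachable through the list, something like a
swapon could walk zone->mlocked_list and feed the pages back to the LRU.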


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:40   ` Martin Bligh
@ 2007-02-16  1:49     ` Andrew Morton
  2007-02-16  2:21       ` Martin Bligh
  2007-02-16  2:34       ` Christoph Lameter
  2007-02-16  2:16     ` Christoph Lameter
  2007-02-16  8:10     ` Peter Zijlstra
  2 siblings, 2 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  1:49 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 17:40:09 -0800
Martin Bligh <mbligh@mbligh.org> wrote:

> Andrew Morton wrote:
> > On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> > Christoph Lameter <clameter@sgi.com> wrote:
> > 
> >> If we do not have any swap or we have run out of swap then anonymous pages
> >> can no longer be removed from memory. In that case we simply treat them
> >> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> >> that all anonymous pages are marked mlocked when they are allocated.
> > 
> > It's nice and simple, but I think I'd prefer to wait for the existing mlock
> > changes to crash a bit less before we do this.
> > 
> > Is it true that PageMlocked() pages are never on the LRU?  If so, perhaps
> > we could overload the lru.next/prev on these pages to flag an mlocked page.
> > 
> > #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
> 
> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

I don't think there's a need for a mlocked list in the mlock patches:
nothing ever needs to walk it.

However this might be a good way of solving the someone-did-a-swapon
problem for this anon patch.

Guys, this page-flag problem is really serious.  -mm adds PG_mlocked and
PG_readahead and the ext4 patches add PG_booked (am currently fighting the
good fight there).  There's ongoing steady growth in these things and soon
we're going to be in a lot of pain.

> Suspect most of the rest of my patch is crap, but that might be useful?

wordwrapped, space-stuffed and tab-replaced.  The trifecta!


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:13 ` Andrew Morton
  2007-02-16  1:24   ` KAMEZAWA Hiroyuki
  2007-02-16  1:40   ` Martin Bligh
@ 2007-02-16  2:15   ` Christoph Lameter
  2007-02-16  2:55   ` Christoph Lameter
  3 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  2:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> Is it true that PageMlocked() pages are never on the LRU?  If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.

Yes. We could even use the lru to build a list of mlocked pages but then
certain optimizations with anonymous pages would no longer work.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:40   ` Martin Bligh
  2007-02-16  1:49     ` Andrew Morton
@ 2007-02-16  2:16     ` Christoph Lameter
  2007-02-16  3:17       ` Martin Bligh
  2007-02-16  8:10     ` Peter Zijlstra
  2 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  2:16 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Martin Bligh wrote:

> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

How would that work without a page flag? Without a flag there is no way
of checking that a page is on a particular list.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:49     ` Andrew Morton
@ 2007-02-16  2:21       ` Martin Bligh
  2007-02-16  2:34       ` Christoph Lameter
  1 sibling, 0 replies; 44+ messages in thread
From: Martin Bligh @ 2007-02-16  2:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

>>> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
>> Mine just created a locked list. If you stick them there, there's no
>> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)
> 
> I don't think there's a need for a mlocked list in the mlock patches:
> nothing ever needs to walk it.
> 
> However this might be a good way of solving the someone-did-a-swapon
> problem for this anon patch.
> 
> Guys, this page-flag problem is really serious.  -mm adds PG_mlocked and
> PG_readahead and the ext4 patches add PG_booked (am currently fighting the
> good fight there).  There's ongoing steady growth in these things and soon
> we're going to be in a lot of pain.

Well, if the list is sufficient to fix that, I don't see why we'd
care about the overhead of list manipulation vs a flag, it's not
a fast path.

>> Suspect most of the rest of my patch is crap, but that might be useful?
> 
> wordwrapped, space-stuffed and tab-replaced.  The trifecta!

That's cause it was fairly obviously useless as-was so I just cut
and pasted it. But nonetheless, I appreciate your adulation ;-)

I'll try to add CamelCaps, bracing fuckups, and lots of #ifdefs
for the next round.

M.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:49     ` Andrew Morton
  2007-02-16  2:21       ` Martin Bligh
@ 2007-02-16  2:34       ` Christoph Lameter
  2007-02-16  2:48         ` Andrew Morton
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  2:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> Guys, this page-flag problem is really serious.  -mm adds PG_mlocked and
> PG_readahead and the ext4 patches add PG_booked (am currently fighting the
> good fight there).  There's ongoing steady growth in these things and soon
> we're going to be in a lot of pain.

Well is it possible to restrict some of the features to 64 bit only? There 
we have lots of page flags.

One additional measure that may be possible is to have a page type field
(maybe 3 bits long) that would consolidate a series of page flags that 
cannot occur together. But then we have issues with the atomicity of 
updates to that field.

F.e.

page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> }
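
A minimal user-space sketch of such a type field, for illustration only. It
assumes the three bits sit at the bottom of the flags word; the names PT_*,
get_page_type() and set_page_type() are hypothetical, and PT_NONE is one of
the "3 more types" added here so that a zeroed word has a defined type. The
update is a plain read-modify-write, so it shows the encoding only and does
nothing about the atomicity issue mentioned above.

```c
#include <assert.h>

/* Mutually exclusive page states, packed into one 3-bit field. */
enum page_type { PT_NONE, PT_SLAB, PT_LRU, PT_MLOCK, PT_RESERVED, PT_BUDDY };

#define PT_SHIFT 0UL                 /* assumed position of the field */
#define PT_MASK  (7UL << PT_SHIFT)   /* three bits: up to eight types */

/* Extract the type stored in a flags word. */
static inline enum page_type get_page_type(unsigned long flags)
{
	return (enum page_type)((flags & PT_MASK) >> PT_SHIFT);
}

/* Return a copy of flags with the type replaced (non-atomic). */
static inline unsigned long set_page_type(unsigned long flags, enum page_type t)
{
	assert((unsigned long)t <= 7UL);
	return (flags & ~PT_MASK) | ((unsigned long)t << PT_SHIFT);
}
```

Five single-bit flags collapse into three bits this way, at the price of
needing a multi-bit update whenever a page changes type.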


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:34       ` Christoph Lameter
@ 2007-02-16  2:48         ` Andrew Morton
  2007-02-16  2:50           ` Christoph Lameter
  2007-02-16  8:15           ` Peter Zijlstra
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  2:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 18:34:12 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > Guys, this page-flag problem is really serious.  -mm adds PG_mlocked and
> > PG_readahead and the ext4 patches add PG_booked (am currently fighting the
> > good fight there).  There's ongoing steady growth in these things and soon
> > we're going to be in a lot of pain.
> 
> Well is it possible to restrict some of the features to 64 bit only? There 
> we have lots of page flags.

We discussed that a while back and iirc ia64 has gone and gobbled most of
the upper 32bits.  Someone went and added some ascii art around the
PG_uncached definition but it is incomprehensible.  It seems to claim that
ia64 has gone and used all 32 bits, dammit.  If so, some adjustments to
ia64 might be called for.

> One additional measure that may be possible is to have a page type field
> (maybe 3 bits long) that would consolidate a series of page flags that 
> cannot occur together. But then we have issues with the atomicity of 
> updates to that field.
> 
> F.e.
> 
> page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> }

Yeah, maybe.  There doesn't seem to be a lot of room for that though - a
lot of those flags are quite independent and can occur simultaneously.

Maybe PageSwapCache can be worked out by other means.

The two swsusp bits can be removed: they're only needed at suspend/resume
time and can be replaced by an external data structure.

I still reckon there must be a way to avoid PG_buddy but Martin put up
stiff-and-squealy resistance when I resisted the addition of that.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:48         ` Andrew Morton
@ 2007-02-16  2:50           ` Christoph Lameter
  2007-02-16  3:18             ` Andrew Morton
  2007-02-16  8:15           ` Peter Zijlstra
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  2:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> We discussed that a while back and iirc ia64 has gone and gobbled most of
> the upper 32bits.  Someone went and added some ascii art around the
> PG_uncached definition but it is incomprehensible.  It seems to claim that
> ia64 has gone and used all 32 bits, dammit.  If so, some adjustments to
> ia64 might be called for.

Yes ia64 has used the upper 32 bits. However, the lower 32 bits are fully
usable. So we have 32-20 = 12 bits to play with on 64 bit.
> 
> > page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> }
> 
> Yeah, maybe.  There doesn't seem to be a lot of room for that though - a
> lot of those flags are quite independent and can occur simultaneously.

None of the above can occur simultaneously.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:13 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2007-02-16  2:15   ` Christoph Lameter
@ 2007-02-16  2:55   ` Christoph Lameter
  2007-02-16  5:02     ` Christoph Lameter
  3 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  2:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.

Sigh. My optimizations must have done me in. Drop the last two patches and 
it will be fine. I am not sure what is going on there but things work 
right without the optimizations.

avoid-putting-new-mlocked-anonymous-pages-on-lru.patch
opportunistically-move-mlocked-pages-off-the-lru.patch


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:16     ` Christoph Lameter
@ 2007-02-16  3:17       ` Martin Bligh
  2007-02-16  3:29         ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Martin Bligh @ 2007-02-16  3:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

Christoph Lameter wrote:
> On Thu, 15 Feb 2007, Martin Bligh wrote:
> 
>> Mine just created a locked list. If you stick them there, there's no
>> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)
> 
> How would that work without a page flag? Without a flag there is no way
> of checking that a page is on a particular list.

Depends what contexts you need to access it from. If you know the state
before and after, list_del and list_add work.
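
As a rough user-space sketch of that idea: if the caller already knows the
page sits on some list, moving it needs no flag at all. The list helpers
below are local reimplementations that only borrow the kernel's names, and
struct page, move_to_locked_list() and the NULL-after-delete convention are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n from whatever list it is on. */
static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

struct page { struct list_head lru; };

/*
 * The caller guarantees the page is currently linked on a list (the LRU,
 * say). Nothing in the page records which list that is.
 */
static void move_to_locked_list(struct page *page, struct list_head *locked)
{
	assert(page->lru.next != NULL);
	list_del(&page->lru);
	list_add(&page->lru, locked);
}
```

What this cannot do is answer "which list is this page on?" from an
arbitrary context, since no state in the page encodes that.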


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:50           ` Christoph Lameter
@ 2007-02-16  3:18             ` Andrew Morton
  2007-02-16  3:36               ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  3:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 18:50:39 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > We discussed that a while back and iirc ia64 has gone and gobbled most of
> > the upper 32bits.  Someone went and added some ascii art around the
> > PG_uncached definition but it is incomprehensible.  It seems to claim that
> > ia64 has gone and used all 32 bits, dammit.  If so, some adjustments to
> > ia64 might be called for.
> 
> Yes ia64 has used the upper 32 bits. However, the lower 32 bits are fully
> usable. So we have 32-20 = 12 bits to play with on 64 bit.

OK.  But not many things are 64-bit-only?

> > 
> > > page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> }
> > 
> > Yeah, maybe.  There doesn't seem to be a lot of room for that though - a
> > lot of those flags are quite independent and can occur simultaneously.
> 
> None of the above can occur simultaneously.

<actually pays attention>

OK.

The actual implementation details might get messy though.  We can do a
non-atomic rmw of the three bits but that could corrupt a concurrent
modification of a different flag.  Or we could do a succession of three
set_bit/clear_bit operations, but that exposes intermediate invalid states.

It can be done I guess, but it'd be fiddly.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:17       ` Martin Bligh
@ 2007-02-16  3:29         ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  3:29 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Martin Bligh wrote:

> > How would that work without a page flag? Without a flag there is no way of
> > checking that a page is on a particular list.
> 
> Depends what contexts you need to access it from. If you know the state
> before and after, list_del and list_add work.

What state before and after do you know?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:18             ` Andrew Morton
@ 2007-02-16  3:36               ` Christoph Lameter
  2007-02-16  3:42                 ` Andrew Morton
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  3:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> > Yes ia64 has used the upper 32 bits. However, the lower 32 bits are fully
> > usable. So we have 32-20 = 12 bits to play with on 64 bit.
> 
> OK.  But not many things are 64-bit-only?

We could restrict some newer features to 64 bits? (ducks and runs ...)

> > None of the above can occur simultaneously.
> The actual implementation details might get messy though.  We can do a
> non-atomic rmw of the three bits but that could corrupt a concurrent
> modification of a different flag.  Or we could do a succession of three
> set_bit/clear_bit operations, but that exposes intermediate invalid states.
> 
> It can be done I guess, but it'd be fiddly.

Right.

Maybe we could somehow split up page->flags into 4 separate bytes?
Updating one byte would not endanger the other bytes in the other 
sets?
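
To make the question concrete, here is a user-space sketch using C11
atomics. It assumes a 32-bit flags word replaced by four independently
addressable bytes; struct split_flags, store_set() and load_set() are
hypothetical names. C11 treats each byte as its own memory location, so a
store to one set does not rewrite its neighbours. Whether every
architecture the kernel supports can do that efficiently is a separate
matter that this sketch does not settle.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_FLAG_SETS 4

/* Hypothetical replacement for a single 32-bit flags word. */
struct split_flags {
	_Atomic unsigned char set[NR_FLAG_SETS];
};

static void split_flags_init(struct split_flags *f)
{
	for (int i = 0; i < NR_FLAG_SETS; i++)
		atomic_init(&f->set[i], 0);
}

/* Overwrite one byte-sized set; the other three are neither read nor written. */
static void store_set(struct split_flags *f, unsigned int idx, unsigned char val)
{
	assert(idx < NR_FLAG_SETS);
	atomic_store(&f->set[idx], val);
}

static unsigned char load_set(struct split_flags *f, unsigned int idx)
{
	assert(idx < NR_FLAG_SETS);
	return atomic_load(&f->set[idx]);
}
```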




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:36               ` Christoph Lameter
@ 2007-02-16  3:42                 ` Andrew Morton
  2007-02-16  3:50                   ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  3:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 19:36:01 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > > Yes ia64 has used the upper 32 bits. However, the lower 32 bits are fully
> > > usable. So we have 32-20 = 12 bits to play with on 64 bit.
> > 
> > OK.  But not many things are 64-bit-only?
> 
> We could restrict some newer features to 64 bits? (ducks and runs ...)

Well.  We haven't come across many such things, and doing this would muck
up the VM and would reduce testing coverage.  But yeah, it's always an
option if these things crop up.

> > > None of the above can occur simultaneously.
> > The actual implementation details might get messy though.  We can do a
> > non-atomic rmw of the three bits but that could corrupt a concurrent
> > modification of a different flag.  Or we could do a succession of three
> > set_bit/clear_bit operations, but that exposes intermediate invalid states.
> > 
> > It can be done I guess, but it'd be fiddly.
> 
> Right.
> 
> Maybe we could somehow split up page->flags into 4 separate bytes?
> Updating one byte would not endanger the other bytes in the other 
> sets?

yipes.  I'm not sure that'd work?

compare-and-swap-in-a-loop could be used, I guess.  With the obvious problem..
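
For reference, a compare-and-swap loop over a multi-bit field might look
like the following user-space sketch. It uses C11 <stdatomic.h> rather than
the kernel's cmpxchg(), and TYPE_MASK and set_type_cmpxchg() are names
invented for the example, with the field assumed to occupy the low three
bits.

```c
#include <assert.h>
#include <stdatomic.h>

#define TYPE_MASK 7UL	/* assumed: low three bits hold the page type */

/*
 * Replace the type field in one atomic step. Concurrent changes to other
 * bits make the exchange fail and the loop retry with the fresh value,
 * so no other flag is lost and no intermediate type is ever visible.
 */
static void set_type_cmpxchg(_Atomic unsigned long *flags, unsigned long type)
{
	unsigned long old = atomic_load(flags);
	unsigned long new;

	assert(type <= TYPE_MASK);
	do {
		new = (old & ~TYPE_MASK) | type;
		/* On failure, old is reloaded with the current contents. */
	} while (!atomic_compare_exchange_weak(flags, &old, new));
}
```

The sketch avoids both hazards of the plain read-modify-write and of three
separate bit operations, but it says nothing about what the retry loop
costs on any given architecture.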

I do think that those two swsusp flags are low-hanging-fruit.  It'd be
trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a
vague feeling that there were subtle issues with that.  Still, Something
Needs To Be Done.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:42                 ` Andrew Morton
@ 2007-02-16  3:50                   ` Christoph Lameter
  2007-02-16  4:02                     ` Andrew Morton
                                       ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  3:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> > Maybe we could somehow split up page->flags into 4 separate bytes?
> > Updating one byte would not endanger the other bytes in the other 
> > sets?
> 
> yipes.  I'm not sure that'd work?

Are all arches able to do atomic ops on bytes?
 
> compare-and-swap-in-a-loop could be used, I guess.  With the obvious problem..

Yucks. There seems to be no easy solution.
 
> I do think that those two swsusp flags are low-hanging-fruit.  It'd be
> trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a
> vague feeling that there were subtle issues with that.  Still, Something
> Needs To Be Done.

I tinkered with some similar radical ideas lately. Maybe a bit vector
could be used instead? For 1G of memory we would need 

2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.

Seems to be reasonable?
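
The arithmetic, plus the lookup such a bit vector would need, as a small
user-space sketch. bitmap_bytes(), pfn_set() and pfn_test() are
hypothetical helpers, and the bitmap is assumed to be indexed directly by
page frame number with no holes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Bytes of bitmap needed for one bit per page of mem_bytes of memory. */
static size_t bitmap_bytes(unsigned long long mem_bytes, unsigned int page_shift)
{
	unsigned long long pages;

	assert(page_shift < 64);
	pages = mem_bytes >> page_shift;
	return (size_t)((pages + CHAR_BIT - 1) / CHAR_BIT);
}

/* Set the bit belonging to page frame number pfn. */
static void pfn_set(unsigned long *map, unsigned long pfn)
{
	map[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return 1 if the bit for pfn is set, 0 otherwise. */
static int pfn_test(const unsigned long *map, unsigned long pfn)
{
	return (int)((map[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}
```

With a page shift of 12, bitmap_bytes(1ULL << 30, 12) gives 32768, matching
the 32k above; with a page shift of 16 the same gigabyte needs 2048 bytes.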




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:50                   ` Christoph Lameter
@ 2007-02-16  4:02                     ` Andrew Morton
  2007-02-16  4:07                       ` Christoph Lameter
  2007-02-16  4:03                     ` Andrew Morton
  2007-02-16  4:14                     ` Rik van Riel
  2 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  4:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 19:50:45 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > > Maybe we could somehow split up page->flags into 4 separate bytes?
> > > Updating one byte would not endanger the other bytes in the other 
> > > sets?
> > 
> > yipes.  I'm not sure that'd work?
> 
> Are all arches able to do atomic ops on bytes?

I think they are, but you only wanted three bits.  I don't think we'll be
able to convert eight bits into a 256-value scalar efficiently.

> > compare-and-swap-in-a-loop could be used, I guess.  With the obvious problem..
> 
> Yucks. There seems to be no easy solution.
>  
> > I do think that those two swsusp flags are low-hanging-fruit.  It'd be
> > trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a
> > vague feeling that there were subtle issues with that.  Still, Something
> > Needs To Be Done.
> 
> I tinkered with some similar radical ideas lately. Maybe a bit vector
> could be used instead? For 1G of memory we would need 
> 
> 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> 
> Seems to be reasonable?
> 

32k per bit per gig, yes.  Better for large PAGE_SIZE.  More cachemisses.

But will it come unstuck for machines which have a super-sparse pfn space?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:50                   ` Christoph Lameter
  2007-02-16  4:02                     ` Andrew Morton
@ 2007-02-16  4:03                     ` Andrew Morton
  2007-02-16  4:14                     ` Rik van Riel
  2 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  4:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 19:50:45 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:

> I tinkered with some similar radical ideas lately. Maybe a bit vector
> could be used instead? For 1G of memory we would need 
> 
> 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> 
> Seems to be reasonable?


Dave Hansen did have a patchset which did something along these lines, btw.  iirc
it used a tree and/or a hash of some form.  Much terror ensued.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:02                     ` Andrew Morton
@ 2007-02-16  4:07                       ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  4:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> > could be used instead? For 1G of memory we would need 
> > 
> > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> > 
> > Seems to be reasonable?
> > 
> 
> 32k per bit per gig, yes.  Better for large PAGE_SIZE.  More cachemisses.
> 
> But will it come unstuck for machines which have a super-sparse pfn space?

IA64 is such a beast. I think IA64 would work fine if we had bitmap 
vectors per zone. However, powerpc may even have super-sparse zones. We
may have to ask them first.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  3:50                   ` Christoph Lameter
  2007-02-16  4:02                     ` Andrew Morton
  2007-02-16  4:03                     ` Andrew Morton
@ 2007-02-16  4:14                     ` Rik van Riel
  2007-02-16  4:15                       ` Christoph Lameter
  2007-02-16  4:24                       ` Andrew Morton
  2 siblings, 2 replies; 44+ messages in thread
From: Rik van Riel @ 2007-02-16  4:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	Peter Zijlstra, KAMEZAWA Hiroyuki

Christoph Lameter wrote:

> I tinkered with some similar radical ideas lately. Maybe a bit vector
> could be used instead? For 1G of memory we would need 
> 
> 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> 
> Seems to be reasonable?

At that point, wouldn't it be easier to simply increase
the size of struct page?  I don't think they're power of
two sized anyway, at least on 64 bit architectures.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:14                     ` Rik van Riel
@ 2007-02-16  4:15                       ` Christoph Lameter
  2007-02-16  4:57                         ` KAMEZAWA Hiroyuki
  2007-02-16  4:24                       ` Andrew Morton
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  4:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	Peter Zijlstra, KAMEZAWA Hiroyuki

On Thu, 15 Feb 2007, Rik van Riel wrote:

> Christoph Lameter wrote:
> 
> > I tinkered with some similar radical ideas lately. Maybe a bit vector
> > could be used instead? For 1G of memory we would need 
> > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> > 
> > Seems to be reasonable?
> 
> At that point, wouldn't it be easier to simply increase
> the size of struct page?  I don't think they're power of
> two sized anyway, at least on 64 bit architectures.

On 64 bit platforms we can add one unsigned long to get from 56 to 64 
bytes.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:14                     ` Rik van Riel
  2007-02-16  4:15                       ` Christoph Lameter
@ 2007-02-16  4:24                       ` Andrew Morton
  1 sibling, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  4:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin,
	Peter Zijlstra, KAMEZAWA Hiroyuki

On Thu, 15 Feb 2007 23:14:01 -0500 Rik van Riel <riel@redhat.com> wrote:

> Christoph Lameter wrote:
> 
> > I tinkered with some similar radical ideas lately. Maybe a bit vector
> > could be used instead? For 1G of memory we would need 
> > 
> > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> > 
> > Seems to be reasonable?
> 
> At that point, wouldn't it be easier to simply increase
> the size of struct page?  I don't think they're power of
> two sized anyway, at least on 64 bit architectures.

That gives us an additional 32 bits in one hit whereas the external bitmap
allows us to fine-tune it.

Doing neither is of course best..


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:15                       ` Christoph Lameter
@ 2007-02-16  4:57                         ` KAMEZAWA Hiroyuki
  2007-02-16  5:16                           ` Andrew Morton
  2007-02-16  5:19                           ` Christoph Lameter
  0 siblings, 2 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-16  4:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: riel, akpm, mbligh, linux-mm, nickpiggin, a.p.zijlstra

On Thu, 15 Feb 2007 20:15:46 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Rik van Riel wrote:
> 
> > Christoph Lameter wrote:
> > 
> > > I tinkered with some similar radical ideas lately. Maybe a bit vector
> > > could be used instead? For 1G of memory we would need 
> > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> > > 
> > > Seems to be reasonable?
> > 
> > At that point, wouldn't it be easier to simply increase
> > the size of struct page?  I don't think they're power of
> > two sized anyway, at least on 64 bit architectures.
> 
> On 64 bit platforms we can add one unsigned long to get from 56 to 64 
> bytes.
> 

I sometimes dream of
==
struct page {
	...
	struct zone	*zone;
	...
};
#define page_zone(page)		(page)->zone
==
but never tried ;)

-Kame







^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:55   ` Christoph Lameter
@ 2007-02-16  5:02     ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  5:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Christoph Lameter wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > It's nice and simple, but I think I'd prefer to wait for the existing mlock
> > changes to crash a bit less before we do this.
> 
> Sigh. My optimizations must have done me in. Drop the last two patches and 
> it will be fine. I am not sure what is going on there but things work 
> right without the optimizations.
> 
> avoid-putting-new-mlocked-anonymous-pages-on-lru.patch
> opportunistically-move-mlocked-pages-off-the-lru.patch
> 

Would you put those two patches back?


The problem is that in some circumstances a page that is still marked
mlocked may be freed (if one is marking a page as mlocked early). The page
allocator does not touch the PG_mlocked bit, so a newly allocated page may
have PG_mlocked set. If we then try to put it on the LRU, the VM_BUG_ONs
are triggered.

The following patch detects these conditions in the page allocator and 
does the proper checks and cleanup.
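
In outline, and only as a user-space model of the idea rather than the
kernel code itself: clear a stale bit when the page is freed, fix up the
counter, and treat the bit as an error if it ever shows up on a freshly
allocated page. The bit number, the global nr_mlock counter and both
function names are assumptions for this sketch.

```c
#include <assert.h>

#define PG_mlocked 3	/* hypothetical bit number, for illustration only */

struct page { unsigned long flags; };

static long nr_mlock;	/* stands in for the per-zone NR_MLOCK counter */

/* Free path: a page that is still marked mlocked is cleaned and uncounted. */
static void free_page_check_mlocked(struct page *page)
{
	if (page->flags & (1UL << PG_mlocked)) {
		assert(nr_mlock > 0);
		/* Page is unused, so a non-atomic clear is enough here. */
		page->flags &= ~(1UL << PG_mlocked);
		nr_mlock--;
	}
}

/* Allocation path: returns 1 if the mlocked bit did not leak through. */
static int new_page_is_clean(const struct page *page)
{
	return !(page->flags & (1UL << PG_mlocked));
}
```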

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20/include/linux/page-flags.h
===================================================================
--- linux-2.6.20.orig/include/linux/page-flags.h	2007-02-15 20:42:42.000000000 -0800
+++ linux-2.6.20/include/linux/page-flags.h	2007-02-15 20:43:33.000000000 -0800
@@ -261,6 +261,7 @@ static inline void SetPageUptodate(struc
 #define PageMlocked(page)	test_bit(PG_mlocked, &(page)->flags)
 #define SetPageMlocked(page)	set_bit(PG_mlocked, &(page)->flags)
 #define ClearPageMlocked(page)	clear_bit(PG_mlocked, &(page)->flags)
+#define __ClearPageMlocked(page) __clear_bit(PG_mlocked, &(page)->flags)
 
 struct page;	/* forward declaration */
 
Index: linux-2.6.20/mm/page_alloc.c
===================================================================
--- linux-2.6.20.orig/mm/page_alloc.c	2007-02-15 20:42:42.000000000 -0800
+++ linux-2.6.20/mm/page_alloc.c	2007-02-15 20:55:23.000000000 -0800
@@ -203,6 +203,7 @@ static void bad_page(struct page *page)
 			1 << PG_slab    |
 			1 << PG_swapcache |
 			1 << PG_writeback |
+			1 << PG_mlocked |
 			1 << PG_buddy );
 	set_page_count(page, 0);
 	reset_page_mapcount(page);
@@ -442,6 +443,11 @@ static inline int free_pages_check(struc
 		bad_page(page);
 	if (PageDirty(page))
 		__ClearPageDirty(page);
+	if (PageMlocked(page)) {
+		/* Page is unused so no need to take the lru lock */
+		__ClearPageMlocked(page);
+		dec_zone_page_state(page, NR_MLOCK);
+	}
 	/*
 	 * For now, we report if PG_reserved was found set, but do not
 	 * clear it, and do not free the page.  But we shall soon need
@@ -588,6 +594,7 @@ static int prep_new_page(struct page *pa
 			1 << PG_swapcache |
 			1 << PG_writeback |
 			1 << PG_reserved |
+			1 << PG_mlocked |
 			1 << PG_buddy ))))
 		bad_page(page);
 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:57                         ` KAMEZAWA Hiroyuki
@ 2007-02-16  5:16                           ` Andrew Morton
  2007-02-16  5:25                             ` Christoph Lameter
  2007-02-16  5:19                           ` Christoph Lameter
  1 sibling, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  5:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Lameter, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra

On Fri, 16 Feb 2007 13:57:14 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 15 Feb 2007 20:15:46 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> > 
> > > Christoph Lameter wrote:
> > > 
> > > > I tinkered with some similar radical ideas lately. Maybe a bit vector
> > > > could be used instead? For 1G of memory we would need 
> > > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap.
> > > > 
> > > > Seems to be reasonable?
> > > 
> > > At that point, wouldn't it be easier to simply increase
> > > the size of struct page?  I don't think they're power of
> > > two sized anyway, at least on 64 bit architectures.
> > 
> > On 64 bit platforms we can add one unsigned long to get from 56 to 64 
> > bytes.
> > 
> 
> I sometimes dream of
> ==
> struct page {
> 	...
> 	struct zone	*zone;
> 	...
> };
> #define page_zone(page)		(page)->zone
> ==
> but never tried ;)

hm.  We can calculate page_zone(page) from the pfn.  And I suspect we can
do that locklessly too.  I bet a nice tight implementation of that would be
efficient enough and it'll reclaim heaps of flags.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  4:57                         ` KAMEZAWA Hiroyuki
  2007-02-16  5:16                           ` Andrew Morton
@ 2007-02-16  5:19                           ` Christoph Lameter
  1 sibling, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  5:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: riel, akpm, mbligh, linux-mm, nickpiggin, a.p.zijlstra

On Fri, 16 Feb 2007, KAMEZAWA Hiroyuki wrote:

> > On 64 bit platforms we can add one unsigned long to get from 56 to 64 
> > bytes.
> > 
> 
> I sometimes dream of
> ==
> struct page {
> 	...
> 	struct zone	*zone;
> 	...
> };
> #define page_zone(page)		(page)->zone
> ==
> but never tried ;)

Hmmm..... Currently we have

static inline struct zone *page_zone(struct page *page)
{
        return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
}

page_to_nid is extracting a piece of the page flags. Then we need to do a 
lookup and find the zonenum (another extract from page flags).
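
As a rough illustration of those two extractions, the following user-space
sketch packs a node id and a zone number into the top bits of a 64-bit flags
word and decodes them again. The layout is an assumption chosen for the
example (a 10-bit node field starting at bit 54, a 2-bit zone field directly
below it), and every toy_* name is hypothetical rather than a kernel
identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63..54] node id, [53..52] zone number, rest other flags. */
#define TOY_NODES_WIDTH	10
#define TOY_ZONES_WIDTH	2
#define TOY_NODES_SHIFT	(64 - TOY_NODES_WIDTH)			/* 54 */
#define TOY_ZONES_SHIFT	(TOY_NODES_SHIFT - TOY_ZONES_WIDTH)	/* 52 */
#define TOY_MAX_NODES	(1 << TOY_NODES_WIDTH)
#define TOY_MAX_ZONES	(1 << TOY_ZONES_WIDTH)

struct toy_zone { int unused; };
struct toy_node { struct toy_zone node_zones[TOY_MAX_ZONES]; };
struct toy_page { uint64_t flags; };

/* Stand-in for the NODE_DATA() lookup: one pointer per node id. */
static struct toy_node *toy_node_data[TOY_MAX_NODES];

/* Build a flags word from its three parts (helper for the example only). */
static inline uint64_t toy_encode(unsigned nid, unsigned zonenum,
				  uint64_t low_flags)
{
	assert(nid < TOY_MAX_NODES);
	assert(zonenum < TOY_MAX_ZONES);
	assert(low_flags < ((uint64_t)1 << TOY_ZONES_SHIFT));
	return ((uint64_t)nid << TOY_NODES_SHIFT) |
	       ((uint64_t)zonenum << TOY_ZONES_SHIFT) | low_flags;
}

/* First extract: a single shift yields the node id. */
static inline unsigned toy_page_to_nid(const struct toy_page *page)
{
	return (unsigned)(page->flags >> TOY_NODES_SHIFT);
}

/* Second extract: shift and mask yield the zone number. */
static inline unsigned toy_page_zonenum(const struct toy_page *page)
{
	return (unsigned)(page->flags >> TOY_ZONES_SHIFT) & (TOY_MAX_ZONES - 1);
}

/* One table load plus an array index, mirroring the shape of page_zone(). */
static inline struct toy_zone *toy_page_zone(const struct toy_page *page)
{
	struct toy_node *node = toy_node_data[toy_page_to_nid(page)];

	assert(node != NULL);
	return &node->node_zones[toy_page_zonenum(page)];
}
```

The point of the sketch is only that the lookup is a shift, a table load and
an index computation; the real field widths depend on the configuration.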

This is not expensive. Look at __pagevec_lru_add. This boils down to (r9 
= struct page * ):

0xa000000100117ef0 <__pagevec_lru_add+80>:      [MMI]       ld8 r33=[r9];;
0xa000000100117ef1 <__pagevec_lru_add+81>:                  ld8 r8=[r33]
0xa000000100117ef2 <__pagevec_lru_add+82>:                  nop.i 0x0;;
0xa000000100117f00 <__pagevec_lru_add+96>:      [MII]       nop.m 0x0
0xa000000100117f01 <__pagevec_lru_add+97>:                  shr.u r3=r8,54;;
0xa000000100117f02 <__pagevec_lru_add+98>:                  nop.i 0x0
0xa000000100117f10 <__pagevec_lru_add+112>:     [MMI]       shladd r14=r3,3,r15;;
0xa000000100117f11 <__pagevec_lru_add+113>:                 ld8 r34=[r14]
0xa000000100117f12 <__pagevec_lru_add+114>:                 nop.i 0x0;;
0xa000000100117f20 <__pagevec_lru_add+128>:     [MIB]       nop.m 0x0
0xa000000100117f21 <__pagevec_lru_add+129>:                 cmp.eq p6,p7=r2,r34


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  5:16                           ` Andrew Morton
@ 2007-02-16  5:25                             ` Christoph Lameter
  2007-02-16  5:41                               ` Andrew Morton
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16  5:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra

On Thu, 15 Feb 2007, Andrew Morton wrote:

> hm.  We can calculate page_zone(page) from the pfn.  And I suspect we can
> do that locklessly too.  I bet a nice tight implementation of that would be
> efficient enough and it'll reclaim heaps of flags.

You mean encode the node and the zone_id in the pfn? Ummm... That would 
get us into lots of trouble with pfn_to_page and friends.

The sparsemem section field could be available. A virtual 
memmap based implementation would not need the section number and would 
get rid of the sparsemem table lookups. Problem is that we cannot do it on 
32 bit platforms because of the lack of virtual memory.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  5:25                             ` Christoph Lameter
@ 2007-02-16  5:41                               ` Andrew Morton
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2007-02-16  5:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KAMEZAWA Hiroyuki, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra

On Thu, 15 Feb 2007 21:25:53 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
> 
> > hm.  We can calculate page_zone(page) from the pfn.  And I suspect we can
> > do that locklessly too.  I bet a nice tight implementation of that would be
> > efficient enough and it'll reclaim heaps of flags.
> 
> You mean encode the node and the zone_id in the pfn?

Maybe.  Or just leave the pfns as they are and implement a decent lookup
algorithm.

For a pc it'd be something like

	for (i = ZONE_DMA; i <= ZONE_HIGHMEM; i++) {
		if (pfn >= first_pfn(i) && pfn <= last_pfn(i))
			success();
	}

if you get my drift.

I dunno how complex that would get in the worst cases.
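
A self-contained version of that loop might look like the sketch below. It
is only an illustration under assumptions: the zone boundaries are an
i386-like layout with 4 KiB pages (DMA below 16 MiB, NORMAL below 896 MiB,
HIGHMEM up to 4 GiB), and toy_zone_range, toy_zones and toy_pfn_to_zone are
hypothetical names, not existing kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

struct toy_zone_range {
	const char *name;
	unsigned long first_pfn;
	unsigned long last_pfn;		/* inclusive */
};

/* Assumed single-node layout; real boundaries are set up at boot. */
static const struct toy_zone_range toy_zones[] = {
	{ "DMA",     0x00000, 0x00fff },	/* 0 .. 16 MiB */
	{ "NORMAL",  0x01000, 0x37fff },	/* 16 MiB .. 896 MiB */
	{ "HIGHMEM", 0x38000, 0xfffff },	/* 896 MiB .. 4 GiB */
};

/* Linear search over the ranges; returns NULL if no zone covers the pfn. */
static const struct toy_zone_range *toy_pfn_to_zone(unsigned long pfn)
{
	size_t n = sizeof(toy_zones) / sizeof(toy_zones[0]);

	for (size_t i = 0; i < n; i++) {
		const struct toy_zone_range *z = &toy_zones[i];

		assert(z->first_pfn <= z->last_pfn);
		if (pfn >= z->first_pfn && pfn <= z->last_pfn)
			return z;
	}
	return NULL;
}
```

The table is read-only after setup, so lookups in this sketch need no lock.
It does not attempt to handle nodes whose pfn ranges interleave, which would
need more than one range per zone.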

> Ummm... That would 
> get us into lots of trouble with pfn_to_page and friends.
> 
> The sparsemem section field could be available. A virtual 
> memmap based implementation would not need the section number and would 
> get rid of the sparsemem table lookups. Problem is that we cannot do it on 
> 32 bit platforms because of the lack of virtual memory.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  1:40   ` Martin Bligh
  2007-02-16  1:49     ` Andrew Morton
  2007-02-16  2:16     ` Christoph Lameter
@ 2007-02-16  8:10     ` Peter Zijlstra
  2 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16  8:10 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, Christoph Lameter, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 2007-02-15 at 17:40 -0800, Martin Bligh wrote:

> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

> --- linux-2.6.17/include/linux/mm_inline.h      2006-06-17 18:49:35.000000000 -0700
> +++ linux-2.6.17-mlock_lru/include/linux/mm_inline.h    2006-07-28 15:53:15.000000000 -0700
> 
> @@ -28,6 +27,20 @@ del_page_from_inactive_list(struct zone
>   }
> 
>   static inline void
> +add_page_to_mlocked_list(struct zone *zone, struct page *page)
> +{
> +       list_add(&page->lru, &zone->mlocked_list);
> +       zone->nr_mlocked++;
> +}
> +
> +static inline void
> +del_page_from_mlocked_list(struct zone *zone, struct page *page)
> +{
> +       list_del(&page->lru);
> +       zone->nr_mlocked--;
> +}
> +
> +static inline void
>   del_page_from_lru(struct zone *zone, struct page *page)
>   {
>          list_del(&page->lru);
> diff -aurpN -X /home/mbligh/.diff.exclude linux-2.6.17/include/linux/mmzone.h linux-2.6.17-mlock_lru/include/linux/mmzone.h
> --- linux-2.6.17/include/linux/mmzone.h 2006-06-17 18:49:35.000000000 -0700
> +++ linux-2.6.17-mlock_lru/include/linux/mmzone.h       2006-07-28 15:49:05.000000000 -0700
> @@ -156,10 +156,12 @@ struct zone {
>          spinlock_t              lru_lock;
>          struct list_head        active_list;
>          struct list_head        inactive_list;
> +       struct list_head        mlocked_list;
>          unsigned long           nr_scan_active;
>          unsigned long           nr_scan_inactive;
>          unsigned long           nr_active;
>          unsigned long           nr_inactive;
> +       unsigned long           nr_mlocked;
>          unsigned long           pages_scanned;     /* since last reclaim */
>          int                     all_unreclaimable; /* All pages pinned */
> 

The problem with such an approach would be that it takes O(n) time to
find that a given page is part of the mlocked_list; so you'd still need
some marker to optimise that.
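
To spell out the cost difference, here is a small user-space sketch. It
assumes a singly linked list and a single flag bit; toy_page, toy_zone and
TOY_PG_MLOCKED are hypothetical names for illustration and do not correspond
to the structures in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PG_MLOCKED	0	/* assumed bit number for the marker */

struct toy_page {
	unsigned long flags;
	struct toy_page *next;
};

struct toy_zone {
	struct toy_page *mlocked_list;
	unsigned long nr_mlocked;
};

/* Put a page on the list and set the marker bit at the same time. */
static void toy_add_mlocked(struct toy_zone *zone, struct toy_page *page)
{
	assert(!(page->flags & (1UL << TOY_PG_MLOCKED)));
	page->next = zone->mlocked_list;
	zone->mlocked_list = page;
	zone->nr_mlocked++;
	page->flags |= 1UL << TOY_PG_MLOCKED;
}

/* Without a marker: walk the whole list, O(n) in the number of pages. */
static bool toy_on_mlocked_list_slow(const struct toy_zone *zone,
				     const struct toy_page *page)
{
	for (const struct toy_page *p = zone->mlocked_list; p; p = p->next)
		if (p == page)
			return true;
	return false;
}

/* With a marker: a single bit test, O(1). */
static bool toy_page_mlocked(const struct toy_page *page)
{
	return (page->flags & (1UL << TOY_PG_MLOCKED)) != 0;
}
```

Both predicates give the same answer as long as the bit is kept in sync with
list membership; only the marker version avoids the walk.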


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  2:48         ` Andrew Morton
  2007-02-16  2:50           ` Christoph Lameter
@ 2007-02-16  8:15           ` Peter Zijlstra
  2007-02-16  9:11             ` Rafael J. Wysocki
  2007-02-16 10:10             ` Christoph Lameter
  1 sibling, 2 replies; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16  8:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki

On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:

> The two swsusp bits can be removed: they're only needed at suspend/resume
> time and can be replaced by an external data structure.

I once had a talk with Rafael, and he said it would be possible to rid
us of PG_nosave* with the now not so new bitmap code that is used to
handle swsusp of highmem pages.



* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  8:15           ` Peter Zijlstra
@ 2007-02-16  9:11             ` Rafael J. Wysocki
  2007-02-16  9:19               ` Peter Zijlstra
  2007-02-16 10:10             ` Christoph Lameter
  1 sibling, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2007-02-16  9:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Christoph Lameter, Martin Bligh, linux-mm,
	Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel

On Friday, 16 February 2007 09:15, Peter Zijlstra wrote:
> On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> 
> > The two swsusp bits can be removed: they're only needed at suspend/resume
> > time and can be replaced by an external data structure.
> 
> I once had a talk with Rafael, and he said it would be possible to rid
> us of PG_nosave* with the now not so new bitmap code that is used to
> handle swsusp of highmem pages.

Yes, that is true.

I'm going to do this soon, but first I'd like to help to make the task freezer
suitable for the CPU hotplug.

Greetings,
Rafael


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  9:11             ` Rafael J. Wysocki
@ 2007-02-16  9:19               ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16  9:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Christoph Lameter, Martin Bligh, linux-mm,
	Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel

On Fri, 2007-02-16 at 10:11 +0100, Rafael J. Wysocki wrote:
> On Friday, 16 February 2007 09:15, Peter Zijlstra wrote:
> > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> > 
> > > The two swsusp bits can be removed: they're only needed at suspend/resume
> > > time and can be replaced by an external data structure.
> > 
> > I once had a talk with Rafael, and he said it would be possible to rid
> > us of PG_nosave* with the now not so new bitmap code that is used to
> > handle swsusp of highmem pages.
> 
> Yes, that is true.
> 
> I'm going to do this soon,

Great!

>  but first I'd like to help to make the task freezer
> suitable for the CPU hotplug.

A worthy challenge, have fun :-)


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16  8:15           ` Peter Zijlstra
  2007-02-16  9:11             ` Rafael J. Wysocki
@ 2007-02-16 10:10             ` Christoph Lameter
  2007-02-16 10:17               ` Peter Zijlstra
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 10:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki

On Fri, 16 Feb 2007, Peter Zijlstra wrote:

> On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> 
> > The two swsusp bits can be removed: they're only needed at suspend/resume
> > time and can be replaced by an external data structure.
> 
> I once had a talk with Rafael, and he said it would be possible to rid
> us of PG_nosave* with the now not so new bitmap code that is used to
> handle swsusp of highmem pages.

Well we can just shift the stuff into the power subsystem I think. Like 
this? Compiles but not tested.

Index: linux-2.6.20-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/mmzone.h	2007-02-16 01:11:46.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/mmzone.h	2007-02-16 01:12:23.000000000 -0800
@@ -295,6 +295,7 @@ struct zone {
 	unsigned long		spanned_pages;	/* total size, including holes */
 	unsigned long		present_pages;	/* amount of memory (excluding holes) */
 
+	unsigned long		*suspend_flags;
 	/*
 	 * rarely used fields:
 	 */
Index: linux-2.6.20-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/page-flags.h	2007-02-16 01:05:26.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/page-flags.h	2007-02-16 01:16:45.000000000 -0800
@@ -82,13 +82,11 @@
 #define PG_private		11	/* If pagecache, has fs-private data */
 
 #define PG_writeback		12	/* Page is under writeback */
-#define PG_nosave		13	/* Used for system suspend/resume */
 #define PG_compound		14	/* Part of a compound page */
 #define PG_swapcache		15	/* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk		16	/* Has blocks allocated on-disk */
 #define PG_reclaim		17	/* To be reclaimed asap */
-#define PG_nosave_free		18	/* Used for system suspend/resume */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 
 #define PG_mlocked		20	/* Page is mlocked */
@@ -192,16 +190,6 @@ static inline void SetPageUptodate(struc
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,	\
 							&(page)->flags)
 
-#define PageNosave(page)	test_bit(PG_nosave, &(page)->flags)
-#define SetPageNosave(page)	set_bit(PG_nosave, &(page)->flags)
-#define TestSetPageNosave(page)	test_and_set_bit(PG_nosave, &(page)->flags)
-#define ClearPageNosave(page)		clear_bit(PG_nosave, &(page)->flags)
-#define TestClearPageNosave(page)	test_and_clear_bit(PG_nosave, &(page)->flags)
-
-#define PageNosaveFree(page)	test_bit(PG_nosave_free, &(page)->flags)
-#define SetPageNosaveFree(page)	set_bit(PG_nosave_free, &(page)->flags)
-#define ClearPageNosaveFree(page)		clear_bit(PG_nosave_free, &(page)->flags)
-
 #define PageBuddy(page)		test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)	__set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page)	__clear_bit(PG_buddy, &(page)->flags)
Index: linux-2.6.20-mm1/include/linux/suspend.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/suspend.h	2007-02-16 01:15:30.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/suspend.h	2007-02-16 01:57:51.000000000 -0800
@@ -21,7 +22,6 @@ struct pbe {
 
 /* mm/page_alloc.c */
 extern void drain_local_pages(void);
-extern void mark_free_pages(struct zone *zone);
 
 #ifdef CONFIG_PM
 /* kernel/power/swsusp.c */
@@ -42,6 +42,18 @@ static inline int software_suspend(void)
 }
 #endif /* CONFIG_PM */
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages);
+void mark_free_pages(struct zone *zone);
+#else
+static inline int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages)
+{
+	return 0;
+}
+
+static inline void mark_free_pages(struct zone *zone) {}
+#endif
+
 void save_processor_state(void);
 void restore_processor_state(void);
 struct saved_context;
Index: linux-2.6.20-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.20-mm1.orig/mm/page_alloc.c	2007-02-16 01:22:09.000000000 -0800
+++ linux-2.6.20-mm1/mm/page_alloc.c	2007-02-16 01:40:39.000000000 -0800
@@ -767,40 +767,6 @@ static void __drain_pages(unsigned int c
 }
 
 #ifdef CONFIG_PM
-
-void mark_free_pages(struct zone *zone)
-{
-	unsigned long pfn, max_zone_pfn;
-	unsigned long flags;
-	int order;
-	struct list_head *curr;
-
-	if (!zone->spanned_pages)
-		return;
-
-	spin_lock_irqsave(&zone->lock, flags);
-
-	max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
-		if (pfn_valid(pfn)) {
-			struct page *page = pfn_to_page(pfn);
-
-			if (!PageNosave(page))
-				ClearPageNosaveFree(page);
-		}
-
-	for (order = MAX_ORDER - 1; order >= 0; --order)
-		list_for_each(curr, &zone->free_area[order].free_list) {
-			unsigned long i;
-
-			pfn = page_to_pfn(list_entry(curr, struct page, lru));
-			for (i = 0; i < (1UL << order); i++)
-				SetPageNosaveFree(pfn_to_page(pfn + i));
-		}
-
-	spin_unlock_irqrestore(&zone->lock, flags);
-}
-
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  */
@@ -2354,6 +2320,9 @@ __meminit int init_currently_empty_zone(
 	ret = zone_wait_table_init(zone, size);
 	if (ret)
 		return ret;
+	ret = suspend_flags_init(zone, size);
+	if (ret)
+		return ret;
 	pgdat->nr_zones = zone_idx(zone) + 1;
 
 	zone->zone_start_pfn = zone_start_pfn;
Index: linux-2.6.20-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/power/snapshot.c	2007-02-16 01:46:02.000000000 -0800
+++ linux-2.6.20-mm1/kernel/power/snapshot.c	2007-02-16 01:59:24.000000000 -0800
@@ -34,6 +34,126 @@
 
 #include "power.h"
 
+static inline int PageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline void SetPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	set_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline int TestSetPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_and_set_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline void ClearPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	clear_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline int TestClearPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_and_clear_bit(offset * 2, zone->suspend_flags);
+}
+
+
+static inline int PageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+static inline void SetPageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	set_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+static inline void ClearPageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	clear_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	size_t alloc_size;
+
+	/*
+	 * We need two bits per page in the zone. One for PageNosave and the other
+	 * for PageNosaveFree.
+	 */
+	alloc_size = BITS_TO_LONGS(zone_size_pages * 2) * sizeof(unsigned long);
+	if (system_state == SYSTEM_BOOTING) {
+		zone->suspend_flags = (unsigned long *)
+			alloc_bootmem_node(pgdat, alloc_size);
+	} else
+		zone->suspend_flags = (unsigned long *)vmalloc(alloc_size);
+	if (!zone->suspend_flags)
+		return -ENOMEM;
+
+	bitmap_zero(zone->suspend_flags, 2 * zone_size_pages);
+	return 0;
+}
+
+void mark_free_pages(struct zone *zone)
+{
+	unsigned long pfn, max_zone_pfn;
+	unsigned long flags;
+	int order;
+	struct list_head *curr;
+
+	if (!zone->spanned_pages)
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+		if (pfn_valid(pfn)) {
+			struct page *page = pfn_to_page(pfn);
+
+			if (!PageNosave(page))
+				ClearPageNosaveFree(page);
+		}
+
+	for (order = MAX_ORDER - 1; order >= 0; --order)
+		list_for_each(curr, &zone->free_area[order].free_list) {
+			unsigned long i;
+
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			for (i = 0; i < (1UL << order); i++)
+				SetPageNosaveFree(pfn_to_page(pfn + i));
+		}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 10:10             ` Christoph Lameter
@ 2007-02-16 10:17               ` Peter Zijlstra
  2007-02-16 11:04                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16 10:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki

On Fri, 2007-02-16 at 02:10 -0800, Christoph Lameter wrote:
> On Fri, 16 Feb 2007, Peter Zijlstra wrote:
> 
> > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> > 
> > > The two swsusp bits can be removed: they're only needed at suspend/resume
> > > time and can be replaced by an external data structure.
> > 
> > I once had a talk with Rafael, and he said it would be possible to rid
> > us of PG_nosave* with the now not so new bitmap code that is used to
> > handle swsusp of highmem pages.
> 
> Well we can just shift the stuff into the power subsystem I think. Like 
> this? Compiles but not tested.

That would work, however as Andrew pointed out, this data is only ever
used at suspend/resume time. I think we can postpone allocating this
bitmap until then and free it afterwards.
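
A minimal user-space sketch of that idea follows, assuming two bits per page
as in the patch and plain calloc()/free() in place of the kernel allocators.
The toy_* names are hypothetical, and the sketch only handles the second
(NosaveFree) bit of each pair.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITS_TO_LONGS(n)	(((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

struct toy_zone {
	unsigned long start_pfn;
	unsigned long nr_pages;
	unsigned long *suspend_flags;	/* NULL outside suspend/resume */
};

/* Bitmap size in bytes: two bits per page, rounded up to whole longs. */
static size_t toy_suspend_flags_bytes(unsigned long nr_pages)
{
	return TOY_BITS_TO_LONGS(2 * nr_pages) * sizeof(unsigned long);
}

/* Called when suspend starts; returns 0 on success, -1 on failure. */
static int toy_suspend_flags_alloc(struct toy_zone *zone)
{
	assert(zone->suspend_flags == NULL);
	zone->suspend_flags = calloc(TOY_BITS_TO_LONGS(2 * zone->nr_pages),
				     sizeof(unsigned long));
	return zone->suspend_flags ? 0 : -1;
}

/* Called after resume; the memory is not held during normal operation. */
static void toy_suspend_flags_free(struct toy_zone *zone)
{
	free(zone->suspend_flags);
	zone->suspend_flags = NULL;
}

/* Odd bit of each pair plays the role of PageNosaveFree. */
static void toy_set_nosave_free(struct toy_zone *zone, unsigned long pfn)
{
	unsigned long bit = (pfn - zone->start_pfn) * 2 + 1;

	zone->suspend_flags[bit / TOY_BITS_PER_LONG] |=
		1UL << (bit % TOY_BITS_PER_LONG);
}

static bool toy_nosave_free(const struct toy_zone *zone, unsigned long pfn)
{
	unsigned long bit = (pfn - zone->start_pfn) * 2 + 1;

	return (zone->suspend_flags[bit / TOY_BITS_PER_LONG] >>
		(bit % TOY_BITS_PER_LONG)) & 1UL;
}
```

For scale: a zone covering 1 GiB of 4 KiB pages has 262144 pages, so the
two-bit map occupies 2 * 262144 / 8 = 65536 bytes while it is allocated.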

However I'm quite out of my depth here, so I'll leave more constructive
comments to Rafael.


* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 10:17               ` Peter Zijlstra
@ 2007-02-16 11:04                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2007-02-16 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Andrew Morton, Martin Bligh, linux-mm,
	Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel

On Friday, 16 February 2007 11:17, Peter Zijlstra wrote:
> On Fri, 2007-02-16 at 02:10 -0800, Christoph Lameter wrote:
> > On Fri, 16 Feb 2007, Peter Zijlstra wrote:
> > 
> > > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> > > 
> > > > The two swsusp bits can be removed: they're only needed at suspend/resume
> > > > time and can be replaced by an external data structure.
> > > 
> > > I once had a talk with Rafael, and he said it would be possible to rid
> > > us of PG_nosave* with the now not so new bitmap code that is used to
> > > handle swsusp of highmem pages.
> > 
> > Well we can just shift the stuff into the power subsystem I think. Like 
> > this? Compiles but not tested.
> 
> That would work, however as Andrew pointed out, this data is only ever
> used at suspend/resume time. I think we can postpone allocating this
> bitmap until then and free it afterwards.
> 
> However I'm quite out of my depth here, so I'll leave more constructive
> comments to Rafael.

The PageNosave bits may also be used during the initialization.  On x86_64 the
arch code uses them to mark the pages that shouldn't be saved by swsusp.

However, the PageNosaveFree bits can be allocated during the suspend, as
they aren't needed before.

Thus what I'd like to do would be to use Christoph's approach to allocate
the PageNosave bits on the architectures that need them (i386 doesn't, for
example) and handle the rest using memory bitmaps in snapshot.c.

Greetings,
Rafael


end of thread (newest message: 2007-02-16 11:04 UTC)

Thread overview: 44+ messages
2007-02-15 21:05 [RFC] Remove unswappable anonymous pages off the LRU Christoph Lameter
2007-02-15 22:31 ` Rik van Riel
2007-02-15 22:41   ` Christoph Lameter
2007-02-15 22:50     ` Rik van Riel
2007-02-15 22:53       ` Christoph Lameter
2007-02-15 23:19       ` Andrew Morton
2007-02-15 23:20       ` Lee Schermerhorn
2007-02-16  0:15         ` Andrew Morton
2007-02-16  1:13 ` Andrew Morton
2007-02-16  1:24   ` KAMEZAWA Hiroyuki
2007-02-16  1:40   ` Martin Bligh
2007-02-16  1:49     ` Andrew Morton
2007-02-16  2:21       ` Martin Bligh
2007-02-16  2:34       ` Christoph Lameter
2007-02-16  2:48         ` Andrew Morton
2007-02-16  2:50           ` Christoph Lameter
2007-02-16  3:18             ` Andrew Morton
2007-02-16  3:36               ` Christoph Lameter
2007-02-16  3:42                 ` Andrew Morton
2007-02-16  3:50                   ` Christoph Lameter
2007-02-16  4:02                     ` Andrew Morton
2007-02-16  4:07                       ` Christoph Lameter
2007-02-16  4:03                     ` Andrew Morton
2007-02-16  4:14                     ` Rik van Riel
2007-02-16  4:15                       ` Christoph Lameter
2007-02-16  4:57                         ` KAMEZAWA Hiroyuki
2007-02-16  5:16                           ` Andrew Morton
2007-02-16  5:25                             ` Christoph Lameter
2007-02-16  5:41                               ` Andrew Morton
2007-02-16  5:19                           ` Christoph Lameter
2007-02-16  4:24                       ` Andrew Morton
2007-02-16  8:15           ` Peter Zijlstra
2007-02-16  9:11             ` Rafael J. Wysocki
2007-02-16  9:19               ` Peter Zijlstra
2007-02-16 10:10             ` Christoph Lameter
2007-02-16 10:17               ` Peter Zijlstra
2007-02-16 11:04                 ` Rafael J. Wysocki
2007-02-16  2:16     ` Christoph Lameter
2007-02-16  3:17       ` Martin Bligh
2007-02-16  3:29         ` Christoph Lameter
2007-02-16  8:10     ` Peter Zijlstra
2007-02-16  2:15   ` Christoph Lameter
2007-02-16  2:55   ` Christoph Lameter
2007-02-16  5:02     ` Christoph Lameter
