* [RFC] Remove unswappable anonymous pages off the LRU
@ 2007-02-15 21:05 Christoph Lameter
  2007-02-15 22:31 ` Rik van Riel
  2007-02-16  1:13 ` Andrew Morton
  0 siblings, 2 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-15 21:05 UTC
  To: akpm
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh, Rik van Riel

If we do not have any swap or we have run out of swap then anonymous pages
can no longer be removed from memory. In that case we simply treat them
like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
that all anonymous pages are marked mlocked when they are allocated.

If there is no swap available then anonymous pages will be removed from
the LRU when we attempt to reclaim them and find that there is no swap
space available.

I think it is best to account unreclaimable anonymous pages under NR_MLOCK
because mlock is a way of treating pages that is defined by POSIX. It is
clear then that these pages are not reclaimed. NONLRU would not communicate
clearly what is happening to the pages and it would also include mlocked
pages. The possible confusion that may arise here is that pages are mlocked
without an mlock() syscall but I think that the sudden increase in NR_MLOCK
will help people to reconsider what they are doing if they switch off swap.
Pages may also be marked as mlocked if we are running out of swap.

One unresolved issue is how to get anonymous pages back to an unmlocked
state if more swap is added to the system. Pages are checked for the
mlocked state whenever a process terminates. However, anonymous pages
of processes that do not terminate may stay mlocked. The only way to get
rid of those would be to scan all mlocked pages on the system since we
have no list of mlocked pages. That may be too expensive. Maybe the best
is to leave the pages mlocked?

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-git11/include/linux/swap.h
===================================================================
--- linux-2.6.20-git11.orig/include/linux/swap.h	2007-02-15 11:03:27.000000000 -0800
+++ linux-2.6.20-git11/include/linux/swap.h	2007-02-15 11:04:27.000000000 -0800
@@ -362,6 +362,11 @@ static inline swp_entry_t get_swap_page(
 	return entry;
 }
 
+static inline int add_to_swap(struct page *page, gfp_t flags)
+{
+	return -ENOSPC;
+}
+
 /* linux/mm/thrash.c */
 #define put_swap_token(x) do { } while(0)
 #define grab_swap_token() do { } while(0)
Index: linux-2.6.20-git11/mm/memory.c
===================================================================
--- linux-2.6.20-git11.orig/mm/memory.c	2007-02-15 10:56:49.000000000 -0800
+++ linux-2.6.20-git11/mm/memory.c	2007-02-15 11:09:30.000000000 -0800
@@ -683,7 +683,7 @@ static unsigned long zap_pte_range(struc
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
-			if (PageMlocked(page) && vma->vm_flags & VM_LOCKED)
+			if (PageMlocked(page))
 				lru_cache_add_mlock(page);
 			tlb_remove_page(tlb, page);
 			continue;
@@ -907,17 +907,27 @@ static void add_anon_page(struct vm_area
 				unsigned long address)
 {
 	inc_mm_counter(vma->vm_mm, anon_rss);
-	if (vma->vm_flags & VM_LOCKED) {
-		/*
-		 * Page is new and therefore not on the LRU
-		 * so we can directly mark it as mlocked
-		 */
-		SetPageMlocked(page);
-		ClearPageActive(page);
-		inc_zone_page_state(page, NR_MLOCK);
-	} else
-		lru_cache_add_active(page);
 	page_add_new_anon_rmap(page, vma, address);
+
+#ifdef CONFIG_SWAP
+	/*
+	 * If there is no swap then there is no
+	 * point in adding an anon page to the LRU
+	 * because we can never reclaim the page.
+	 */
+	if (!(vma->vm_flags & VM_LOCKED)) {
+		lru_cache_add_active(page);
+		return;
+	}
+#endif
+
+	/*
+	 * Page is new and therefore not on the LRU
+	 * so we can directly mark it as mlocked
+	 */
+	SetPageMlocked(page);
+	ClearPageActive(page);
+	inc_zone_page_state(page, NR_MLOCK);
 }
 
 /*
Index: linux-2.6.20-git11/mm/swap_state.c
===================================================================
--- linux-2.6.20-git11.orig/mm/swap_state.c	2007-02-15 10:57:47.000000000 -0800
+++ linux-2.6.20-git11/mm/swap_state.c	2007-02-15 10:59:52.000000000 -0800
@@ -153,7 +153,7 @@ int add_to_swap(struct page * page, gfp_
 	for (;;) {
 		entry = get_swap_page();
 		if (!entry.val)
-			return 0;
+			return -ENOSPC;
 
 		/*
 		 * Radix-tree node allocations from PF_MEMALLOC contexts could
@@ -174,7 +174,7 @@ int add_to_swap(struct page * page, gfp_
 			SetPageUptodate(page);
 			SetPageDirty(page);
 			INC_CACHE_INFO(add_total);
-			return 1;
+			return 0;
 		case -EEXIST:
 			/* Raced with "speculative" read_swap_cache_async */
 			INC_CACHE_INFO(exist_race);
@@ -183,7 +183,7 @@ int add_to_swap(struct page * page, gfp_
 		default:
 			/* -ENOMEM radix-tree allocation failure */
 			swap_free(entry);
-			return 0;
+			return -ENOMEM;
 		}
 	}
 }
Index: linux-2.6.20-git11/mm/vmscan.c
===================================================================
--- linux-2.6.20-git11.orig/mm/vmscan.c	2007-02-15 10:59:57.000000000 -0800
+++ linux-2.6.20-git11/mm/vmscan.c	2007-02-15 11:07:57.000000000 -0800
@@ -488,15 +488,24 @@ static unsigned long shrink_page_list(st
 		if (referenced && page_mapping_inuse(page))
 			goto activate_locked;
 
-#ifdef CONFIG_SWAP
-		/*
-		 * Anonymous process memory has backing store?
-		 * Try to allocate it some swap space here.
-		 */
-		if (PageAnon(page) && !PageSwapCache(page))
-			if (!add_to_swap(page, GFP_ATOMIC))
+		if (PageAnon(page) && !PageSwapCache(page)) {
+			/*
+			 * Anonymous process memory has backing store?
+			 * Try to allocate it some swap space here.
+			 */
+			int rc = add_to_swap(page, GFP_ATOMIC);
+
+			if (rc == -ENOMEM)
 				goto activate_locked;
-#endif /* CONFIG_SWAP */
+
+			/*
+			 * If we are unable to allocate a swap
+			 * page then the anonymous page can never
+			 * be reclaimed. In effect it is mlocked.
+			 */
+			if (rc == -ENOSPC)
+				goto mlocked;
+		}
 
 		mapping = page_mapping(page);
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

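The patch changes add_to_swap() from a boolean result to 0 on success, -ENOSPC when no swap slot can be had, and -ENOMEM when the swap cache insertion fails, and shrink_page_list() then branches on the two error codes. A minimal userspace sketch of that dispatch follows. The names model_add_to_swap(), classify_anon_page() and the enum are hypothetical stand-ins invented for illustration; only the three return values and the two branch targets come from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for one anonymous page during reclaim. */
enum reclaim_action {
	RECLAIM_CONTINUE,	/* swap slot allocated, reclaim goes on */
	RECLAIM_ACTIVATE,	/* transient failure: "goto activate_locked" */
	RECLAIM_MLOCK,		/* no swap space: "goto mlocked" */
};

/*
 * Stand-in for add_to_swap() using the return convention of the patch.
 * The two arguments only model the conditions; they are not kernel state.
 */
static int model_add_to_swap(int free_swap_slots, int radix_alloc_ok)
{
	if (free_swap_slots <= 0)
		return -ENOSPC;		/* get_swap_page() found nothing */
	if (!radix_alloc_ok)
		return -ENOMEM;		/* radix-tree allocation failure */
	return 0;
}

/* Mirrors the branch structure added to shrink_page_list(). */
static enum reclaim_action classify_anon_page(int free_swap_slots,
					      int radix_alloc_ok)
{
	int rc = model_add_to_swap(free_swap_slots, radix_alloc_ok);

	assert(rc == 0 || rc == -ENOSPC || rc == -ENOMEM);

	if (rc == -ENOMEM)
		return RECLAIM_ACTIVATE;
	if (rc == -ENOSPC)
		return RECLAIM_MLOCK;
	return RECLAIM_CONTINUE;
}
```

Separating the two failures is what lets the caller tell "try again later" apart from "this page cannot be swapped at all", which the old 0/1 result could not express.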
* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Rik van Riel @ 2007-02-15 22:31 UTC
  To: Christoph Lameter
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh

Christoph Lameter wrote:
> If we do not have any swap or we have run out of swap then anonymous pages
> can no longer be removed from memory. In that case we simply treat them
> like mlocked pages.

Running out of swap is a temporary condition.
You need to have some way for those pages to
make it back onto the LRU list when swap
becomes available.

Better yet, we could implement a better way to
reclaim swap space, or reclaim swap space in a
different part of the code.

For example, we could try to reclaim the swap
space of every page that we scan on the active
list - when swap space starts getting tight.

-- 
All Rights Reversed

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Christoph Lameter @ 2007-02-15 22:41 UTC
  To: Rik van Riel
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh

On Thu, 15 Feb 2007, Rik van Riel wrote:

> Running out of swap is a temporary condition.
> You need to have some way for those pages to
> make it back onto the LRU list when swap
> becomes available.

Yup any ideas how?

> Better yet, we could implement a better way to
> reclaim swap space, or reclaim swap space in a
> different part of the code.

Certainly an interesting project.

> For example, we could try to reclaim the swap
> space of every page that we scan on the active
> list - when swap space starts getting tight.

Good idea.

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Rik van Riel @ 2007-02-15 22:50 UTC
  To: Christoph Lameter
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh

Christoph Lameter wrote:
> On Thu, 15 Feb 2007, Rik van Riel wrote:
>
>> Running out of swap is a temporary condition.
>> You need to have some way for those pages to
>> make it back onto the LRU list when swap
>> becomes available.
>
> Yup any ideas how?

Not really.

>> For example, we could try to reclaim the swap
>> space of every page that we scan on the active
>> list - when swap space starts getting tight.
>
> Good idea.

I suspect this will be a better approach. That way
the least used pages can cycle into swap space, and
the more used pages can be in RAM.

The only reason pages are unswappable when we run
out of swap is that we don't free up the swap space
used by pages that are in memory.

-- 
All Rights Reversed

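Rik's suggestion can be pictured with a small model: while free swap is below some threshold, a scan of the active list releases the swap slot still held by each resident page, so colder pages can use that slot later. The sketch below is only a userspace illustration under stated assumptions. The structures, the function names and the "fewer than 1/8 of the slots free" threshold are all invented here; nothing in it is kernel API or a policy proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A resident page that may still own a swap slot from an earlier swap-out. */
struct model_page {
	bool has_swap_slot;
};

struct model_swap {
	size_t total_slots;
	size_t free_slots;
};

/* Assumed policy: swap is "tight" when fewer than 1/8 of the slots are free. */
static bool swap_is_tight(const struct model_swap *swap)
{
	return swap->free_slots * 8 < swap->total_slots;
}

/*
 * Walk the (modelled) active list and release the swap slot of each
 * in-memory page for as long as swap stays tight.
 * Returns the number of slots released.
 */
static size_t scan_active_list(struct model_page *pages, size_t nr,
			       struct model_swap *swap)
{
	size_t freed = 0;

	assert(pages != NULL && swap != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (!swap_is_tight(swap))
			break;		/* enough swap again, stop early */
		if (pages[i].has_swap_slot) {
			pages[i].has_swap_slot = false;
			swap->free_slots++;
			freed++;
		}
	}
	return freed;
}
```

The early exit matters: once swap is no longer tight, the remaining pages keep their slots, which keeps their next swap-out cheap if they have not been dirtied.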
* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Christoph Lameter @ 2007-02-15 22:53 UTC
  To: Rik van Riel
  Cc: akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh

On Thu, 15 Feb 2007, Rik van Riel wrote:

> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> >
> > > Running out of swap is a temporary condition.
> > > You need to have some way for those pages to
> > > make it back onto the LRU list when swap
> > > becomes available.
> >
> > Yup any ideas how?
>
> Not really.

Maybe it's then best to not move the pages off the LRU when there is some
swap available. But even if there is no swap available: The user could add
some later. So there is really no criterion for removing anonymous pages
off the LRU. We would at least need some list of mlocked pages in order to
feed them back to the LRU.

> > > For example, we could try to reclaim the swap
> > > space of every page that we scan on the active
> > > list - when swap space starts getting tight.
> >
> > Good idea.
>
> I suspect this will be a better approach. That way
> the least used pages can cycle into swap space, and
> the more used pages can be in RAM.
>
> The only reason pages are unswappable when we run
> out of swap is that we don't free up the swap space
> used by pages that are in memory.

Well that is another project and not moving pages off the LRU.

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Andrew Morton @ 2007-02-15 23:19 UTC
  To: Rik van Riel
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh

On Thu, 15 Feb 2007 17:50:30 -0500 Rik van Riel <riel@redhat.com> wrote:

> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> >
> >> Running out of swap is a temporary condition.
> >> You need to have some way for those pages to
> >> make it back onto the LRU list when swap
> >> becomes available.
> >
> > Yup any ideas how?
>
> Not really.

I guess we could be less ambitious.

Obviously, CONFIG_SWAP=n is a no-brainer.

And perhaps it's OK to treat no-swap-online as CONFIG_SWAP=n. So any pages
which we _tried_ to swap out before any swap was online get treated as
locked memory. Well, that's just bad luck. Perhaps we could do some stupid
little manual thing based on the smaps walker:

	echo 1 > /proc/pid/add-your-anon-pages-back-to-the-lru

ug.

Which leaves us wondering what to do about the temporary out-of-swap
problem. That'll be hard - we don't want to do a full virtual scan of all
the mm's each time free swap goes from 0kb to 4kb.

I'd suggest that for now we forget about this case and just put up with
the additional scanning.

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Lee Schermerhorn @ 2007-02-15 23:20 UTC
  To: Rik van Riel
  Cc: Christoph Lameter, akpm, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh, Larry Woodman

On Thu, 2007-02-15 at 17:50 -0500, Rik van Riel wrote:
> Christoph Lameter wrote:
> > On Thu, 15 Feb 2007, Rik van Riel wrote:
> >
> >> Running out of swap is a temporary condition.
> >> You need to have some way for those pages to
> >> make it back onto the LRU list when swap
> >> becomes available.
> >
> > Yup any ideas how?
>
> Not really.
>
> >> For example, we could try to reclaim the swap
> >> space of every page that we scan on the active
> >> list - when swap space starts getting tight.
> >
> > Good idea.
>
> I suspect this will be a better approach. That way
> the least used pages can cycle into swap space, and
> the more used pages can be in RAM.
>
> The only reason pages are unswappable when we run
> out of swap is that we don't free up the swap space
> used by pages that are in memory.

Many large memory systems [e.g., 64G-128G x86_64] running large database
servers run with little [~2G] to no swap. Most of physical memory is
allocated to large shared memory areas which are never expected to swap
out [even tho' some db apps may not lock the shmem down :-(]. In these
systems, removing the shared memory pages from reclaim consideration may
alleviate some nasty lockups we've seen when one of these systems gets
pushed into reclaim because, e.g., someone ran a backup that filled the
page cache. We find all of the cpus walking the LRU list [millions of
pages] to find eligible reclaim candidates. [Almost] none of the shmem
pages are reclaimable because of insufficient swap, and we don't want
them swapped anyway.

Now one could argue that this is an application error, because it doesn't
lock the shared memory regions that it doesn't want swapped anyway. This
doesn't help the customers in the short term. They're looking for a way
to take control outside of the application and make their needs known to
the system. Needs like, never push out shmem [and maybe even anon] memory
to make room for page cache pages. This, I believe, is the motivation
behind the "limit the page cache" patches/requests that we keep seeing.

An idea for handling these:

With the addition of Christoph's patch to move mlock()ed pages out of
the LRU, we could add a mechanism to automagically lock shared memory
regions that either exceed some tunable threshold or that exceed the
available amount of swap.

Larry Woodman at Red Hat has been experimenting with patches to move
shmem [and anon?] pages in excess of swap to a separate "wired list".
This has alleviated part of the problems [apparent system hangs]. There
are other issues, some that have been discussed on the mailing lists
recently, with page cache pages messing up the LRU-ness of the active and
inactive lists; vmscan not being proactive enough in keeping available
memory [limits too low for large systems]; etc. Those issues are
exacerbated by a long active list with a high fraction of unreclaimable
pages.

Lee

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Andrew Morton @ 2007-02-16 0:15 UTC
  To: Lee Schermerhorn
  Cc: Rik van Riel, Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh, Larry Woodman

On Thu, 15 Feb 2007 18:20:58 -0500 Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> With the addition of Christoph's patch to move mlock()ed pages out of
> the LRU, we could add a mechanism to automagically lock shared memory
> regions that either exceed some tunable threshold or that exceed the
> available amount of swap.

But we have an out-of-band way of diddling shm segments? So we could
create

	/usr/bin/ipclock --lock -i 2432

?

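The out-of-band mechanism for System V segments is shmctl() with the Linux-specific SHM_LOCK and SHM_UNLOCK commands, so the core of a tool like the hypothetical ipclock could be quite small. The sketch below is an assumption about how such a tool might look, not an existing program: ipclock(), parse_shmid() and the idea of taking the id as decimal text are invented here. Locking also needs suitable privilege or resource limits (CAP_IPC_LOCK, or ownership of the segment within RLIMIT_MEMLOCK).

```c
#define _GNU_SOURCE		/* for SHM_LOCK / SHM_UNLOCK */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Parse a segment id given as decimal text, e.g. the "2432" of
 * "ipclock --lock -i 2432". Returns 0 and stores the id, or -1.
 */
static int parse_shmid(const char *text, int *shmid)
{
	char *end;
	long value;

	assert(text != NULL && shmid != NULL);

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text || *end != '\0' ||
	    value < 0 || value > INT_MAX)
		return -1;

	*shmid = (int)value;
	return 0;
}

/*
 * Lock (or unlock) a System V shared memory segment in RAM.
 * Returns 0 on success, -1 with errno set by shmctl() on failure.
 */
static int ipclock(int shmid, bool lock)
{
	return shmctl(shmid, lock ? SHM_LOCK : SHM_UNLOCK, NULL);
}
```

A command-line wrapper would call parse_shmid() on the -i argument and then ipclock(id, true), reporting errno with perror() when the call fails.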
* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Andrew Morton @ 2007-02-16 1:13 UTC
  To: Christoph Lameter
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> If we do not have any swap or we have run out of swap then anonymous pages
> can no longer be removed from memory. In that case we simply treat them
> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> that all anonymous pages are marked mlocked when they are allocated.

It's nice and simple, but I think I'd prefer to wait for the existing mlock
changes to crash a bit less before we do this.

Is it true that PageMlocked() pages are never on the LRU? If so, perhaps
we could overload the lru.next/prev on these pages to flag an mlocked page.

#define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)

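The lru.next overload can be illustrated in userspace: since an mlocked page is on no list, its list pointers are free to carry a marker, and the address of a private object that is never linked into any list serves as that marker. The sketch below is only a model of the idea. The struct model_page type, mlocked_marker and the three helpers are hypothetical names, and it ignores the locking a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Cut-down stand-in for struct page: only the LRU linkage is modelled. */
struct model_page {
	struct list_head lru;
};

/*
 * Private object whose address is the "mlocked" marker. It is never
 * added to a list, so no page on a real list can point at it.
 */
static struct list_head mlocked_marker;

/* Only valid for a page that has already been taken off every list. */
static void set_page_mlocked(struct model_page *page)
{
	assert(page != NULL);
	page->lru.next = &mlocked_marker;
	page->lru.prev = NULL;
}

static bool page_mlocked(const struct model_page *page)
{
	return page->lru.next == &mlocked_marker;
}

/* Drop the marker; the page looks like an empty, self-linked entry again. */
static void clear_page_mlocked(struct model_page *page)
{
	assert(page_mlocked(page));
	page->lru.next = &page->lru;
	page->lru.prev = &page->lru;
}
```

The saving is one page flag bit; the cost is that the page cannot sit on any list while it carries the marker.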
* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: KAMEZAWA Hiroyuki @ 2007-02-16 1:24 UTC
  To: Andrew Morton
  Cc: clameter, linux-mm, nickpiggin, a.p.zijlstra, mbligh, riel

On Thu, 15 Feb 2007 17:13:55 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
>
> > If we do not have any swap or we have run out of swap then anonymous pages
> > can no longer be removed from memory. In that case we simply treat them
> > like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> > that all anonymous pages are marked mlocked when they are allocated.
>
> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.
>
> Is it true that PageMlocked() pages are never on the LRU? If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.
>
> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
>
I think mlocked pages are not reclaimable but movable.
So some structure should link them to a list...

-Kame

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Martin Bligh @ 2007-02-16 1:40 UTC
  To: Andrew Morton
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel

Andrew Morton wrote:
> On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
>
>> If we do not have any swap or we have run out of swap then anonymous pages
>> can no longer be removed from memory. In that case we simply treat them
>> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
>> that all anonymous pages are marked mlocked when they are allocated.
>
> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.
>
> Is it true that PageMlocked() pages are never on the LRU? If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.
>
> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)

Mine just created a locked list. If you stick them there, there's no
need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

Suspect most of the rest of my patch is crap, but that might be useful?

M.

--- linux-2.6.17/include/linux/mm_inline.h	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-mlock_lru/include/linux/mm_inline.h	2006-07-28 15:53:15.000000000 -0700
@@ -28,6 +27,20 @@ del_page_from_inactive_list(struct zone
 }
 
 static inline void
+add_page_to_mlocked_list(struct zone *zone, struct page *page)
+{
+	list_add(&page->lru, &zone->mlocked_list);
+	zone->nr_mlocked++;
+}
+
+static inline void
+del_page_from_mlocked_list(struct zone *zone, struct page *page)
+{
+	list_del(&page->lru);
+	zone->nr_mlocked--;
+}
+
+static inline void
 del_page_from_lru(struct zone *zone, struct page *page)
 {
 	list_del(&page->lru);
diff -aurpN -X /home/mbligh/.diff.exclude linux-2.6.17/include/linux/mmzone.h linux-2.6.17-mlock_lru/include/linux/mmzone.h
--- linux-2.6.17/include/linux/mmzone.h	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-mlock_lru/include/linux/mmzone.h	2006-07-28 15:49:05.000000000 -0700
@@ -156,10 +156,12 @@ struct zone {
 	spinlock_t		lru_lock;
 	struct list_head	active_list;
 	struct list_head	inactive_list;
+	struct list_head	mlocked_list;
 	unsigned long		nr_scan_active;
 	unsigned long		nr_scan_inactive;
 	unsigned long		nr_active;
 	unsigned long		nr_inactive;
+	unsigned long		nr_mlocked;
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	int			all_unreclaimable; /* All pages pinned */
 

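A per-zone mlocked list also gives a place to walk when swap is added later, which is the missing piece of the original RFC. The userspace model below sketches both halves: the add/delete helpers with their counter, and a putback pass that feeds every listed page back to the active list. It is an illustration under assumptions, not the patch itself. The list primitives are reimplemented locally, struct model_zone and putback_mlocked_pages() are hypothetical, and a real version would have to skip pages that are mlocked through VM_LOCKED rather than for lack of swap. Note that the sketch increments nr_mlocked on add, which is what a count of listed pages needs.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

struct model_page {
	struct list_head lru;
};

struct model_zone {
	struct list_head active_list;
	struct list_head mlocked_list;
	unsigned long nr_active;
	unsigned long nr_mlocked;
};

static void zone_init(struct model_zone *zone)
{
	init_list_head(&zone->active_list);
	init_list_head(&zone->mlocked_list);
	zone->nr_active = 0;
	zone->nr_mlocked = 0;
}

static void add_page_to_mlocked_list(struct model_zone *zone,
				     struct model_page *page)
{
	list_add(&page->lru, &zone->mlocked_list);
	zone->nr_mlocked++;
}

static void del_page_from_mlocked_list(struct model_zone *zone,
				       struct model_page *page)
{
	assert(zone->nr_mlocked > 0);
	list_del(&page->lru);
	zone->nr_mlocked--;
}

/* After a swapon: move every page on the mlocked list to the active list. */
static unsigned long putback_mlocked_pages(struct model_zone *zone)
{
	unsigned long moved = 0;

	while (zone->mlocked_list.next != &zone->mlocked_list) {
		struct list_head *entry = zone->mlocked_list.next;
		struct model_page *page = (struct model_page *)
			((char *)entry - offsetof(struct model_page, lru));

		del_page_from_mlocked_list(zone, page);
		list_add(&page->lru, &zone->active_list);
		zone->nr_active++;
		moved++;
	}
	return moved;
}
```

The walk costs time proportional to the number of listed pages, and only when swap is added, instead of a scan of every mm.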
* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Andrew Morton @ 2007-02-16 1:49 UTC
  To: Martin Bligh
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007 17:40:09 -0800 Martin Bligh <mbligh@mbligh.org> wrote:

> Andrew Morton wrote:
> > On Thu, 15 Feb 2007 13:05:47 -0800 (PST)
> > Christoph Lameter <clameter@sgi.com> wrote:
> >
> >> If we do not have any swap or we have run out of swap then anonymous pages
> >> can no longer be removed from memory. In that case we simply treat them
> >> like mlocked pages. For a kernel compiled CONFIG_SWAP off this means
> >> that all anonymous pages are marked mlocked when they are allocated.
> >
> > It's nice and simple, but I think I'd prefer to wait for the existing mlock
> > changes to crash a bit less before we do this.
> >
> > Is it true that PageMlocked() pages are never on the LRU? If so, perhaps
> > we could overload the lru.next/prev on these pages to flag an mlocked page.
> >
> > #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
>
> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

I don't think there's a need for a mlocked list in the mlock patches:
nothing ever needs to walk it.

However this might be a good way of solving the someone-did-a-swapon
problem for this anon patch.

Guys, this page-flag problem is really serious. -mm adds PG_mlocked and
PG_readahead and the ext4 patches add PG_booked (am currently fighting the
good fight there). There's ongoing steady growth in these things and soon
we're going to be in a lot of pain.

> Suspect most of the rest of my patch is crap, but that might be useful?

wordwrapped, space-stuffed and tab-replaced. The trifecta!

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Martin Bligh @ 2007-02-16 2:21 UTC
  To: Andrew Morton
  Cc: Christoph Lameter, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel

>>> #define PageMlocked(page)	(page->lru.next == some_address_which_isnt_used_for_anwything_else)
>> Mine just created a locked list. If you stick them there, there's no
>> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)
>
> I don't think there's a need for a mlocked list in the mlock patches:
> nothing ever needs to walk it.
>
> However this might be a good way of solving the someone-did-a-swapon
> problem for this anon patch.
>
> Guys, this page-flag problem is really serious. -mm adds PG_mlocked and
> PG_readahead and the ext4 patches add PG_booked (am currently fighting the
> good fight there). There's ongoing steady growth in these things and soon
> we're going to be in a lot of pain.

Well, if the list is sufficient to fix that, I don't see why we'd care
about the overhead of list manipulation vs a flag, it's not a fast path.

>> Suspect most of the rest of my patch is crap, but that might be useful?
>
> wordwrapped, space-stuffed and tab-replaced. The trifecta!

That's cause it was fairly obviously useless as-was so I just cut and
pasted it. But nonetheless, I appreciate your adulation ;-) I'll try to
add CamelCaps, bracing fuckups, and lots of #ifdefs for the next round.

M.

* Re: [RFC] Remove unswappable anonymous pages off the LRU
From: Christoph Lameter @ 2007-02-16 2:34 UTC
  To: Andrew Morton
  Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> Guys, this page-flag problem is really serious. -mm adds PG_mlocked and
> PG_readahead and the ext4 patches add PG_booked (am currently fighting the
> good fight there). There's ongoing steady growth in these things and soon
> we're going to be in a lot of pain.

Well is it possible to restrict some of the features to 64 bit only? There
we have lots of page flags.

One additional measure that may be possible is to have a page type field
(maybe 3 bits long) that would consolidate a series of page flags that
cannot occur together. But then we have issues with the atomicity of
updates to that field.

F.e.

page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> }

* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 2:34 ` Christoph Lameter @ 2007-02-16 2:48 ` Andrew Morton 2007-02-16 2:50 ` Christoph Lameter 2007-02-16 8:15 ` Peter Zijlstra 0 siblings, 2 replies; 44+ messages in thread From: Andrew Morton @ 2007-02-16 2:48 UTC (permalink / raw) To: Christoph Lameter Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007 18:34:12 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Andrew Morton wrote: > > > Guys, this page-flag problem is really serious. -mm adds PG_mlocked and > > PG_readahead and the ext4 patches add PG_booked (am currently fighting the > > good fight there). There's ongoing steady growth in these things and soon > > we're going to be in a lot of pain. > > Well is it possible to restrict some of the features to 64 bit only? There > we have lots of page flags. We discussed that a while back and iirc ia64 has gone and gobbled most of the upper 32bits. Someone went and added some ascii art around the PG_uncached definition but it is incomprehensible. It seems to claim that ia64 has gone and used all 32 bits, dammit. If so, some adjustments to ia64 might be called for. > One additional measure that may be possible is to have a page type field > (maybe 3 bits long) that would consolidate a series of page flags that > cannot occur together. But then we have issues with the atomicity of > updates to that field. > > F.e. > > page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> } Yeah, maybe. There doesn't seem to be a lot of room for that though - a lot of those flags are quite independent and can occur simultaneously. Maybe PageSwapCache can be worked out by other means. The two swsusp bits can be removed: they're only needed at suspend/resume time and can be replaced by an external data structure. 
I still reckon there must be a way to avoid PG_buddy but Martin put up stiff-and-squealy resistance when I resisted the addition of that.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 2:48 ` Andrew Morton @ 2007-02-16 2:50 ` Christoph Lameter 2007-02-16 3:18 ` Andrew Morton 2007-02-16 8:15 ` Peter Zijlstra 1 sibling, 1 reply; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 2:50 UTC (permalink / raw) To: Andrew Morton Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007, Andrew Morton wrote: > We discussed that a while back and iirc ia64 has gone and gobbled most of > the upper 32bits. Someone went and added some ascii art around the > PG_uncached definition but it is incomprehensible. It seems to claim that > ia64 has gone and used all 32 bits, dammit. If so, some adjustments to > ia64 might be called for. Yes ia64 has used the upper 32 bit. However, the lower 32 bits are fully usable. So we have 32-20 = 12 bits to play with on 64 bit. > > > page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> } > > Yeah, maybe. There doesn't seem to be a lot of room for that though - a > lot of those flags are quite independent and can occur simultaneously. None of the above can occur simultaneously.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 2:50 ` Christoph Lameter @ 2007-02-16 3:18 ` Andrew Morton 2007-02-16 3:36 ` Christoph Lameter 0 siblings, 1 reply; 44+ messages in thread From: Andrew Morton @ 2007-02-16 3:18 UTC (permalink / raw) To: Christoph Lameter Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007 18:50:39 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Andrew Morton wrote: > > > We discussed that a while back and iirc ia64 has gone and gobbled most of > > the upper 32bits. Someone went and added some ascii art around the > > PG_uncached definition but it is incomprehensible. It seems to claim that > > ia64 has gone and used all 32 bits, dammit. If so, some adjustments to > > ia64 might be called for. > > Yes ia64 has used the upper 32 bit. However, the lower 32 bits are fully > usable. So we have 32-20 = 12 bits to play with on 64 bit. OK. But not many things are 64-bit-only? > > > > > page_type = { SLAB, LRU, MLOCK, RESERVED, BUDDY, <add 3 more types here> } > > > > Yeah, maybe. There doesn't seem to be a lot of room for that though - a > > lot of those flags are quite independent and can occur simultaneously. > > None of the above can occur simultaneously. <actually pays attention> OK. The actual implementation details might get messy though. We can do a non-atomic rmw of the three bits but that could corrupt a concurrent modification of a different flag. Or we could do a succession of three set_bit/clear_bit operations, but that exposes intermediate invalid states. It can be done I guess, but it'd be fiddly.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:18 ` Andrew Morton @ 2007-02-16 3:36 ` Christoph Lameter 2007-02-16 3:42 ` Andrew Morton 0 siblings, 1 reply; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 3:36 UTC (permalink / raw) To: Andrew Morton Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007, Andrew Morton wrote: > > Yes ia64 has used the upper 32 bit. However, the lower 32 bits are fully > > usable. So we have 32-20 = 12 bits to play with on 64 bit. > > OK. But not many things are 64-bit-only? We could restrict some newer features to 64 bits? (ducks and runs ...) > > None of the above can occur simultaneously. > The actual implementation details might get messy though. We can do a > non-atomic rmw of the three bits but that could corrupt a concurrent > modification of a different flag. Or we could do a succession of three > set_bit/clear_bit operations, but that exposes intermediate invalid states. > > It can be done I guess, but it'd be fiddly. Right. Maybe we could somehow split up page->flags into 4 separate bytes? Updating one byte would not endanger the other bytes in the other sets?
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:36 ` Christoph Lameter @ 2007-02-16 3:42 ` Andrew Morton 2007-02-16 3:50 ` Christoph Lameter 0 siblings, 1 reply; 44+ messages in thread From: Andrew Morton @ 2007-02-16 3:42 UTC (permalink / raw) To: Christoph Lameter Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007 19:36:01 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Andrew Morton wrote: > > > > Yes ia64 has used the upper 32 bit. However, the lower 32 bits are fully > > > usable. So we have 32-20 = 12 bits to play with on 64 bit. > > > > OK. But not many things are 64-bit-only? > > We could restrict some newer features to 64 bits? (ducks and runs ...) Well. We haven't come across many such things, and doing this would mucky up the VM and would reduce testing coverage. But yeah, it's always an option if these things crop up. > > > None of the above can occur simultaneously. > > The actual implementation details might get messy though. We can do a > > non-atomic rmw of the three bits but that could corrupt a concurrent > > modification of a different flag. Or we could do a succession of three > > set_bit/clear_bit operations, but that exposes intermediate invalid states. > > > > It can be done I guess, but it'd be fiddly. > > Right. > > Maybe we could somehow split up page->flags into 4 separate bytes? > Updating one byte would not endanger the other bytes in the other > sets? yipes. I'm not sure that'd work? compare-and-swap-in-a-loop could be used, I guess. With the obvious problem.. I do think that those two swsusp flags are low-hanging-fruit. It'd be trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a vague feeling that there were subtle issues with that. Still, Something Needs To Be Done.
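A minimal sketch of the compare-and-swap loop mentioned above, written in userspace C11 rather than with kernel primitives. The 3-bit field position, the enum values and the helper names set_page_type()/get_page_type() are assumptions made for illustration; nothing here is taken from an actual patch. The loop replaces the whole type field in one atomic step, so no intermediate state is visible and a concurrent update to one of the other flag bits is not lost.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical layout: a 3-bit "page type" field in bits 0..2 of the flags word. */
#define PAGE_TYPE_SHIFT 0
#define PAGE_TYPE_MASK  (7UL << PAGE_TYPE_SHIFT)

/* Mutually exclusive states, following the list proposed in the thread. */
enum page_type { PT_NONE, PT_SLAB, PT_LRU, PT_MLOCK, PT_RESERVED, PT_BUDDY };

/* Replace the type field atomically while leaving every other bit untouched. */
static void set_page_type(_Atomic unsigned long *flags, enum page_type type)
{
	unsigned long old = atomic_load(flags);
	unsigned long next;

	do {
		/* Recompute from the latest observed value on every retry. */
		next = (old & ~PAGE_TYPE_MASK) |
		       ((unsigned long)type << PAGE_TYPE_SHIFT);
		/* On failure, 'old' is refreshed with the current contents. */
	} while (!atomic_compare_exchange_weak(flags, &old, next));
}

/* Read the current type field. */
static enum page_type get_page_type(_Atomic unsigned long *flags)
{
	return (enum page_type)((atomic_load(flags) & PAGE_TYPE_MASK) >>
				PAGE_TYPE_SHIFT);
}
```

The price is a retry loop on every type change where a single set_bit() or clear_bit() is enough today.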
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:42 ` Andrew Morton @ 2007-02-16 3:50 ` Christoph Lameter 2007-02-16 4:02 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 3:50 UTC (permalink / raw) To: Andrew Morton Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007, Andrew Morton wrote: > > Maybe we could somehow split up page->flags into 4 separate bytes? > > Updating one byte would not endanger the other bytes in the other > > sets? > > yipes. I'm not sure that'd work? Are all arches able to do atomic ops on bytes? > compare-and-swap-in-a-loop could be used, I guess. With the obvious problem.. Yucks. There seems to be no easy solution. > I do think that those two swsusp flags are low-hanging-fruit. It'd be > trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a > vague feeling that there were subtle issues with that. Still, Something > Needs To Be Done. I tinkered with some similar radical ideas lately. Maybe a bit vector could be used instead? For 1G of memory we would need 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. Seems to be reasonable?
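The bitmap arithmetic above can be checked with a few lines of C. This is only a worked example of the calculation; the helper name bitmap_bytes() and its parameters are made up for illustration.

```c
#include <assert.h>

/*
 * Bytes needed for an external bitmap that stores bits_per_page bits
 * for every page of mem_bytes of memory, with pages of 2^page_shift bytes.
 */
static unsigned long long bitmap_bytes(unsigned long long mem_bytes,
				       unsigned int page_shift,
				       unsigned int bits_per_page)
{
	unsigned long long pages = mem_bytes >> page_shift;
	unsigned long long bits = pages * bits_per_page;

	return (bits + 7) / 8;	/* round up to whole bytes */
}
```

With 4k pages (PAGE_SHIFT = 12), 1G of memory has 2^18 pages, so one flag bit per page costs 2^18 / 8 = 2^15 = 32768 bytes. With 16k pages the same flag costs 8192 bytes, and keeping two bits per page doubles the figure.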
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:50 ` Christoph Lameter @ 2007-02-16 4:02 ` Andrew Morton 2007-02-16 4:07 ` Christoph Lameter 2007-02-16 4:03 ` Andrew Morton 2007-02-16 4:14 ` Rik van Riel 2 siblings, 1 reply; 44+ messages in thread From: Andrew Morton @ 2007-02-16 4:02 UTC (permalink / raw) To: Christoph Lameter Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007 19:50:45 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Andrew Morton wrote: > > > > Maybe we could somehow split up page->flags into 4 separate bytes? > > > Updating one byte would not endanger the other bytes in the other > > > sets? > > > > yipes. I'm not sure that'd work? > > Are all arches able to do atomic ops on bytes? I think they are, but you only wanted three bits. I don't think we'll be able to convert eight bits into a 256-value scalar efficiently. > > compare-and-swap-in-a-loop could be used, I guess. With the obvious problem.. > > Yucks. There seems to be no easy solution. > > > I do think that those two swsusp flags are low-hanging-fruit. It'd be > > trivial to vmalloc a bitmap or use a radix-tree-holding-longs, but I have a > > vague feeling that there were subtle issues with that. Still, Something > > Needs To Be Done. > > I tinkered with some similar radical ideas lately. Maybe a bit vector > could be used instead? For 1G of memory we would need > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > Seems to be reasonable? > 32k per bit per gig, yes. Better for large PAGE_SIZE. More cachemisses. But will it come unstuck for machines which have a super-sparse pfn space?
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:02 ` Andrew Morton @ 2007-02-16 4:07 ` Christoph Lameter 0 siblings, 0 replies; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 4:07 UTC (permalink / raw) To: Andrew Morton Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007, Andrew Morton wrote: > > could be used instead? For 1G of memory we would need > > > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > > > Seems to be reasonable? > > > > 32k per bit per gig, yes. Better for large PAGE_SIZE. More cachemisses. > > But will it come unstuck for machines which have a super-sparse pfn space? IA64 is such a beast. I think IA64 would work fine if we had bitmap vectors per zone. However, powerpc may have even super sparse zones. We may have to ask them first.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:50 ` Christoph Lameter 2007-02-16 4:02 ` Andrew Morton @ 2007-02-16 4:03 ` Andrew Morton 2007-02-16 4:14 ` Rik van Riel 2 siblings, 0 replies; 44+ messages in thread From: Andrew Morton @ 2007-02-16 4:03 UTC (permalink / raw) To: Christoph Lameter Cc: Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki, Rik van Riel On Thu, 15 Feb 2007 19:50:45 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > I tinkered with some similar radical ideas lately. Maybe a bit vector > could be used instead? For 1G of memory we would need > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > Seems to be reasonable? Dave Hansen did have a patchset which did something along these lines, btw. iirc it used a tree and/or a hash of some form. Much terror ensued.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 3:50 ` Christoph Lameter 2007-02-16 4:02 ` Andrew Morton 2007-02-16 4:03 ` Andrew Morton @ 2007-02-16 4:14 ` Rik van Riel 2007-02-16 4:15 ` Christoph Lameter 2007-02-16 4:24 ` Andrew Morton 2 siblings, 2 replies; 44+ messages in thread From: Rik van Riel @ 2007-02-16 4:14 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki Christoph Lameter wrote: > I tinkered with some similar radical ideas lately. Maybe a bit vector > could be used instead? For 1G of memory we would need > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > Seems to be reasonable? At that point, wouldn't it be easier to simply increase the size of struct page? I don't think they're power of two sized anyway, at least on 64 bit architectures. -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:14 ` Rik van Riel @ 2007-02-16 4:15 ` Christoph Lameter 2007-02-16 4:57 ` KAMEZAWA Hiroyuki 2007-02-16 4:24 ` Andrew Morton 1 sibling, 1 reply; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 4:15 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki On Thu, 15 Feb 2007, Rik van Riel wrote: > Christoph Lameter wrote: > > > I tinkered with some similar radical ideas lately. Maybe a bit vector > > could be used instead? For 1G of memory we would need > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > > > Seems to be reasonable? > > At that point, wouldn't it be easier to simply increase > the size of struct page? I don't think they're power of > two sized anyway, at least on 64 bit architectures. On 64 bit platforms we can add one unsigned long to get from 56 to 64 bytes.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:15 ` Christoph Lameter @ 2007-02-16 4:57 ` KAMEZAWA Hiroyuki 2007-02-16 5:16 ` Andrew Morton 2007-02-16 5:19 ` Christoph Lameter 0 siblings, 2 replies; 44+ messages in thread From: KAMEZAWA Hiroyuki @ 2007-02-16 4:57 UTC (permalink / raw) To: Christoph Lameter; +Cc: riel, akpm, mbligh, linux-mm, nickpiggin, a.p.zijlstra On Thu, 15 Feb 2007 20:15:46 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Rik van Riel wrote: > > > Christoph Lameter wrote: > > > > > I tinkered with some similar radical ideas lately. Maybe a bit vector > > > could be used instead? For 1G of memory we would need > > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > > > > > Seems to be reasonable? > > > > At that point, wouldn't it be easier to simply increase > > the size of struct page? I don't think they're power of > > two sized anyway, at least on 64 bit architectures. > > On 64 bit platforms we can add one unsigned long to get from 56 to 64 > bytes. > I sometimes dream of == struct page { ... struct zone *zone; ... }; #define page_zone(page) (page)->zone == but never tried ;) -Kame
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:57 ` KAMEZAWA Hiroyuki @ 2007-02-16 5:16 ` Andrew Morton 2007-02-16 5:25 ` Christoph Lameter 2007-02-16 5:19 ` Christoph Lameter 1 sibling, 1 reply; 44+ messages in thread From: Andrew Morton @ 2007-02-16 5:16 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Christoph Lameter, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra On Fri, 16 Feb 2007 13:57:14 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Thu, 15 Feb 2007 20:15:46 -0800 (PST) > Christoph Lameter <clameter@sgi.com> wrote: > > > On Thu, 15 Feb 2007, Rik van Riel wrote: > > > > > Christoph Lameter wrote: > > > > > > > I tinkered with some similar radical ideas lately. Maybe a bit vector > > > > could be used instead? For 1G of memory we would need > > > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > > > > > > > Seems to be reasonable? > > > > > > At that point, wouldn't it be easier to simply increase > > > the size of struct page? I don't think they're power of > > > two sized anyway, at least on 64 bit architectures. > > > > On 64 bit platforms we can add one unsigned long to get from 56 to 64 > > bytes. > > > > I sometimes dream of > == > struct page { > ... > struct zone *zone; > ... > }; > #define page_zone(page) (page)->zone > == > but never tried ;) hm. We can calculate page_zone(page) from the pfn. And I suspect we can do that locklessly too. I bet a nice tight implementation of that would be efficient enough and it'll reclaim heaps of flags.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 5:16 ` Andrew Morton @ 2007-02-16 5:25 ` Christoph Lameter 2007-02-16 5:41 ` Andrew Morton 0 siblings, 1 reply; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 5:25 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra On Thu, 15 Feb 2007, Andrew Morton wrote: > hm. We can calculate page_zone(page) from the pfn. And I suspect we can > do that locklessly too. I bet a nice tight implementation of that would be > efficient enough and it'll reclaim heaps of flags. You mean encode the node and the zone_id in the pfn? Ummm... That would get us into lots of trouble with pfn_to_page and friends. The sparsemem section field could be available. A virtual memmap based implementation would not need the section number and would get rid of the sparsemem table lookups. Problem is that we cannot do it on 32 bit platforms because of the lack of virtual memory.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 5:25 ` Christoph Lameter @ 2007-02-16 5:41 ` Andrew Morton 0 siblings, 0 replies; 44+ messages in thread From: Andrew Morton @ 2007-02-16 5:41 UTC (permalink / raw) To: Christoph Lameter Cc: KAMEZAWA Hiroyuki, riel, mbligh, linux-mm, nickpiggin, a.p.zijlstra On Thu, 15 Feb 2007 21:25:53 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote: > On Thu, 15 Feb 2007, Andrew Morton wrote: > > > hm. We can calculate page_zone(page) from the pfn. And I suspect we can > > do that locklessly too. I bet a nice tight implementation of that would be > > efficient enough and it'll reclaim heaps of flags. > > You mean encode the node and the zone_id in the pfn? Maybe. Or just leave the pfns as they are and implement a decent lookup algorithm. For a pc it'd be something like for (i = ZONE_DMA; i <= ZONE_HIGHMEM; i++) { if (pfn >= first_pfn(i) && pfn <= last_pfn(i)) success(); } if you get my drift. I dunno how complex that would get in the worst cases. > Ummm... That would > get us into lots of trouble with pfn_to_page and friends. > > The sparsemem section field could be available. A virtual > memmap based implementation would not need the section number and would > get rid of the sparsemem table lookups. Problem is that we cannot do it on > 32 bit platforms because of the lack of virtual memory.
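The range lookup outlined above could be fleshed out roughly as follows. This is a userspace sketch under stated assumptions: struct zone_range and pfn_to_zone_index() are hypothetical names, and the table of pfn ranges is assumed to be filled in once at boot and never changed afterwards, which is what would make a lockless read safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone pfn range, both ends inclusive. */
struct zone_range {
	unsigned long first_pfn;
	unsigned long last_pfn;
};

/*
 * Linear scan over the zone table, as in the pseudocode above.
 * Returns the index of the zone containing pfn, or -1 if the pfn
 * falls into a hole or beyond the last zone.
 */
static int pfn_to_zone_index(const struct zone_range *zones, size_t nr_zones,
			     unsigned long pfn)
{
	for (size_t i = 0; i < nr_zones; i++) {
		if (pfn >= zones[i].first_pfn && pfn <= zones[i].last_pfn)
			return (int)i;
	}
	return -1;
}
```

For a single-node machine with three zones the scan is at most three comparisons. The worst cases raised in the thread, many nodes or a sparse pfn space, would need something better than a linear scan, and this sketch does not try to address them.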
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:57 ` KAMEZAWA Hiroyuki 2007-02-16 5:16 ` Andrew Morton @ 2007-02-16 5:19 ` Christoph Lameter 1 sibling, 0 replies; 44+ messages in thread From: Christoph Lameter @ 2007-02-16 5:19 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: riel, akpm, mbligh, linux-mm, nickpiggin, a.p.zijlstra On Fri, 16 Feb 2007, KAMEZAWA Hiroyuki wrote: > > On 64 bit platforms we can add one unsigned long to get from 56 to 64 > > bytes. > > > > I sometimes dream of > == > struct page { > ... > struct zone *zone; > ... > }; > #define page_zone(page) (page)->zone > == > but never tried ;) Hmmm..... Currently we have

static inline struct zone *page_zone(struct page *page)
{
	return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
}

page_to_nid is extracting a piece of the page flags. Then we need to do a lookup and find the zonenum (another extract from page flags). This is not expensive. Look at __pagevec_lru_add. This boils down to (r9 = struct page * ):

0xa000000100117ef0 <__pagevec_lru_add+80>: [MMI] ld8 r33=[r9];;
0xa000000100117ef1 <__pagevec_lru_add+81>: ld8 r8=[r33]
0xa000000100117ef2 <__pagevec_lru_add+82>: nop.i 0x0;;
0xa000000100117f00 <__pagevec_lru_add+96>: [MII] nop.m 0x0
0xa000000100117f01 <__pagevec_lru_add+97>: shr.u r3=r8,54;;
0xa000000100117f02 <__pagevec_lru_add+98>: nop.i 0x0
0xa000000100117f10 <__pagevec_lru_add+112>: [MMI] shladd r14=r3,3,r15;;
0xa000000100117f11 <__pagevec_lru_add+113>: ld8 r34=[r14]
0xa000000100117f12 <__pagevec_lru_add+114>: nop.i 0x0;;
0xa000000100117f20 <__pagevec_lru_add+128>: [MIB] nop.m 0x0
0xa000000100117f21 <__pagevec_lru_add+129>: cmp.eq p6,p7=r2,r34
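The flag extraction described above amounts to a shift and a mask per field. The sketch below shows the idea in plain C; the field widths (10 node bits at the top of a 64-bit word, 2 zone bits below them) and all names are assumptions chosen for illustration, not the real kernel layout, although a 10-bit node field would put the node shift at 54, the same shift count that appears in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a 64-bit flags word: | node | zone | ... flags ... | */
#define NID_BITS   10u
#define ZONE_BITS  2u
#define NID_SHIFT  (64u - NID_BITS)		/* 54 */
#define ZONE_SHIFT (NID_SHIFT - ZONE_BITS)	/* 52 */

/* Extract the node id: one shift and one mask. */
static unsigned int flags_to_nid(uint64_t flags)
{
	return (unsigned int)(flags >> NID_SHIFT) & ((1u << NID_BITS) - 1);
}

/* Extract the zone number within the node. */
static unsigned int flags_to_zonenum(uint64_t flags)
{
	return (unsigned int)(flags >> ZONE_SHIFT) & ((1u << ZONE_BITS) - 1);
}

/* Build a flags word from its parts; low_flags must fit below ZONE_SHIFT. */
static uint64_t make_flags(unsigned int nid, unsigned int zonenum,
			   uint64_t low_flags)
{
	return ((uint64_t)nid << NID_SHIFT) |
	       ((uint64_t)zonenum << ZONE_SHIFT) |
	       low_flags;
}
```

Looking up the zone then takes two such extractions plus two array indexings, which is why the lookup is described as cheap.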
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 4:14 ` Rik van Riel 2007-02-16 4:15 ` Christoph Lameter @ 2007-02-16 4:24 ` Andrew Morton 1 sibling, 0 replies; 44+ messages in thread From: Andrew Morton @ 2007-02-16 4:24 UTC (permalink / raw) To: Rik van Riel Cc: Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki On Thu, 15 Feb 2007 23:14:01 -0500 Rik van Riel <riel@redhat.com> wrote: > Christoph Lameter wrote: > > > I tinkered with some similar radical ideas lately. Maybe a bit vector > > could be used instead? For 1G of memory we would need > > > > 2^(30 - PAGE_SHIFT) / 8 = 2^(30-12-3) = 2^15 = 32k bytes of a bitmap. > > > > Seems to be reasonable? > > At that point, wouldn't it be easier to simply increase > the size of struct page? I don't think they're power of > two sized anyway, at least on 64 bit architectures. That gives us an additional 32 bits in one hit whereas the external bitmap allows us to fine-tune it. Doing neither is of course best..
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 2:48 ` Andrew Morton 2007-02-16 2:50 ` Christoph Lameter @ 2007-02-16 8:15 ` Peter Zijlstra 2007-02-16 9:11 ` Rafael J. Wysocki 2007-02-16 10:10 ` Christoph Lameter 1 sibling, 2 replies; 44+ messages in thread From: Peter Zijlstra @ 2007-02-16 8:15 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote: > The two swsusp bits can be removed: they're only needed at suspend/resume > time and can be replaced by an external data structure. I once had a talk with Rafael, and he said it would be possible to rid us of PG_nosave* with the now not so new bitmap code that is used to handle swsusp of highmem pages.
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 8:15 ` Peter Zijlstra @ 2007-02-16 9:11 ` Rafael J. Wysocki 2007-02-16 9:19 ` Peter Zijlstra 2007-02-16 10:10 ` Christoph Lameter 1 sibling, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2007-02-16 9:11 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel On Friday, 16 February 2007 09:15, Peter Zijlstra wrote: > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote: > > > The two swsusp bits can be removed: they're only needed at suspend/resume > > time and can be replaced by an external data structure. > > I once had a talk with Rafael, and he said it would be possible to rid > us of PG_nosave* with the now not so new bitmap code that is used to > handle swsusp of highmem pages. Yes, that is true. I'm going to do this soon, but first I'd like to help to make the task freezer suitable for the CPU hotplug. Greetings, Rafael
* Re: [RFC] Remove unswappable anonymous pages off the LRU 2007-02-16 9:11 ` Rafael J. Wysocki @ 2007-02-16 9:19 ` Peter Zijlstra 0 siblings, 0 replies; 44+ messages in thread From: Peter Zijlstra @ 2007-02-16 9:19 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Christoph Lameter, Martin Bligh, linux-mm, Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel On Fri, 2007-02-16 at 10:11 +0100, Rafael J. Wysocki wrote: > On Friday, 16 February 2007 09:15, Peter Zijlstra wrote: > > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote: > > > > > The two swsusp bits can be removed: they're only needed at suspend/resume > > > time and can be replaced by an external data structure. > > > > I once had a talk with Rafael, and he said it would be possible to rid > > us of PG_nosave* with the now not so new bitmap code that is used to > > handle swsusp of highmem pages. > > Yes, that is true. > > I'm going to do this soon, Great! > but first I'd like to help to make the task freezer > suitable for the CPU hotplug. A worthy challenge, have fun :-)
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 8:15 ` Peter Zijlstra
  2007-02-16 9:11 ` Rafael J. Wysocki
@ 2007-02-16 10:10 ` Christoph Lameter
  2007-02-16 10:17 ` Peter Zijlstra
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 10:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki

On Fri, 16 Feb 2007, Peter Zijlstra wrote:

> On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
>
> > The two swsusp bits can be removed: they're only needed at suspend/resume
> > time and can be replaced by an external data structure.
>
> I once had a talk with Rafael, and he said it would be possible to rid
> us of PG_nosave* with the now not so new bitmap code that is used to
> handle swsusp of highmem pages.

Well we can just shift the stuff into the power subsystem I think. Like
this? Compiles but not tested.

Index: linux-2.6.20-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/mmzone.h	2007-02-16 01:11:46.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/mmzone.h	2007-02-16 01:12:23.000000000 -0800
@@ -295,6 +295,7 @@ struct zone {
 	unsigned long		spanned_pages;	/* total size, including holes */
 	unsigned long		present_pages;	/* amount of memory (excluding holes) */
 
+	unsigned long		*suspend_flags;
 	/*
 	 * rarely used fields:
 	 */
Index: linux-2.6.20-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/page-flags.h	2007-02-16 01:05:26.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/page-flags.h	2007-02-16 01:16:45.000000000 -0800
@@ -82,13 +82,11 @@
 #define PG_private		11	/* If pagecache, has fs-private data */
 
 #define PG_writeback		12	/* Page is under writeback */
-#define PG_nosave		13	/* Used for system suspend/resume */
 #define PG_compound		14	/* Part of a compound page */
 #define PG_swapcache		15	/* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk		16	/* Has blocks allocated on-disk */
 #define PG_reclaim		17	/* To be reclaimed asap */
-#define PG_nosave_free		18	/* Used for system suspend/resume */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 #define PG_mlocked		20	/* Page is mlocked */
 
@@ -192,16 +190,6 @@ static inline void SetPageUptodate(struc
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,	\
 							&(page)->flags)
 
-#define PageNosave(page)	test_bit(PG_nosave, &(page)->flags)
-#define SetPageNosave(page)	set_bit(PG_nosave, &(page)->flags)
-#define TestSetPageNosave(page)	test_and_set_bit(PG_nosave, &(page)->flags)
-#define ClearPageNosave(page)		clear_bit(PG_nosave, &(page)->flags)
-#define TestClearPageNosave(page)	test_and_clear_bit(PG_nosave, &(page)->flags)
-
-#define PageNosaveFree(page)	test_bit(PG_nosave_free, &(page)->flags)
-#define SetPageNosaveFree(page)	set_bit(PG_nosave_free, &(page)->flags)
-#define ClearPageNosaveFree(page)	clear_bit(PG_nosave_free, &(page)->flags)
-
 #define PageBuddy(page)		test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)	__set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page)	__clear_bit(PG_buddy, &(page)->flags)
Index: linux-2.6.20-mm1/include/linux/suspend.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/suspend.h	2007-02-16 01:15:30.000000000 -0800
+++ linux-2.6.20-mm1/include/linux/suspend.h	2007-02-16 01:57:51.000000000 -0800
@@ -21,7 +22,6 @@ struct pbe {
 
 /* mm/page_alloc.c */
 extern void drain_local_pages(void);
-extern void mark_free_pages(struct zone *zone);
 
 #ifdef CONFIG_PM
 /* kernel/power/swsusp.c */
@@ -42,6 +42,18 @@ static inline int software_suspend(void)
 }
 #endif /* CONFIG_PM */
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages);
+void mark_free_pages(struct zone *zone);
+#else
+static inline int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages)
+{
+	return 0;
+}
+
+static inline void mark_free_pages(struct zone *zone) {}
+#endif
+
 void save_processor_state(void);
 void restore_processor_state(void);
 struct saved_context;
Index: linux-2.6.20-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.20-mm1.orig/mm/page_alloc.c	2007-02-16 01:22:09.000000000 -0800
+++ linux-2.6.20-mm1/mm/page_alloc.c	2007-02-16 01:40:39.000000000 -0800
@@ -767,40 +767,6 @@ static void __drain_pages(unsigned int c
 }
 
 #ifdef CONFIG_PM
-
-void mark_free_pages(struct zone *zone)
-{
-	unsigned long pfn, max_zone_pfn;
-	unsigned long flags;
-	int order;
-	struct list_head *curr;
-
-	if (!zone->spanned_pages)
-		return;
-
-	spin_lock_irqsave(&zone->lock, flags);
-
-	max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
-		if (pfn_valid(pfn)) {
-			struct page *page = pfn_to_page(pfn);
-
-			if (!PageNosave(page))
-				ClearPageNosaveFree(page);
-		}
-
-	for (order = MAX_ORDER - 1; order >= 0; --order)
-		list_for_each(curr, &zone->free_area[order].free_list) {
-			unsigned long i;
-
-			pfn = page_to_pfn(list_entry(curr, struct page, lru));
-			for (i = 0; i < (1UL << order); i++)
-				SetPageNosaveFree(pfn_to_page(pfn + i));
-		}
-
-	spin_unlock_irqrestore(&zone->lock, flags);
-}
-
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  */
@@ -2354,6 +2320,9 @@ __meminit int init_currently_empty_zone(
 	ret = zone_wait_table_init(zone, size);
 	if (ret)
 		return ret;
+	ret = suspend_flags_init(zone, size);
+	if (ret)
+		return ret;
 	pgdat->nr_zones = zone_idx(zone) + 1;
 
 	zone->zone_start_pfn = zone_start_pfn;
Index: linux-2.6.20-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/power/snapshot.c	2007-02-16 01:46:02.000000000 -0800
+++ linux-2.6.20-mm1/kernel/power/snapshot.c	2007-02-16 01:59:24.000000000 -0800
@@ -34,6 +34,126 @@
 
 #include "power.h"
 
+static inline int PageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline void SetPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	set_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline int TestSetPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_and_set_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline void ClearPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	clear_bit(offset * 2, zone->suspend_flags);
+}
+
+static inline int TestClearPageNosave(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_and_clear_bit(offset * 2, zone->suspend_flags);
+}
+
+
+static inline int PageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	return test_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+static inline void SetPageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	set_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+static inline void ClearPageNosaveFree(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long offset = page_to_pfn(page) - zone->zone_start_pfn;
+
+	clear_bit(offset * 2 + 1, zone->suspend_flags);
+}
+
+int suspend_flags_init(struct zone *zone, unsigned long zone_size_pages)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	size_t alloc_size;
+
+	/*
+	 * We need two bits per page in the zone. One for PageNosave and the other
+	 * for PageNosaveFree.
+	 */
+	alloc_size = BITS_TO_LONGS(zone_size_pages * 2) * sizeof(unsigned long);
+	if (system_state == SYSTEM_BOOTING) {
+		zone->suspend_flags = (unsigned long *)
+			alloc_bootmem_node(pgdat, alloc_size);
+	} else
+		zone->suspend_flags = (unsigned long *)vmalloc(alloc_size);
+	if (!zone->suspend_flags)
+		return -ENOMEM;
+
+	bitmap_zero(zone->suspend_flags, 2 * zone_size_pages);
+	return 0;
+}
+
+void mark_free_pages(struct zone *zone)
+{
+	unsigned long pfn, max_zone_pfn;
+	unsigned long flags;
+	int order;
+	struct list_head *curr;
+
+	if (!zone->spanned_pages)
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+		if (pfn_valid(pfn)) {
+			struct page *page = pfn_to_page(pfn);
+
+			if (!PageNosave(page))
+				ClearPageNosaveFree(page);
+		}
+
+	for (order = MAX_ORDER - 1; order >= 0; --order)
+		list_for_each(curr, &zone->free_area[order].free_list) {
+			unsigned long i;
+
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			for (i = 0; i < (1UL << order); i++)
+				SetPageNosaveFree(pfn_to_page(pfn + i));
+		}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
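The two-bits-per-page layout used by the patch above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct zone_bits`, `zone_bits_init()` and the `zone_bit_*()` helpers are hypothetical names, `calloc()` stands in for the kernel allocators, and the bit operations are not atomic, unlike the kernel's `set_bit()` family. The allocation is sized in bytes (number of longs times `sizeof(unsigned long)`).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical names: which of the two per-page bits is meant. */
enum page_bit { BIT_NOSAVE = 0, BIT_NOSAVE_FREE = 1 };

/* Stand-in for the per-zone state; not the kernel's struct zone. */
struct zone_bits {
	unsigned long start_pfn;	/* first page frame covered */
	unsigned long nr_pages;		/* number of page frames covered */
	unsigned long *bits;		/* two bits per page frame */
};

static int zone_bits_init(struct zone_bits *zb, unsigned long start_pfn,
			  unsigned long nr_pages)
{
	size_t nlongs = BITS_TO_LONGS(nr_pages * 2);

	/* calloc takes an element count and an element size in bytes,
	 * and returns zeroed memory. */
	zb->bits = calloc(nlongs, sizeof(unsigned long));
	if (!zb->bits)
		return -1;
	zb->start_pfn = start_pfn;
	zb->nr_pages = nr_pages;
	return 0;
}

/* Bit position for (pfn, which): even bits are NOSAVE, odd are NOSAVE_FREE. */
static unsigned long bit_index(const struct zone_bits *zb, unsigned long pfn,
			       enum page_bit which)
{
	assert(pfn >= zb->start_pfn && pfn - zb->start_pfn < zb->nr_pages);
	return (pfn - zb->start_pfn) * 2 + which;
}

static void zone_bit_set(struct zone_bits *zb, unsigned long pfn,
			 enum page_bit which)
{
	unsigned long bit = bit_index(zb, pfn, which);

	zb->bits[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static void zone_bit_clear(struct zone_bits *zb, unsigned long pfn,
			   enum page_bit which)
{
	unsigned long bit = bit_index(zb, pfn, which);

	zb->bits[bit / BITS_PER_LONG] &= ~(1UL << (bit % BITS_PER_LONG));
}

static int zone_bit_test(const struct zone_bits *zb, unsigned long pfn,
			 enum page_bit which)
{
	unsigned long bit = bit_index(zb, pfn, which);

	return (zb->bits[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}
```

Interleaving the two flags keeps both bits for a page in the same word, at the cost of tying their lifetimes together in a single allocation.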
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 10:10 ` Christoph Lameter
@ 2007-02-16 10:17 ` Peter Zijlstra
  2007-02-16 11:04 ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16 10:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Martin Bligh, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel, Rafael J. Wysocki

On Fri, 2007-02-16 at 02:10 -0800, Christoph Lameter wrote:
> On Fri, 16 Feb 2007, Peter Zijlstra wrote:
>
> > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> >
> > > The two swsusp bits can be removed: they're only needed at suspend/resume
> > > time and can be replaced by an external data structure.
> >
> > I once had a talk with Rafael, and he said it would be possible to rid
> > us of PG_nosave* with the now not so new bitmap code that is used to
> > handle swsusp of highmem pages.
>
> Well we can just shift the stuff into the power subsystem I think. Like
> this? Compiles but not tested.

That would work, however as Andrew pointed out, this data is only ever
used at suspend/resume time. I think we can postpone allocating this
bitmap until then and free it afterwards.

However I'm quite out of my depth here, so I'll leave more constructive
comments to Rafael.
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 10:17 ` Peter Zijlstra
@ 2007-02-16 11:04 ` Rafael J. Wysocki
  0 siblings, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2007-02-16 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Andrew Morton, Martin Bligh, linux-mm,
	Nick Piggin, KAMEZAWA Hiroyuki, Rik van Riel

On Friday, 16 February 2007 11:17, Peter Zijlstra wrote:
> On Fri, 2007-02-16 at 02:10 -0800, Christoph Lameter wrote:
> > On Fri, 16 Feb 2007, Peter Zijlstra wrote:
> >
> > > On Thu, 2007-02-15 at 18:48 -0800, Andrew Morton wrote:
> > >
> > > > The two swsusp bits can be removed: they're only needed at suspend/resume
> > > > time and can be replaced by an external data structure.
> > >
> > > I once had a talk with Rafael, and he said it would be possible to rid
> > > us of PG_nosave* with the now not so new bitmap code that is used to
> > > handle swsusp of highmem pages.
> >
> > Well we can just shift the stuff into the power subsystem I think. Like
> > this? Compiles but not tested.
>
> That would work, however as Andrew pointed out, this data is only ever
> used at suspend/resume time. I think we can postpone allocating this
> bitmap until then and free it afterwards.
>
> However I'm quite out of my depth here, so I'll leave more constructive
> comments to Rafael.

The PageNosave bits may also be used during initialization. On x86_64 the
arch code uses them to mark the pages that shouldn't be saved by swsusp.
However, the PageNosaveFree bits can be allocated during the suspend, as
they aren't needed before.

Thus what I'd like to do is use Christoph's approach to allocate the
PageNosave bits on the architectures that need them (i386 doesn't, for
example) and handle the rest using memory bitmaps in snapshot.c.

Greetings,
Rafael
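The split described above (a nosave bitmap that exists from initialization, and a nosave-free bitmap that exists only for the duration of a suspend) can be sketched as follows. This is a userspace illustration, not the snapshot.c memory-bitmap code: `struct page_bitmap`, `struct suspend_state`, `suspend_state_init()`, `suspend_begin()` and `suspend_end()` are all hypothetical names, and `calloc()`/`free()` stand in for the kernel allocators.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per page frame; bits is NULL while the bitmap is not allocated. */
struct page_bitmap {
	unsigned long nr_pages;
	unsigned long *bits;
};

static int bitmap_create(struct page_bitmap *bm, unsigned long nr_pages)
{
	size_t nlongs = (nr_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;

	bm->bits = calloc(nlongs, sizeof(unsigned long));
	if (!bm->bits)
		return -1;
	bm->nr_pages = nr_pages;
	return 0;
}

static void bitmap_destroy(struct page_bitmap *bm)
{
	free(bm->bits);
	bm->bits = NULL;
	bm->nr_pages = 0;
}

struct suspend_state {
	unsigned long nr_pages;
	struct page_bitmap nosave;	/* allocated at init: may be marked early */
	struct page_bitmap nosave_free;	/* allocated only while suspending */
};

/* Called once at initialization; only the long-lived bitmap is allocated. */
static int suspend_state_init(struct suspend_state *s, unsigned long nr_pages)
{
	s->nr_pages = nr_pages;
	s->nosave_free.bits = NULL;
	s->nosave_free.nr_pages = 0;
	return bitmap_create(&s->nosave, nr_pages);
}

/* Called when a suspend starts; the transient bitmap is created here. */
static int suspend_begin(struct suspend_state *s)
{
	assert(s->nosave_free.bits == NULL);
	return bitmap_create(&s->nosave_free, s->nr_pages);
}

/* Called when the suspend/resume cycle is over; the memory is given back. */
static void suspend_end(struct suspend_state *s)
{
	bitmap_destroy(&s->nosave_free);
}
```

With this arrangement the transient bitmap costs no memory outside a suspend/resume cycle, while marks made during early setup survive in the long-lived one.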
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 1:40 ` Martin Bligh
  2007-02-16 1:49 ` Andrew Morton
@ 2007-02-16 2:16 ` Christoph Lameter
  2007-02-16 3:17 ` Martin Bligh
  2007-02-16 8:10 ` Peter Zijlstra
  2 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 2:16 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Martin Bligh wrote:

> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

How would that work without a page flag? Without a flag there is no way
of checking that a page is on a particular list.
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 2:16 ` Christoph Lameter
@ 2007-02-16 3:17 ` Martin Bligh
  2007-02-16 3:29 ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Martin Bligh @ 2007-02-16 3:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

Christoph Lameter wrote:
> On Thu, 15 Feb 2007, Martin Bligh wrote:
>
>> Mine just created a locked list. If you stick them there, there's no
>> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)
>
> How would that work without a page flag? Without a flag there is no way
> of checking that a page is on a particular list.

Depends what contexts you need to access it from. If you know the state
before and after, list_del and list_add work.
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 3:17 ` Martin Bligh
@ 2007-02-16 3:29 ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 3:29 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, linux-mm, Nick Piggin, Peter Zijlstra,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 15 Feb 2007, Martin Bligh wrote:

> > How would that work without a page flag? Without a flag there is no way of
> > checking that a page is on a particular list.
>
> Depends what contexts you need to access it from. If you know the state
> before and after, list_del and list_add work.

What state before and after do you know?
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 1:40 ` Martin Bligh
  2007-02-16 1:49 ` Andrew Morton
  2007-02-16 2:16 ` Christoph Lameter
@ 2007-02-16 8:10 ` Peter Zijlstra
  2 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2007-02-16 8:10 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, Christoph Lameter, linux-mm, Nick Piggin,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 2007-02-15 at 17:40 -0800, Martin Bligh wrote:

> Mine just created a locked list. If you stick them there, there's no
> need for a page flag ... and we don't abuse the lru pointers AGAIN! ;-)

> --- linux-2.6.17/include/linux/mm_inline.h	2006-06-17 18:49:35.000000000 -0700
> +++ linux-2.6.17-mlock_lru/include/linux/mm_inline.h	2006-07-28 15:53:15.000000000 -0700
>
> @@ -28,6 +27,20 @@ del_page_from_inactive_list(struct zone
>  }
>
>  static inline void
> +add_page_to_mlocked_list(struct zone *zone, struct page *page)
> +{
> +	list_add(&page->lru, &zone->mlocked_list);
> +	zone->nr_mlocked++;
> +}
> +
> +static inline void
> +del_page_from_mlocked_list(struct zone *zone, struct page *page)
> +{
> +	list_del(&page->lru);
> +	zone->nr_mlocked--;
> +}
> +
> +static inline void
>  del_page_from_lru(struct zone *zone, struct page *page)
>  {
>  	list_del(&page->lru);
> diff -aurpN -X /home/mbligh/.diff.exclude linux-2.6.17/include/linux/mmzone.h linux-2.6.17-mlock_lru/include/linux/mmzone.h
> --- linux-2.6.17/include/linux/mmzone.h	2006-06-17 18:49:35.000000000 -0700
> +++ linux-2.6.17-mlock_lru/include/linux/mmzone.h	2006-07-28 15:49:05.000000000 -0700
> @@ -156,10 +156,12 @@ struct zone {
>  	spinlock_t		lru_lock;
>  	struct list_head	active_list;
>  	struct list_head	inactive_list;
> +	struct list_head	mlocked_list;
>  	unsigned long		nr_scan_active;
>  	unsigned long		nr_scan_inactive;
>  	unsigned long		nr_active;
>  	unsigned long		nr_inactive;
> +	unsigned long		nr_mlocked;
>  	unsigned long		pages_scanned;	   /* since last reclaim */
>  	int			all_unreclaimable; /* All pages pinned */
>

The problem with such an approach would be that it takes O(n) time to
find that a given page is part of the mlocked_list; so you'd still need
some marker to optimise that.
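The O(n) versus O(1) point can be sketched in userspace C. The list helpers below are a minimal stand-in for the kernel's `list_head` API, and `struct page`, `struct zone`, `mlocked_list_add()`, `mlocked_list_del()` and `mlocked_list_contains_scan()` are hypothetical names chosen for illustration; no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Stand-ins for the kernel structures, reduced to what the example needs. */
struct page {
	struct list_head lru;
	bool mlocked;		/* the "marker": answers membership in O(1) */
};

struct zone {
	struct list_head mlocked_list;
	unsigned long nr_mlocked;
};

static void page_init(struct page *page)
{
	list_init(&page->lru);
	page->mlocked = false;
}

static void mlocked_list_add(struct zone *zone, struct page *page)
{
	list_add(&page->lru, &zone->mlocked_list);
	page->mlocked = true;
	zone->nr_mlocked++;
}

static void mlocked_list_del(struct zone *zone, struct page *page)
{
	list_del(&page->lru);
	page->mlocked = false;
	zone->nr_mlocked--;
}

/* Without a marker, membership can only be answered by walking the list. */
static bool mlocked_list_contains_scan(const struct zone *zone,
				       const struct page *page)
{
	const struct list_head *pos;

	for (pos = zone->mlocked_list.next; pos != &zone->mlocked_list;
	     pos = pos->next)
		if (pos == &page->lru)
			return true;
	return false;
}
```

`list_add()` and `list_del()` themselves are O(1), which is why knowing the page's state beforehand is enough to move it; it is only the question "is this page on that list?" that needs either the scan or the flag.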
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 1:13 ` Andrew Morton
  2007-02-16 1:24 ` KAMEZAWA Hiroyuki
  2007-02-16 1:40 ` Martin Bligh
@ 2007-02-16 2:15 ` Christoph Lameter
  2007-02-16 2:55 ` Christoph Lameter
  3 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 2:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> Is it true that PageMlocked() pages are never on the LRU? If so, perhaps
> we could overload the lru.next/prev on these pages to flag an mlocked page.

Yes. We could even use the lru to build a list of mlocked pages but then
certain optimizations with anonymous pages would no longer work.
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 1:13 ` Andrew Morton
  ` (2 preceding siblings ...)
  2007-02-16 2:15 ` Christoph Lameter
@ 2007-02-16 2:55 ` Christoph Lameter
  2007-02-16 5:02 ` Christoph Lameter
  3 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 2:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Andrew Morton wrote:

> It's nice and simple, but I think I'd prefer to wait for the existing mlock
> changes to crash a bit less before we do this.

Sigh. My optimizations must have done me in. Drop the last two patches and
it will be fine. I am not sure what is going on there but things work
right without the optimizations.

avoid-putting-new-mlocked-anonymous-pages-on-lru.patch
opportunistically-move-mlocked-pages-off-the-lru.patch
* Re: [RFC] Remove unswappable anonymous pages off the LRU
  2007-02-16 2:55 ` Christoph Lameter
@ 2007-02-16 5:02 ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-02-16 5:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nick Piggin, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Martin J. Bligh, Rik van Riel

On Thu, 15 Feb 2007, Christoph Lameter wrote:

> On Thu, 15 Feb 2007, Andrew Morton wrote:
>
> > It's nice and simple, but I think I'd prefer to wait for the existing mlock
> > changes to crash a bit less before we do this.
>
> Sigh. My optimizations must have done me in. Drop the last two patches and
> it will be fine. I am not sure what is going on there but things work
> right without the optimizations.
>
> avoid-putting-new-mlocked-anonymous-pages-on-lru.patch
> opportunistically-move-mlocked-pages-off-the-lru.patch
>

Would you put those two patches back? The problem is that in some
circumstances a page may be freed that is mlocked (if one is marking a
page as mlocked early). The page allocator will not touch the PG_mlocked
bit and thus a newly allocated page may have PG_mlocked set. If we then
try to put it on the lru then the VM_BUG_ONs are triggered.

The following patch detects these conditions in the page allocator and
does the proper checks and cleanup.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20/include/linux/page-flags.h
===================================================================
--- linux-2.6.20.orig/include/linux/page-flags.h	2007-02-15 20:42:42.000000000 -0800
+++ linux-2.6.20/include/linux/page-flags.h	2007-02-15 20:43:33.000000000 -0800
@@ -261,6 +261,7 @@ static inline void SetPageUptodate(struc
 #define PageMlocked(page)	test_bit(PG_mlocked, &(page)->flags)
 #define SetPageMlocked(page)	set_bit(PG_mlocked, &(page)->flags)
 #define ClearPageMlocked(page)	clear_bit(PG_mlocked, &(page)->flags)
+#define __ClearPageMlocked(page) __clear_bit(PG_mlocked, &(page)->flags)
 
 struct page;	/* forward declaration */
 
Index: linux-2.6.20/mm/page_alloc.c
===================================================================
--- linux-2.6.20.orig/mm/page_alloc.c	2007-02-15 20:42:42.000000000 -0800
+++ linux-2.6.20/mm/page_alloc.c	2007-02-15 20:55:23.000000000 -0800
@@ -203,6 +203,7 @@ static void bad_page(struct page *page)
 			1 << PG_slab    |
 			1 << PG_swapcache |
 			1 << PG_writeback |
+			1 << PG_mlocked |
 			1 << PG_buddy );
 	set_page_count(page, 0);
 	reset_page_mapcount(page);
@@ -442,6 +443,11 @@ static inline int free_pages_check(struc
 		bad_page(page);
 	if (PageDirty(page))
 		__ClearPageDirty(page);
+	if (PageMlocked(page)) {
+		/* Page is unused so no need to take the lru lock */
+		__ClearPageMlocked(page);
+		dec_zone_page_state(page, NR_MLOCK);
+	}
 	/*
 	 * For now, we report if PG_reserved was found set, but do not
 	 * clear it, and do not free the page. But we shall soon need
@@ -588,6 +594,7 @@ static int prep_new_page(struct page *pa
 			1 << PG_swapcache |
 			1 << PG_writeback |
 			1 << PG_reserved |
+			1 << PG_mlocked |
 			1 << PG_buddy ))))
 		bad_page(page);
 
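The invariant that the patch above restores (a page handed out by the allocator never carries a stale mlocked mark, and the counter drops when such a page is freed) can be sketched in userspace C. This is an assumption-laden model, not kernel code: `PG_MLOCKED`, `nr_mlock`, `mark_mlocked()`, `free_page_check()` and `new_page_is_clean()` are hypothetical names, and a plain global counter stands in for the per-zone NR_MLOCK statistic.

```c
#include <assert.h>

#define PG_MLOCKED	(1UL << 0)	/* hypothetical flag bit */

struct page {
	unsigned long flags;
};

/* Stand-in for the NR_MLOCK zone counter. */
static long nr_mlock;

/* Mark a page as mlocked and account for it exactly once. */
static void mark_mlocked(struct page *page)
{
	if (!(page->flags & PG_MLOCKED)) {
		page->flags |= PG_MLOCKED;
		nr_mlock++;
	}
}

/*
 * Free-path check: if the page is still marked mlocked, clear the mark
 * and fix up the counter so the flag cannot leak to the next user.
 */
static void free_page_check(struct page *page)
{
	if (page->flags & PG_MLOCKED) {
		page->flags &= ~PG_MLOCKED;
		nr_mlock--;
	}
}

/* Allocation-path check: a newly handed-out page must not be mlocked. */
static int new_page_is_clean(const struct page *page)
{
	return !(page->flags & PG_MLOCKED);
}
```

Doing the cleanup on the free path keeps the flag and the counter in step: every increment in `mark_mlocked()` is matched by at most one decrement, even if the page is freed without ever being explicitly unlocked.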