* quicklists confuse meminfo @ 2008-03-09 10:19 Thomas Gleixner 2008-03-09 10:26 ` Bart Van Assche ` (3 more replies) 0 siblings, 4 replies; 44+ messages in thread From: Thomas Gleixner @ 2008-03-09 10:19 UTC (permalink / raw) To: LKML Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Lameter, Bart Van Assche Bart reported http://bugzilla.kernel.org/show_bug.cgi?id=9991. He assumed a memory leak in 32bit kernels when he analyzed the output of /proc/meminfo. The leak is not a leak, it's an accounting bug. quicklists keep a large amount of pages which are accounted as used memory. It can be easily observed by watching /proc/meminfo on an idle system and then run # while true; do ls >/dev/null; done for a while. The amount of free memory decreases steadily to the point, where the quicklist limit kicks in. To verify this I added quicklists (patch below) to the /proc/meminfo output and the decrease of free memory is matching the increase of the memory which is kept in quicklists. I'm not sure how we should handle the quicklist accounting, but the current state of silently hiding the free memory in the quicklists is definitely not acceptable. Another strange observation about quicklists is the imbalance of the quicklists across CPUs. Running the above loop on a 2way machine I can observe that the quicklist pages are acuumulating on one CPU. Stopping and restarting the loop a couple of times can shift the accumulation from one to the other CPU. Thanks, tglx --- fs/proc/proc_misc.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) Index: linux-2.6/fs/proc/proc_misc.c =================================================================== --- linux-2.6.orig/fs/proc/proc_misc.c +++ linux-2.6/fs/proc/proc_misc.c @@ -49,6 +49,8 @@ #include <linux/crash_dump.h> #include <linux/pid_namespace.h> #include <linux/bootmem.h> +#include <linux/quicklist.h> + #include <asm/uaccess.h> #include <asm/pgtable.h> #include <asm/io.h> @@ -183,8 +185,11 @@ static int meminfo_read_proc(char *page, "Committed_AS: %8lu kB\n" "VmallocTotal: %8lu kB\n" "VmallocUsed: %8lu kB\n" - "VmallocChunk: %8lu kB\n", - K(i.totalram), + "VmallocChunk: %8lu kB\n" +#ifdef CONFIG_QUICKLIST + "QuickLists: %8lu kB\n" +#endif + ,K(i.totalram), K(i.freeram), K(i.bufferram), K(cached), @@ -215,6 +220,9 @@ static int meminfo_read_proc(char *page, (unsigned long)VMALLOC_TOTAL >> 10, vmi.used >> 10, vmi.largest_chunk >> 10 +#ifdef CONFIG_QUICKLIST + ,K(quicklist_total_size()) +#endif ); len += hugetlb_report_meminfo(page + len); ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 10:19 quicklists confuse meminfo Thomas Gleixner @ 2008-03-09 10:26 ` Bart Van Assche 2008-03-09 10:29 ` Andi Kleen ` (2 subsequent siblings) 3 siblings, 0 replies; 44+ messages in thread From: Bart Van Assche @ 2008-03-09 10:26 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Lameter On Sun, Mar 9, 2008 at 11:19 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > Bart reported http://bugzilla.kernel.org/show_bug.cgi?id=9991. ... > + "QuickLists: %8lu kB\n" Regarding the patch: maybe it's better to spell this entry "Quicklists" than "QuickLists", since quicklist is a single word and not two words ? Regarding the memory occupied by quicklists: why does the amount of memory occupied by quicklists increase over time ? Is there a way to free the memory occupied by quicklists, similar to the way the memory occupied by caches can be freed (echo 3 >/proc/sys/vm/drop_caches) ? Bart. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 10:19 quicklists confuse meminfo Thomas Gleixner 2008-03-09 10:26 ` Bart Van Assche @ 2008-03-09 10:29 ` Andi Kleen 2008-03-09 10:42 ` KOSAKI Motohiro 2008-03-09 11:14 ` Ingo Molnar 3 siblings, 0 replies; 44+ messages in thread From: Andi Kleen @ 2008-03-09 10:29 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Lameter, Bart Van Assche Thomas Gleixner <tglx@linutronix.de> writes: > Bart reported http://bugzilla.kernel.org/show_bug.cgi?id=9991. He > assumed a memory leak in 32bit kernels when he analyzed the output of > /proc/meminfo. > > The leak is not a leak, it's an accounting bug. quicklists keep a > large amount of pages which are accounted as used memory. There are various other subsystems which can cache substantial memory under the right circumstances. Do you want to add all of them to /proc/meminfo? I'm not sure that will scale long term. One more general possibility would be to integrate this with with the shrinker callbacks. Everyone who caches memory should have a shrinker. Perhaps that could be integrated with some reporting facility that adds a dynamic counter field that is displayed somewhere in /proc or /sys. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 10:19 quicklists confuse meminfo Thomas Gleixner 2008-03-09 10:26 ` Bart Van Assche 2008-03-09 10:29 ` Andi Kleen @ 2008-03-09 10:42 ` KOSAKI Motohiro 2008-03-09 12:00 ` Thomas Gleixner 2008-03-09 11:14 ` Ingo Molnar 3 siblings, 1 reply; 44+ messages in thread From: KOSAKI Motohiro @ 2008-03-09 10:42 UTC (permalink / raw) To: Thomas Gleixner Cc: kosaki.motohiro, LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Lameter, Bart Van Assche Hi in general, I like this patch and I think this is very useful. but I don't like #ifdef ;) > +#ifdef CONFIG_QUICKLIST > + "QuickLists: %8lu kB\n" > +#endif > + ,K(i.totalram), > K(i.freeram), > K(i.bufferram), > K(cached), > @@ -215,6 +220,9 @@ static int meminfo_read_proc(char *page, > (unsigned long)VMALLOC_TOTAL >> 10, > vmi.used >> 10, > vmi.largest_chunk >> 10 > +#ifdef CONFIG_QUICKLIST > + ,K(quicklist_total_size()) > +#endif Do you dislike my following patch? - kosaki Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- fs/proc/proc_misc.c | 6 ++++-- include/linux/quicklist.h | 7 +++++++ 2 files changed, 11 insertions(+), 2 deletions(-) Index: b/fs/proc/proc_misc.c =================================================================== --- a/fs/proc/proc_misc.c 2008-03-09 16:04:39.000000000 +0900 +++ b/fs/proc/proc_misc.c 2008-03-09 19:47:47.000000000 +0900 @@ -182,7 +182,8 @@ static int meminfo_read_proc(char *page, "Committed_AS: %8lu kB\n" "VmallocTotal: %8lu kB\n" "VmallocUsed: %8lu kB\n" - "VmallocChunk: %8lu kB\n", + "VmallocChunk: %8lu kB\n" + "Quicklists: %8lu kB\n", K(i.totalram), K(i.freeram), K(i.bufferram), @@ -213,7 +214,8 @@ static int meminfo_read_proc(char *page, K(committed), (unsigned long)VMALLOC_TOTAL >> 10, vmi.used >> 10, - vmi.largest_chunk >> 10 + vmi.largest_chunk >> 10, + K(quicklist_total_size()) ); len += hugetlb_report_meminfo(page + len); Index: b/include/linux/quicklist.h =================================================================== --- a/include/linux/quicklist.h 2008-03-09 16:02:28.000000000 +0900 +++ b/include/linux/quicklist.h 2008-03-09 19:45:58.000000000 +0900 @@ -80,6 +80,13 @@ void quicklist_trim(int nr, void (*dtor) unsigned long quicklist_total_size(void); +#else + +static inline unsigned long quicklist_total_size(void) +{ + return 0; +} + #endif #endif /* LINUX_QUICKLIST_H */ ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 10:42 ` KOSAKI Motohiro @ 2008-03-09 12:00 ` Thomas Gleixner 0 siblings, 0 replies; 44+ messages in thread From: Thomas Gleixner @ 2008-03-09 12:00 UTC (permalink / raw) To: KOSAKI Motohiro Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Lameter, Bart Van Assche On Sun, 9 Mar 2008, KOSAKI Motohiro wrote: > Hi > > in general, I like this patch and I think this is very useful. > but I don't like #ifdef ;) /me neither :) > > +#ifdef CONFIG_QUICKLIST > > + "QuickLists: %8lu kB\n" > > +#endif > > + ,K(i.totalram), > > K(i.freeram), > > K(i.bufferram), > > K(cached), > > @@ -215,6 +220,9 @@ static int meminfo_read_proc(char *page, > > (unsigned long)VMALLOC_TOTAL >> 10, > > vmi.used >> 10, > > vmi.largest_chunk >> 10 > > +#ifdef CONFIG_QUICKLIST > > + ,K(quicklist_total_size()) > > +#endif > > Do you dislike my following patch? I like it. I'm just not sure about the accounting of cached memory in general, though quicklists seem to be a pretty obvious one. Thanks, tglx ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 10:19 quicklists confuse meminfo Thomas Gleixner ` (2 preceding siblings ...) 2008-03-09 10:42 ` KOSAKI Motohiro @ 2008-03-09 11:14 ` Ingo Molnar 2008-03-09 11:56 ` Thomas Gleixner ` (2 more replies) 3 siblings, 3 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 11:14 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Thomas Gleixner <tglx@linutronix.de> wrote: > Bart reported http://bugzilla.kernel.org/show_bug.cgi?id=9991. He > assumed a memory leak in 32bit kernels when he analyzed the output of > /proc/meminfo. > > The leak is not a leak, it's an accounting bug. quicklists keep a > large amount of pages which are accounted as used memory. [...] > Another strange observation about quicklists is the imbalance of the > quicklists across CPUs. Running the above loop on a 2way machine I can > observe that the quicklist pages are acuumulating on one CPU. Stopping > and restarting the loop a couple of times can shift the accumulation > from one to the other CPU. hm. I think we should not let this much RAM hang around in a special-purpose allocator like quicklists. Shouldnt the quicklists be temporary in nature, and be trimmed much more agressively? in fact, we have a check_pgt_cache() call in cpu_idle(), which does: quicklist_trim(0, pgd_dtor, 25, 16); but it appears we dont do quicklist trimming anywhere else! So if a system has no idle time, the quicklist can grow unbounded, and that's a real memory leak IMO. Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 11:14 ` Ingo Molnar @ 2008-03-09 11:56 ` Thomas Gleixner 2008-03-09 12:01 ` Ingo Molnar 2008-03-09 12:03 ` Johannes Weiner 2008-03-09 12:03 ` KOSAKI Motohiro 2 siblings, 1 reply; 44+ messages in thread From: Thomas Gleixner @ 2008-03-09 11:56 UTC (permalink / raw) To: Ingo Molnar Cc: LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche On Sun, 9 Mar 2008, Ingo Molnar wrote: > > * Thomas Gleixner <tglx@linutronix.de> wrote: > > > Bart reported http://bugzilla.kernel.org/show_bug.cgi?id=9991. He > > assumed a memory leak in 32bit kernels when he analyzed the output of > > /proc/meminfo. > > > > The leak is not a leak, it's an accounting bug. quicklists keep a > > large amount of pages which are accounted as used memory. > [...] > > Another strange observation about quicklists is the imbalance of the > > quicklists across CPUs. Running the above loop on a 2way machine I can > > observe that the quicklist pages are acuumulating on one CPU. Stopping > > and restarting the loop a couple of times can shift the accumulation > > from one to the other CPU. > > hm. I think we should not let this much RAM hang around in a > special-purpose allocator like quicklists. Shouldnt the quicklists be > temporary in nature, and be trimmed much more agressively? > > in fact, we have a check_pgt_cache() call in cpu_idle(), which does: > > quicklist_trim(0, pgd_dtor, 25, 16); > > but it appears we dont do quicklist trimming anywhere else! So if a > system has no idle time, the quicklist can grow unbounded, and that's a > real memory leak IMO. Right, also the quicklist_trim() in idle() is freeing at max 16 pages in one go. According to the quicklist_trim() code we keep up to (node_free_pages / 16) in the quicklist unconditionally, which seems rather odd as well. Thanks, tglx ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 11:56 ` Thomas Gleixner @ 2008-03-09 12:01 ` Ingo Molnar 2008-03-09 12:49 ` Andi Kleen 0 siblings, 1 reply; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 12:01 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Thomas Gleixner <tglx@linutronix.de> wrote: > Right, also the quicklist_trim() in idle() is freeing at max 16 pages > in one go. According to the quicklist_trim() code we keep up to > (node_free_pages / 16) in the quicklist unconditionally, which seems > rather odd as well. i suspect the right approach would be to get rid of them altogether and to convert the code to RCU and plain old alloc/free instead. Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:01 ` Ingo Molnar @ 2008-03-09 12:49 ` Andi Kleen 2008-03-10 15:51 ` Christoph Lameter 0 siblings, 1 reply; 44+ messages in thread From: Andi Kleen @ 2008-03-09 12:49 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche Ingo Molnar <mingo@elte.hu> writes: > * Thomas Gleixner <tglx@linutronix.de> wrote: > > > Right, also the quicklist_trim() in idle() is freeing at max 16 pages > > in one go. According to the quicklist_trim() code we keep up to > > (node_free_pages / 16) in the quicklist unconditionally, which seems > > rather odd as well. > > i suspect the right approach would be to get rid of them altogether and > to convert the code to RCU and plain old alloc/free instead. Are you sure RCU is even needed? AFAIK all delayed freeing on page table level is already handled by the standard free-after-flush mmu code. iirc quicklists were just to avoid rezeroing pages which are known to be zero at free time (in theory __GFP_ZERO should handle that at page_alloc.c level, but it doesn't) and to get a little private fast path for the page allocator (might actually predate the current page_alloc fast paths) -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:49 ` Andi Kleen @ 2008-03-10 15:51 ` Christoph Lameter 0 siblings, 0 replies; 44+ messages in thread From: Christoph Lameter @ 2008-03-10 15:51 UTC (permalink / raw) To: Andi Kleen Cc: Ingo Molnar, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Bart Van Assche, William Lee Irwin III, Benjamin Herrenschmidt, David Miller On Sun, 9 Mar 2008, Andi Kleen wrote: > iirc quicklists were just to avoid rezeroing pages which are known to > be zero at free time (in theory __GFP_ZERO should handle that at > page_alloc.c level, but it doesn't) and to get a little private fast path > for the page allocator (might actually predate the current page_alloc > fast paths) Quicklists were introduced to increase performance of program termination and startup. The avoidance of zeroing is one effect. The other reason that made this a good optimization is the bad page allocator performance in general vs a simple LIFO list. The numbers that we saw on x86_64 were around a 90% reduction in overhead with the quicklists. The reason for the throttle is that termination and starting of large programs would have to zero large amounts of memory and go through the page allocator for all of this if the quicklists would be bounded to a fixed limit. Having a fraction of free memory allows preserving large amounts of page table pages for the next process that starts. Right now quicklists do not make much sense because the x86_64 portion was removed. IMHO The usefulness for i386 pgd/pud caching is negligible. The code came initially from IA64 arch code (I think it was first on sparc64 though). After it became available in the core it was used by various other arches. There have been a couple of people who wanted to continue work on quicklists (which made me focus on different things) but so far nothing has happened. Replacement of the i386 portion with alloc/free would be fairly straightforward I would think. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 11:14 ` Ingo Molnar 2008-03-09 11:56 ` Thomas Gleixner @ 2008-03-09 12:03 ` Johannes Weiner 2008-03-09 12:03 ` KOSAKI Motohiro 2 siblings, 0 replies; 44+ messages in thread From: Johannes Weiner @ 2008-03-09 12:03 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche Hi, Ingo Molnar <mingo@elte.hu> writes: > hm. I think we should not let this much RAM hang around in a > special-purpose allocator like quicklists. Shouldnt the quicklists be > temporary in nature, and be trimmed much more agressively? > > in fact, we have a check_pgt_cache() call in cpu_idle(), which does: > > quicklist_trim(0, pgd_dtor, 25, 16); > > but it appears we dont do quicklist trimming anywhere else! So if a > system has no idle time, the quicklist can grow unbounded, and that's a > real memory leak IMO. It is also called from tlbu_finish_mmu(). Hannes ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 11:14 ` Ingo Molnar 2008-03-09 11:56 ` Thomas Gleixner 2008-03-09 12:03 ` Johannes Weiner @ 2008-03-09 12:03 ` KOSAKI Motohiro 2008-03-09 12:09 ` Ingo Molnar 2 siblings, 1 reply; 44+ messages in thread From: KOSAKI Motohiro @ 2008-03-09 12:03 UTC (permalink / raw) To: Ingo Molnar Cc: kosaki.motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche Hi > hm. I think we should not let this much RAM hang around in a > special-purpose allocator like quicklists. Shouldnt the quicklists be > temporary in nature, and be trimmed much more agressively? > > in fact, we have a check_pgt_cache() call in cpu_idle(), which does: > > quicklist_trim(0, pgd_dtor, 25, 16); > > but it appears we dont do quicklist trimming anywhere else! So if a > system has no idle time, the quicklist can grow unbounded, and that's a > real memory leak IMO. I test following method. 1. $ hackbench 100 process 1000 2. $ cat /proc/meminfo quicklist consume 1GB memory of 8GB total memory system. it seems too large cache ;) IMHO we need shrink pgtable cache mecanism. --------------------------------------- MemTotal: 7683328 kB MemFree: 4940672 kB Buffers: 2816 kB Cached: 23232 kB SwapCached: 72192 kB Active: 56832 kB Inactive: 77440 kB SwapTotal: 2031488 kB SwapFree: 1907776 kB Dirty: 704 kB Writeback: 0 kB AnonPages: 48384 kB Mapped: 20864 kB Slab: 1325824 kB SReclaimable: 11968 kB SUnreclaim: 1313856 kB PageTables: 13632 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 5873152 kB Committed_AS: 531008 kB VmallocTotal: 17592177655808 kB VmallocUsed: 28864 kB VmallocChunk: 17592177623104 kB Quicklists: 1194304 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 262144 kB ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:03 ` KOSAKI Motohiro @ 2008-03-09 12:09 ` Ingo Molnar 2008-03-09 12:34 ` Ingo Molnar 2008-03-09 12:47 ` KOSAKI Motohiro 0 siblings, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 12:09 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > > but it appears we dont do quicklist trimming anywhere else! So if a > > system has no idle time, the quicklist can grow unbounded, and > > that's a real memory leak IMO. > > I test following method. > > 1. $ hackbench 100 process 1000 > 2. $ cat /proc/meminfo > > quicklist consume 1GB memory of 8GB total memory system. > it seems too large cache ;) > > IMHO we need shrink pgtable cache mecanism. ouch! Could you try the patch below? How large is the quicklist cache with this applied? Ingo ----------------------> Subject: x86: trim quicklist more agressively From: Ingo Molnar <mingo@elte.hu> Date: Sun Mar 09 12:59:41 CET 2008 trim the quicklists more agressively, up to 1024 pages at once. (which pretty much means we keep this special-purpose cache as small as possible) Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/mm/pgtable_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-x86.q/arch/x86/mm/pgtable_32.c =================================================================== --- linux-x86.q.orig/arch/x86/mm/pgtable_32.c +++ linux-x86.q/arch/x86/mm/pgtable_32.c @@ -358,7 +358,7 @@ void pgd_free(struct mm_struct *mm, pgd_ void check_pgt_cache(void) { - quicklist_trim(0, pgd_dtor, 25, 16); + quicklist_trim(0, pgd_dtor, 25, 1024); } void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:09 ` Ingo Molnar @ 2008-03-09 12:34 ` Ingo Molnar 2008-03-09 12:51 ` KOSAKI Motohiro ` (3 more replies) 2008-03-09 12:47 ` KOSAKI Motohiro 1 sibling, 4 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 12:34 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Ingo Molnar <mingo@elte.hu> wrote: > > IMHO we need shrink pgtable cache mecanism. > > ouch! Could you try the patch below? How large is the quicklist cache > with this applied? hm, Thomas pointed it out that this wont solve all the problems as quicklists have a built-in "preserve me" throttle (which is rather stupid). the right solution is to get rid of quicklists altogether (Thomas expects to have patches for that later today). If anyone implements a _sane_ general purpose allocator that can preserve constructed objects then i'm all for utilizing it in x86 too, but quicklists arent that mechanism ... Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:34 ` Ingo Molnar @ 2008-03-09 12:51 ` KOSAKI Motohiro 2008-03-09 13:20 ` Thomas Gleixner ` (2 subsequent siblings) 3 siblings, 0 replies; 44+ messages in thread From: KOSAKI Motohiro @ 2008-03-09 12:51 UTC (permalink / raw) To: Ingo Molnar Cc: kosaki.motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche > hm, Thomas pointed it out that this wont solve all the problems as > quicklists have a built-in "preserve me" throttle (which is rather > stupid). > > the right solution is to get rid of quicklists altogether (Thomas > expects to have patches for that later today). > > If anyone implements a _sane_ general purpose allocator that can > preserve constructed objects then i'm all for utilizing it in x86 too, > but quicklists arent that mechanism ... Apparently, my patch seems to have been too late ;) please forget it. - kosaki ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:34 ` Ingo Molnar 2008-03-09 12:51 ` KOSAKI Motohiro @ 2008-03-09 13:20 ` Thomas Gleixner 2008-03-09 18:46 ` Andrew Morton 2008-03-09 19:11 ` Arjan van de Ven 3 siblings, 0 replies; 44+ messages in thread From: Thomas Gleixner @ 2008-03-09 13:20 UTC (permalink / raw) To: Ingo Molnar Cc: KOSAKI Motohiro, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche On Sun, 9 Mar 2008, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > > IMHO we need shrink pgtable cache mecanism. > > > > ouch! Could you try the patch below? How large is the quicklist cache > > with this applied? > > hm, Thomas pointed it out that this wont solve all the problems as > quicklists have a built-in "preserve me" throttle (which is rather > stupid). There is also the imbalance across CPUs. I think I figured out what's going there as well. The allocation happens on one CPU (via page_alloc), but the tear down happens on the other CPU, which accumulates the pages in the quicklist. So the quicklist of the busy CPU is empty, while the one of the idle CPU goes up to the limit. When I pin the loop to one CPU then the quicklists are stable. Thanks, tglx ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:34 ` Ingo Molnar 2008-03-09 12:51 ` KOSAKI Motohiro 2008-03-09 13:20 ` Thomas Gleixner @ 2008-03-09 18:46 ` Andrew Morton 2008-03-09 20:21 ` Andi Kleen 2008-03-10 15:54 ` Christoph Lameter 2008-03-09 19:11 ` Arjan van de Ven 3 siblings, 2 replies; 44+ messages in thread From: Andrew Morton @ 2008-03-09 18:46 UTC (permalink / raw) To: Ingo Molnar Cc: KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Christoph Lameter, Bart Van Assche On Sun, 9 Mar 2008 13:34:32 +0100 Ingo Molnar <mingo@elte.hu> wrote: > the right solution is to get rid of quicklists altogether Yes, I think so. - They are pretty marginal from a performance POV (iirc) - They've been a relatively rich source of bugs - As I said when we merged them (under protest): Private object caches like this are just a bad idea - caches should be *shared*, because some other code path which wants a zeroed page wants a cache-warm one, not a cache-cold one from the allocator (iirc there was doubt over how cache-warm these pages are, however). Making __GFP_ZERO smarter/more efficient would be a preferable way of addressing any performance problems we have in there. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 18:46 ` Andrew Morton @ 2008-03-09 20:21 ` Andi Kleen 2008-03-10 15:54 ` Christoph Lameter 1 sibling, 0 replies; 44+ messages in thread From: Andi Kleen @ 2008-03-09 20:21 UTC (permalink / raw) To: Andrew Morton Cc: Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Christoph Lameter, Bart Van Assche Andrew Morton <akpm@linux-foundation.org> writes: > On Sun, 9 Mar 2008 13:34:32 +0100 Ingo Molnar <mingo@elte.hu> wrote: > > > the right solution is to get rid of quicklists altogether > > Yes, I think so. > > - They are pretty marginal from a performance POV (iirc) One general issue -- as noted again by Christoph Lameter recently -- is that the order 0 fast path in page_alloc.c isn't actually very fast. That is why people keep inventing their own... > - As I said when we merged them (under protest): Private object caches > like this are just a bad idea - caches should be *shared*, because some > other code path which wants a zeroed page wants a cache-warm one, not a > cache-cold one from the allocator (iirc there was doubt over how > cache-warm these pages are, however). > > Making __GFP_ZERO smarter/more efficient would be a preferable way of > addressing any performance problems we have in there. To do the same as quicklists you would need a __free_pages_zeroed() and separate buddy lists I think. Later is probably somewhat ugly. Or perhaps do it only for order 0? Or perhaps idle time zeroing should be reinvestigated on modern CPUs, but I'm always a little sceptical of that. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 18:46 ` Andrew Morton 2008-03-09 20:21 ` Andi Kleen @ 2008-03-10 15:54 ` Christoph Lameter 2008-03-10 16:43 ` Andi Kleen 2008-03-11 4:07 ` Nick Piggin 1 sibling, 2 replies; 44+ messages in thread From: Christoph Lameter @ 2008-03-10 15:54 UTC (permalink / raw) To: Andrew Morton Cc: Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche On Sun, 9 Mar 2008, Andrew Morton wrote: > - They are pretty marginal from a performance POV (iirc) Pretty significant in our experience. > Making __GFP_ZERO smarter/more efficient would be a preferable way of > addressing any performance problems we have in there. Looking at the page allocator "fastpath": The basic reaons that this was such a good optimization was that the page allocator is expensive to call. The hotpath gets more and more clogged with logic. Fixing the page allocator to be more efficient may be the right approach here. That could also include having a list of zeroed pages. Zeroed pages however will not address the issue of having initialized pgd (which seems to be what i386 needs). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 15:54 ` Christoph Lameter @ 2008-03-10 16:43 ` Andi Kleen 2008-03-10 17:19 ` Hugh Dickins 2008-03-11 4:07 ` Nick Piggin 1 sibling, 1 reply; 44+ messages in thread From: Andi Kleen @ 2008-03-10 16:43 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche Christoph Lameter <clameter@sgi.com> writes: > > Zeroed pages however will not address the issue of having initialized pgd > (which seems to be what i386 needs). pgd is tiny on i386 PAE (4 * 16 bytes). Are you sure reinitializing that is a serious issue? For i386 non PAE (and x86-64) it is just a normal 4k page. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 16:43 ` Andi Kleen @ 2008-03-10 17:19 ` Hugh Dickins 2008-03-10 17:25 ` Andi Kleen 2008-03-10 17:31 ` Jeremy Fitzhardinge 0 siblings, 2 replies; 44+ messages in thread From: Hugh Dickins @ 2008-03-10 17:19 UTC (permalink / raw) To: Andi Kleen Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche, Jeremy Fitzhardinge On Mon, 10 Mar 2008, Andi Kleen wrote: > Christoph Lameter <clameter@sgi.com> writes: > > > > Zeroed pages however will not address the issue of having initialized pgd > > (which seems to be what i386 needs). > > pgd is tiny on i386 PAE (4 * 16 bytes). Are you sure reinitializing that > is a serious issue? ... It used to be tiny (32 aligned bytes), then 2.6.22's quicklist enlarged that to a whole (lowmem) page. I think we were all too busy with other stuff to protest loudly enough about that bloat. If the quicklists are going, it'd be good for PAE to go back to a kmem_cache of 32-byte entries as in 2.6.21 - I think Ingo's patch is still using a whole page there. Or have sl?b alignment changes, or virtualization issues (locking per underlying struct page?), made a kmem_cache awkward there now? Hugh ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 17:19 ` Hugh Dickins @ 2008-03-10 17:25 ` Andi Kleen 2008-03-10 17:31 ` Jeremy Fitzhardinge 1 sibling, 0 replies; 44+ messages in thread From: Andi Kleen @ 2008-03-10 17:25 UTC (permalink / raw) To: Hugh Dickins Cc: Andi Kleen, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche, Jeremy Fitzhardinge On Mon, Mar 10, 2008 at 05:19:32PM +0000, Hugh Dickins wrote: > On Mon, 10 Mar 2008, Andi Kleen wrote: > > Christoph Lameter <clameter@sgi.com> writes: > > > > > > Zeroed pages however will not address the issue of having initialized pgd > > > (which seems to be what i386 needs). > > > > pgd is tiny on i386 PAE (4 * 16 bytes). Are you sure reinitializing that > > is a serious issue? ... > > It used to be tiny (32 aligned bytes), then 2.6.22's quicklist enlarged > that to a whole (lowmem) page. I think we were all too busy with other > stuff to protest loudly enough about that bloat. Interesting; I missed. Hmm, that might have been for Xen or Vmware too (which like some structures be page aligned) But I agree it is bloat in general. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 17:19 ` Hugh Dickins 2008-03-10 17:25 ` Andi Kleen @ 2008-03-10 17:31 ` Jeremy Fitzhardinge 2008-03-10 17:53 ` Andi Kleen 1 sibling, 1 reply; 44+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-10 17:31 UTC (permalink / raw) To: Hugh Dickins Cc: Andi Kleen, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche Hugh Dickins wrote: > On Mon, 10 Mar 2008, Andi Kleen wrote: > >> Christoph Lameter <clameter@sgi.com> writes: >> >>> Zeroed pages however will not address the issue of having initialized pgd >>> (which seems to be what i386 needs). >>> >> pgd is tiny on i386 PAE (4 * 16 bytes). Are you sure reinitializing that >> is a serious issue? ... >> > > It used to be tiny (32 aligned bytes), then 2.6.22's quicklist enlarged > that to a whole (lowmem) page. I think we were all too busy with other > stuff to protest loudly enough about that bloat. > Yes, I was surprised about the pgd page allocation. It's currently necessary for Xen (but I think I can fix that easily enough). But I was especially surprised it was imposed for everyone. > If the quicklists are going, it'd be good for PAE to go back to a > kmem_cache of 32-byte entries as in 2.6.21 - I think Ingo's patch is > still using a whole page there. > +1. We'd still need to maintain a list to link all the pgds together, but I think we can just allocate that out of the cache too (either the same object or a separate pgd list cache). > Or have sl?b alignment changes, or virtualization issues (locking > per underlying struct page?), made a kmem_cache awkward there now? > I think only Xen has a constraint. At the moment we rely on page-sized pgds for two things: 1. Xen marks the whole page as being of pgd-type, and so it can't have non-pgd contents (as the page would be RO, and any contents would be validated as pgd entries). However I can fix that by maintaining a separate per-cpu pgd page, and just copy the four entries over when cr3 is reloaded. This would move the Xen-specific requirements into the Xen code without affecting the rest of the kernel. 2. We still need to maintain a list of pgds, as I discussed above. So from my perspective there are no insoluble problems, and I'd fully support the transition. I've been working on unifying the pgalloc stuff, and the quicklist (i386) vs non-quicklist (x86-64) use has been a bit of a thorn in my side. Of course there'll still be the page-sized pgd vs non-page-sized pgd, but we just have to live with that. J ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 17:31 ` Jeremy Fitzhardinge @ 2008-03-10 17:53 ` Andi Kleen 2008-03-10 18:35 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 44+ messages in thread From: Andi Kleen @ 2008-03-10 17:53 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Hugh Dickins, Andi Kleen, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche > 1. Xen marks the whole page as being of pgd-type, and so it can't > have non-pgd contents (as the page would be RO, and any contents > would be validated as pgd entries). However I can fix that by That sounds more like a Xen bug. Why should it validate anything except the first four entries which are used by the hardware? I can imagine that false sharing would be slow because you would have unnecessary traps, but it shouldn't be a correctness issue. > maintaining a separate per-cpu pgd page, and just copy the four > entries over when cr3 is reloaded. This would move the > Xen-specific requirements into the Xen code without affecting the > rest of the kernel. x86-64 used to do that, but it turned out this breaks some shared cache optimizations on the P4 between SMT threads. So you might actually see user space performance regressions from it. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 17:53 ` Andi Kleen @ 2008-03-10 18:35 ` Jeremy Fitzhardinge 2008-03-10 19:06 ` Andi Kleen 2008-03-10 20:54 ` H. Peter Anvin 0 siblings, 2 replies; 44+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-10 18:35 UTC (permalink / raw) To: Andi Kleen Cc: Hugh Dickins, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche Andi Kleen wrote: >> 1. Xen marks the whole page as being of pgd-type, and so it can't >> have non-pgd contents (as the page would be RO, and any contents >> would be validated as pgd entries). However I can fix that by >> > > That sounds more like a Xen bug. Why should it validate anything except > the first four entries which are used by the hardware? > > I can imagine that false sharing would be slow because you would > have unnecessary traps, but it shouldn't be a correctness issue. > Yeah, the most fundamental problem is that the whole page is RO, so even if Xen trapped and emulated, it still makes a very bad neighbour. I was also going to say that there's no reason why you couldn't pack multiple pgds into one page, but I think we can only specify the cr3 at page resolution anyway. >> maintaining a separate per-cpu pgd page, and just copy the four >> entries over when cr3 is reloaded. This would move the >> Xen-specific requirements into the Xen code without affecting the >> rest of the kernel. >> > > x86-64 used to do that, but it turned out this breaks some shared cache > optimizations on the P4 between SMT threads. So you might actually see > user space performance regressions from it. > Hm, given that Xen doesn't make any topology guarantees about vcpus, I don't think it would make much difference. I think it would only be observable if you actually pinned the vcpus to physical siblings. Besides we could easily maintain sibling info and look to see if we're about to reload cr3 to match the other thread(s), then just use the same pgd copy. J ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 18:35 ` Jeremy Fitzhardinge @ 2008-03-10 19:06 ` Andi Kleen 2008-03-10 20:54 ` H. Peter Anvin 1 sibling, 0 replies; 44+ messages in thread From: Andi Kleen @ 2008-03-10 19:06 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Hugh Dickins, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche On Mon, Mar 10, 2008 at 11:35:50AM -0700, Jeremy Fitzhardinge wrote: > Hm, given that Xen doesn't make any topology guarantees about vcpus, I > don't think it would make much difference. I think it would only be > observable if you actually pinned the vcpus to physical siblings. > Besides we could easily maintain sibling info and look to see if we're > about to reload cr3 to match the other thread(s), then just use the same > pgd copy. The optimization doesn't really require any topology information; you just should make sure that if you have a VM shared between different CPUs that it has the same CR3. The easiest way to enforce that is to always only use a single CR3 for a VM. -Andi ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 18:35 ` Jeremy Fitzhardinge 2008-03-10 19:06 ` Andi Kleen @ 2008-03-10 20:54 ` H. Peter Anvin 2008-03-10 21:26 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 44+ messages in thread From: H. Peter Anvin @ 2008-03-10 20:54 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Hugh Dickins, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche Jeremy Fitzhardinge wrote: > > Yeah, the most fundamental problem is that the whole page is RO, so even > if Xen trapped and emulated, it still makes a very bad neighbour. I was > also going to say that there's no reason why you couldn't pack multiple > pgds into one page, but I think we can only specify the cr3 at page > resolution anyway. > Wrong. With PAE you can specify it at any 32-byte boundary. -hpa ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 20:54 ` H. Peter Anvin @ 2008-03-10 21:26 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 44+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-10 21:26 UTC (permalink / raw) To: H. Peter Anvin Cc: Andi Kleen, Hugh Dickins, Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche H. Peter Anvin wrote: > Jeremy Fitzhardinge wrote: >> >> Yeah, the most fundamental problem is that the whole page is RO, so >> even if Xen trapped and emulated, it still makes a very bad >> neighbour. I was also going to say that there's no reason why you >> couldn't pack multiple pgds into one page, but I think we can only >> specify the cr3 at page resolution anyway. >> > > Wrong. With PAE you can specify it at any 32-byte boundary. Yes, the x86 lets you, but Xen doesn't - mostly because it reuses the lower 12 bits as upper bits of the pfn so that you can specify a pagetable base at >4G. J ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-10 15:54 ` Christoph Lameter 2008-03-10 16:43 ` Andi Kleen @ 2008-03-11 4:07 ` Nick Piggin 2008-03-21 12:52 ` Bart Van Assche 1 sibling, 1 reply; 44+ messages in thread From: Nick Piggin @ 2008-03-11 4:07 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Bart Van Assche On Tuesday 11 March 2008 02:54, Christoph Lameter wrote: > On Sun, 9 Mar 2008, Andrew Morton wrote: > > - They are pretty marginal from a performance POV (iirc) > > Pretty significant in our experience. > > > Making __GFP_ZERO smarter/more efficient would be a preferable way of > > addressing any performance problems we have in there. > > Looking at the page allocator "fastpath": The basic reaons that this was > such a good optimization was that the page allocator is expensive to call. > The hotpath gets more and more clogged with logic. Fixing the page > allocator to be more efficient may be the right approach here. That could > also include having a list of zeroed pages. This is insane. We add more and more of this NUMA and cpuset and anti-frag and page zeroing logic to the page allocator, and then decide that we don't actually need to obey any of those rules when we're running lat_proc. You will never be able to make the page allocator faster than a single list of pages. The reason is because we actually *want* some of these checks and heuristics in the page allocator. And I doubt a list of zeroed pages is the right approach. That's just adding more complexity for lat_proc AFAIKS. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-11 4:07 ` Nick Piggin @ 2008-03-21 12:52 ` Bart Van Assche 2008-03-21 14:45 ` Ingo Molnar 0 siblings, 1 reply; 44+ messages in thread From: Bart Van Assche @ 2008-03-21 12:52 UTC (permalink / raw) To: Nick Piggin Cc: Christoph Lameter, Andrew Morton, Ingo Molnar, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds On Tue, Mar 11, 2008 at 5:07 AM, Nick Piggin <nickpiggin@yahoo.com.au> wrote: > [ ... ] > > This is insane. We add more and more of this NUMA and cpuset and anti-frag > and page zeroing logic to the page allocator, and then decide that we > don't actually need to obey any of those rules when we're running lat_proc. > > You will never be able to make the page allocator faster than a single list > of pages. The reason is because we actually *want* some of these checks and > heuristics in the page allocator. > > And I doubt a list of zeroed pages is the right approach. That's just > adding more complexity for lat_proc AFAIKS. Is anyone currently working on this (http://bugzilla.kernel.org/show_bug.cgi?id=9991) ? I think this should either be fixed or documented as a known issue. Bart. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-21 12:52 ` Bart Van Assche @ 2008-03-21 14:45 ` Ingo Molnar 2008-03-26 7:45 ` Bart Van Assche 0 siblings, 1 reply; 44+ messages in thread From: Ingo Molnar @ 2008-03-21 14:45 UTC (permalink / raw) To: Bart Van Assche Cc: Nick Piggin, Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds * Bart Van Assche <bart.vanassche@gmail.com> wrote: > Is anyone currently working on this > (http://bugzilla.kernel.org/show_bug.cgi?id=9991) ? I think this > should either be fixed or documented as a known issue. it should be fixed in latest -git (by commit 985a34bd75cc8c9) - can you still see it? Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-21 14:45 ` Ingo Molnar @ 2008-03-26 7:45 ` Bart Van Assche 2008-03-26 7:53 ` Andrew Morton 2008-03-26 8:13 ` Ingo Molnar 0 siblings, 2 replies; 44+ messages in thread From: Bart Van Assche @ 2008-03-26 7:45 UTC (permalink / raw) To: Ingo Molnar Cc: Nick Piggin, Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds On Fri, Mar 21, 2008 at 3:45 PM, Ingo Molnar <mingo@elte.hu> wrote: > > * Bart Van Assche <bart.vanassche@gmail.com> wrote: > > > Is anyone currently working on this > > (http://bugzilla.kernel.org/show_bug.cgi?id=9991) ? I think this > > should either be fixed or documented as a known issue. > > it should be fixed in latest -git (by commit 985a34bd75cc8c9) - can you > still see it? This issue is indeed fixed now, thanks (retested with kernel version 2.6.25-rc6-00333-ga4083c9-dirty). Where can I find the patch that fixed this issue ? A Google query for the commit ID returned only one result, and that was an URL that pointed to your e-mail. Bart. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-26 7:45 ` Bart Van Assche @ 2008-03-26 7:53 ` Andrew Morton 2008-03-26 8:13 ` Ingo Molnar 1 sibling, 0 replies; 44+ messages in thread From: Andrew Morton @ 2008-03-26 7:53 UTC (permalink / raw) To: Bart Van Assche Cc: Ingo Molnar, Nick Piggin, Christoph Lameter, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds On Wed, 26 Mar 2008 08:45:46 +0100 "Bart Van Assche" <bart.vanassche@gmail.com> wrote: > On Fri, Mar 21, 2008 at 3:45 PM, Ingo Molnar <mingo@elte.hu> wrote: > > > > * Bart Van Assche <bart.vanassche@gmail.com> wrote: > > > > > Is anyone currently working on this > > > (http://bugzilla.kernel.org/show_bug.cgi?id=9991) ? I think this > > > should either be fixed or documented as a known issue. > > > > it should be fixed in latest -git (by commit 985a34bd75cc8c9) - can you > > still see it? > > This issue is indeed fixed now, thanks (retested with kernel version > 2.6.25-rc6-00333-ga4083c9-dirty). Where can I find the patch that > fixed this issue ? A Google query for the commit ID returned only one > result, and that was an URL that pointed to your e-mail. > commit 985a34bd75cc8c96e43f00dcdda7c3fdb51a3026 Author: Thomas Gleixner <tglx@linutronix.de> Date: Sun Mar 9 13:14:37 2008 +0100 x86: remove quicklists quicklists cause a serious memory leak on 32-bit x86, as documented at: http://bugzilla.kernel.org/show_bug.cgi?id=9991 the reason is that the quicklist pool is a special-purpose cache that grows out of proportion. It is not accounted for anywhere and users have no way to even realize that it's the quicklists that are causing RAM usage spikes. It was supposed to be a relatively small pool, but as demonstrated by KOSAKI Motohiro, they can grow as large as: Quicklists: 1194304 kB given how much trouble this code has caused historically, and given that Andrew objected to its introduction on x86 (years ago), the best option at this point is to remove them. [ any performance benefits of caching constructed pgds should be implemented in a more generic way (possibly within the page allocator), while still allowing constructed pages to be allocated by other workloads. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f41c953..237fc12 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -66,9 +66,6 @@ config MMU config ZONE_DMA def_bool y -config QUICKLIST - def_bool X86_32 - config SBUS bool diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c index 73aba71..2f9e9af 100644 --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -342,12 +342,16 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp) pgd_t *pgd_alloc(struct mm_struct *mm) { - pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor); + pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); - mm->pgd = pgd; /* so that alloc_pd can use it */ + /* so that alloc_pd can use it */ + mm->pgd = pgd; + if (pgd) + pgd_ctor(pgd); if (pgd && !pgd_prepopulate_pmd(mm, pgd)) { - quicklist_free(0, pgd_dtor, pgd); + pgd_dtor(pgd); + free_page((unsigned long)pgd); pgd = NULL; } @@ -357,12 +361,8 @@ pgd_t *pgd_alloc(struct mm_struct *mm) void pgd_free(struct mm_struct *mm, pgd_t *pgd) { pgd_mop_up_pmds(mm, pgd); - quicklist_free(0, pgd_dtor, pgd); -} - -void check_pgt_cache(void) -{ - quicklist_trim(0, pgd_dtor, 25, 16); + pgd_dtor(pgd); + free_page((unsigned long)pgd); } void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h index a842c72..4e6a0fc 100644 --- a/include/asm-x86/pgtable_32.h +++ b/include/asm-x86/pgtable_32.h @@ -26,10 +26,9 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; -extern struct kmem_cache *pmd_cache; -void check_pgt_cache(void); -static inline void pgtable_cache_init(void) {} +static inline void pgtable_cache_init(void) { } +static inline void check_pgt_cache(void) { } void paging_init(void); ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-26 7:45 ` Bart Van Assche 2008-03-26 7:53 ` Andrew Morton @ 2008-03-26 8:13 ` Ingo Molnar 2008-03-26 10:37 ` Bart Van Assche 1 sibling, 1 reply; 44+ messages in thread From: Ingo Molnar @ 2008-03-26 8:13 UTC (permalink / raw) To: Bart Van Assche Cc: Nick Piggin, Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds * Bart Van Assche <bart.vanassche@gmail.com> wrote: > On Fri, Mar 21, 2008 at 3:45 PM, Ingo Molnar <mingo@elte.hu> wrote: > > > > * Bart Van Assche <bart.vanassche@gmail.com> wrote: > > > > > Is anyone currently working on this > > > (http://bugzilla.kernel.org/show_bug.cgi?id=9991) ? I think this > > > should either be fixed or documented as a known issue. > > > > it should be fixed in latest -git (by commit 985a34bd75cc8c9) - can you > > still see it? > > This issue is indeed fixed now, thanks (retested with kernel version > 2.6.25-rc6-00333-ga4083c9-dirty). Where can I find the patch that > fixed this issue ? A Google query for the commit ID returned only one > result, and that was an URL that pointed to your e-mail. thanks a lot Bart for the persistent testing - this was a nasty one to track down. Find the standalone fix below. Ingo ----------------> commit 985a34bd75cc8c96e43f00dcdda7c3fdb51a3026 Author: Thomas Gleixner <tglx@linutronix.de> Date: Sun Mar 9 13:14:37 2008 +0100 x86: remove quicklists quicklists cause a serious memory leak on 32-bit x86, as documented at: http://bugzilla.kernel.org/show_bug.cgi?id=9991 the reason is that the quicklist pool is a special-purpose cache that grows out of proportion. It is not accounted for anywhere and users have no way to even realize that it's the quicklists that are causing RAM usage spikes. It was supposed to be a relatively small pool, but as demonstrated by KOSAKI Motohiro, they can grow as large as: Quicklists: 1194304 kB given how much trouble this code has caused historically, and given that Andrew objected to its introduction on x86 (years ago), the best option at this point is to remove them. [ any performance benefits of caching constructed pgds should be implemented in a more generic way (possibly within the page allocator), while still allowing constructed pages to be allocated by other workloads. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f41c953..237fc12 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -66,9 +66,6 @@ config MMU config ZONE_DMA def_bool y -config QUICKLIST - def_bool X86_32 - config SBUS bool diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c index 73aba71..2f9e9af 100644 --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -342,12 +342,16 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp) pgd_t *pgd_alloc(struct mm_struct *mm) { - pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor); + pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); - mm->pgd = pgd; /* so that alloc_pd can use it */ + /* so that alloc_pd can use it */ + mm->pgd = pgd; + if (pgd) + pgd_ctor(pgd); if (pgd && !pgd_prepopulate_pmd(mm, pgd)) { - quicklist_free(0, pgd_dtor, pgd); + pgd_dtor(pgd); + free_page((unsigned long)pgd); pgd = NULL; } @@ -357,12 +361,8 @@ pgd_t *pgd_alloc(struct mm_struct *mm) void pgd_free(struct mm_struct *mm, pgd_t *pgd) { pgd_mop_up_pmds(mm, pgd); - quicklist_free(0, pgd_dtor, pgd); -} - -void check_pgt_cache(void) -{ - quicklist_trim(0, pgd_dtor, 25, 16); + pgd_dtor(pgd); + free_page((unsigned long)pgd); } void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h index a842c72..4e6a0fc 100644 --- a/include/asm-x86/pgtable_32.h +++ b/include/asm-x86/pgtable_32.h @@ -26,10 +26,9 @@ struct mm_struct; struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; -extern struct kmem_cache *pmd_cache; -void check_pgt_cache(void); -static inline void pgtable_cache_init(void) {} +static inline void pgtable_cache_init(void) { } +static inline void check_pgt_cache(void) { } void paging_init(void); ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-26 8:13 ` Ingo Molnar @ 2008-03-26 10:37 ` Bart Van Assche 2008-03-26 16:34 ` Christoph Lameter 0 siblings, 1 reply; 44+ messages in thread From: Bart Van Assche @ 2008-03-26 10:37 UTC (permalink / raw) To: Ingo Molnar Cc: Nick Piggin, Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds On Wed, Mar 26, 2008 at 9:13 AM, Ingo Molnar <mingo@elte.hu> wrote: > > * Bart Van Assche <bart.vanassche@gmail.com> wrote: > > > This issue is indeed fixed now, thanks (retested with kernel version > > 2.6.25-rc6-00333-ga4083c9-dirty). Where can I find the patch that > > fixed this issue ? A Google query for the commit ID returned only one > > result, and that was an URL that pointed to your e-mail. > > thanks a lot Bart for the persistent testing - this was a nasty one to > track down. Find the standalone fix below. Thanks to everyone who worked on this. By the way, I have asked Albert Calahan, the procps maintainer, to fix the free/top/vmstat tools such that these take recently added /proc/meminfo fields into account and display correct values on recent 2.6 kernels. See also http://bugzilla.kernel.org/show_bug.cgi?id=9991 and http://marc.info/?l=linux-mm&m=120496901605830&w=2. Bart. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-26 10:37 ` Bart Van Assche @ 2008-03-26 16:34 ` Christoph Lameter 2008-03-27 9:48 ` Bart Van Assche 0 siblings, 1 reply; 44+ messages in thread From: Christoph Lameter @ 2008-03-26 16:34 UTC (permalink / raw) To: Bart Van Assche Cc: Ingo Molnar, Nick Piggin, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds On Wed, 26 Mar 2008, Bart Van Assche wrote: > Thanks to everyone who worked on this. By the way, I have asked Albert > Calahan, the procps maintainer, to fix the free/top/vmstat tools such > that these take recently added /proc/meminfo fields into account and > display correct values on recent 2.6 kernels. See also > http://bugzilla.kernel.org/show_bug.cgi?id=9991 and > http://marc.info/?l=linux-mm&m=120496901605830&w=2. Be aware that some distros have hacked their meminfo (proc and /sys/devices/system/node/node*/meminfo) output to be more compatible with recent 2.6 kernels but not updated to the new ZVC counter scheme. On those kernels (<2.6.17) counters may show weird values and the sum of the per node counters in sysfs may not yield the same amount shown in /proc/meminfo. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-26 16:34 ` Christoph Lameter @ 2008-03-27 9:48 ` Bart Van Assche 0 siblings, 0 replies; 44+ messages in thread From: Bart Van Assche @ 2008-03-27 9:48 UTC (permalink / raw) To: Christoph Lameter Cc: Ingo Molnar, Nick Piggin, Andrew Morton, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Albert Cahalan On Wed, Mar 26, 2008 at 5:34 PM, Christoph Lameter <clameter@sgi.com> wrote: > On Wed, 26 Mar 2008, Bart Van Assche wrote: > > > Thanks to everyone who worked on this. By the way, I have asked Albert > > Calahan, the procps maintainer, to fix the free/top/vmstat tools such > > that these take recently added /proc/meminfo fields into account and > > display correct values on recent 2.6 kernels. See also > > http://bugzilla.kernel.org/show_bug.cgi?id=9991 and > > http://marc.info/?l=linux-mm&m=120496901605830&w=2. > > Be aware that some distros have hacked their meminfo (proc and > /sys/devices/system/node/node*/meminfo) output to be more compatible with > recent 2.6 kernels but not updated to the new ZVC counter scheme. On those > kernels (<2.6.17) counters may show weird values and the sum of the per > node counters in sysfs may not yield the same amount shown in > /proc/meminfo. In my opinion /proc/meminfo is part of the kernelspace - userspace API, and any Linux distributor that modifies /proc/meminfo is responsible for modifying the tools that rely on this information. Bart. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:34 ` Ingo Molnar ` (2 preceding siblings ...) 2008-03-09 18:46 ` Andrew Morton @ 2008-03-09 19:11 ` Arjan van de Ven 2008-03-09 19:25 ` Ingo Molnar 3 siblings, 1 reply; 44+ messages in thread From: Arjan van de Ven @ 2008-03-09 19:11 UTC (permalink / raw) To: Ingo Molnar Cc: KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche On Sun, 9 Mar 2008 13:34:32 +0100 Ingo Molnar <mingo@elte.hu> wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > IMHO we need shrink pgtable cache mecanism. > > > > ouch! Could you try the patch below? How large is the quicklist > > cache with this applied? > > hm, Thomas pointed it out that this wont solve all the problems as > quicklists have a built-in "preserve me" throttle (which is rather > stupid). > > the right solution is to get rid of quicklists altogether (Thomas > expects to have patches for that later today). careful with this; the quicklists aren't JUST for speed they are also there to make sure a page we free that is a pagetable, is not reused until we have finished flushing the tlbs on all the cpus that saw it. This is a really hard correctness requirement, and while I can see that quicklists are probably not the best way to achieve this, we can't just throw away the behavior ;( -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 19:11 ` Arjan van de Ven @ 2008-03-09 19:25 ` Ingo Molnar 2008-03-09 19:27 ` Ingo Molnar 2008-03-10 15:55 ` Christoph Lameter 0 siblings, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 19:25 UTC (permalink / raw) To: Arjan van de Ven Cc: KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Arjan van de Ven <arjan@infradead.org> wrote: > > > ouch! Could you try the patch below? How large is the quicklist > > > cache with this applied? > > > > hm, Thomas pointed it out that this wont solve all the problems as > > quicklists have a built-in "preserve me" throttle (which is rather > > stupid). > > > > the right solution is to get rid of quicklists altogether (Thomas > > expects to have patches for that later today). > > careful with this; the quicklists aren't JUST for speed they are also > there to make sure a page we free that is a pagetable, is not reused > until we have finished flushing the tlbs on all the cpus that saw it. > This is a really hard correctness requirement, and while I can see > that quicklists are probably not the best way to achieve this, we > can't just throw away the behavior ;( no, that's not true anymore - and the current quicklists code doesnt do anything like that AFAICS. It used to be a lot more complex, but now it's just a thin wrapper around the page allocator. Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 19:25 ` Ingo Molnar @ 2008-03-09 19:27 ` Ingo Molnar 2008-03-09 19:31 ` Ingo Molnar 2008-03-10 15:57 ` Christoph Lameter 2008-03-10 15:55 ` Christoph Lameter 1 sibling, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 19:27 UTC (permalink / raw) To: Arjan van de Ven Cc: KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Ingo Molnar <mingo@elte.hu> wrote: > > careful with this; the quicklists aren't JUST for speed they are > > also there to make sure a page we free that is a pagetable, is not > > reused until we have finished flushing the tlbs on all the cpus that > > saw it. This is a really hard correctness requirement, and while I > > can see that quicklists are probably not the best way to achieve > > this, we can't just throw away the behavior ;( > > no, that's not true anymore - and the current quicklists code doesnt > do anything like that AFAICS. It used to be a lot more complex, but > now it's just a thin wrapper around the page allocator. i.e. the patch below should do the trick. (it's still work in progress but it seems to boot just fine) Ingo ------------> Subject: x86: patches/remove-quicklists.patch From: Ingo Molnar <mingo@elte.hu> Date: Sun Mar 09 20:04:32 CET 2008 --- arch/x86/mm/pgtable_32.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) Index: linux-x86.q/arch/x86/mm/pgtable_32.c =================================================================== --- linux-x86.q.orig/arch/x86/mm/pgtable_32.c +++ linux-x86.q/arch/x86/mm/pgtable_32.c @@ -10,7 +10,6 @@ #include <linux/pagemap.h> #include <linux/spinlock.h> #include <linux/module.h> -#include <linux/quicklist.h> #include <asm/system.h> #include <asm/pgtable.h> @@ -338,12 +337,16 @@ static void pgd_mop_up_pmds(struct mm_st pgd_t *pgd_alloc(struct mm_struct *mm) { - pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor); + pgd_t *pgd; + + pgd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (pgd) + pgd_ctor(pgd); mm->pgd = pgd; /* so that alloc_pd can use it */ if (pgd && !pgd_prepopulate_pmd(mm, pgd)) { - quicklist_free(0, pgd_dtor, pgd); + free_page((unsigned long)pgd); pgd = NULL; } @@ -353,12 +356,12 @@ pgd_t *pgd_alloc(struct mm_struct *mm) void pgd_free(struct mm_struct *mm, pgd_t *pgd) { pgd_mop_up_pmds(mm, pgd); - quicklist_free(0, pgd_dtor, pgd); + pgd_dtor(pgd); + free_page((unsigned long)pgd); } void check_pgt_cache(void) { - quicklist_trim(0, pgd_dtor, 25, 16); } void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 19:27 ` Ingo Molnar @ 2008-03-09 19:31 ` Ingo Molnar 2008-03-10 15:57 ` Christoph Lameter 1 sibling, 0 replies; 44+ messages in thread From: Ingo Molnar @ 2008-03-09 19:31 UTC (permalink / raw) To: Arjan van de Ven Cc: KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche * Ingo Molnar <mingo@elte.hu> wrote: > i.e. the patch below should do the trick. > > (it's still work in progress but it seems to boot just fine) updated patch below - Thomas found a bug in the alloc-failure cleanup path. I shouldnt hack late Sunday night :) Ingo ---------------> Subject: x86: patches/remove-quicklists.patch From: Ingo Molnar <mingo@elte.hu> Date: Sun Mar 09 20:04:32 CET 2008 Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/mm/pgtable_32.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) Index: linux-x86.q/arch/x86/mm/pgtable_32.c =================================================================== --- linux-x86.q.orig/arch/x86/mm/pgtable_32.c +++ linux-x86.q/arch/x86/mm/pgtable_32.c @@ -10,7 +10,6 @@ #include <linux/pagemap.h> #include <linux/spinlock.h> #include <linux/module.h> -#include <linux/quicklist.h> #include <asm/system.h> #include <asm/pgtable.h> @@ -338,12 +337,17 @@ static void pgd_mop_up_pmds(struct mm_st pgd_t *pgd_alloc(struct mm_struct *mm) { - pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor); + pgd_t *pgd; + + pgd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (pgd) + pgd_ctor(pgd); mm->pgd = pgd; /* so that alloc_pd can use it */ if (pgd && !pgd_prepopulate_pmd(mm, pgd)) { - quicklist_free(0, pgd_dtor, pgd); + pgd_dtor(pgd); + free_page((unsigned long)pgd); pgd = NULL; } @@ -353,12 +357,12 @@ pgd_t *pgd_alloc(struct mm_struct *mm) void pgd_free(struct mm_struct *mm, pgd_t *pgd) { pgd_mop_up_pmds(mm, pgd); - quicklist_free(0, pgd_dtor, pgd); + pgd_dtor(pgd); + free_page((unsigned long)pgd); } void check_pgt_cache(void) { - quicklist_trim(0, pgd_dtor, 25, 16); } void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 19:27 ` Ingo Molnar 2008-03-09 19:31 ` Ingo Molnar @ 2008-03-10 15:57 ` Christoph Lameter 1 sibling, 0 replies; 44+ messages in thread From: Christoph Lameter @ 2008-03-10 15:57 UTC (permalink / raw) To: Ingo Molnar Cc: Arjan van de Ven, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Bart Van Assche Reviewed-by: Christoph Lameter <clameter@sgi.com> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 19:25 ` Ingo Molnar 2008-03-09 19:27 ` Ingo Molnar @ 2008-03-10 15:55 ` Christoph Lameter 1 sibling, 0 replies; 44+ messages in thread From: Christoph Lameter @ 2008-03-10 15:55 UTC (permalink / raw) To: Ingo Molnar Cc: Arjan van de Ven, KOSAKI Motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Bart Van Assche On Sun, 9 Mar 2008, Ingo Molnar wrote: > > careful with this; the quicklists aren't JUST for speed they are also > > there to make sure a page we free that is a pagetable, is not reused > > until we have finished flushing the tlbs on all the cpus that saw it. > > This is a really hard correctness requirement, and while I can see > > that quicklists are probably not the best way to achieve this, we > > can't just throw away the behavior ;( > > no, that's not true anymore - and the current quicklists code doesnt do > anything like that AFAICS. It used to be a lot more complex, but now > it's just a thin wrapper around the page allocator. Sure it does that. It interacts with the TLB logic which is another bad thing as Linus has pointed out. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: quicklists confuse meminfo 2008-03-09 12:09 ` Ingo Molnar 2008-03-09 12:34 ` Ingo Molnar @ 2008-03-09 12:47 ` KOSAKI Motohiro 1 sibling, 0 replies; 44+ messages in thread From: KOSAKI Motohiro @ 2008-03-09 12:47 UTC (permalink / raw) To: Ingo Molnar Cc: kosaki.motohiro, Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton, Christoph Lameter, Bart Van Assche Hi > > IMHO we need shrink pgtable cache mecanism. > > ouch! Could you try the patch below? How large is the quicklist cache > with this applied? I porting to IA64, applied, and tested. but no change behavior. because quicklist.c::max_pages() formula is strange. my freememory is about 5GB, thus this function calculate 5GB/16 ~= 300MB, my machine has 8cpu, thus 300MB x 8 ~= 2.5GB consume at worst case. (because this variable is per cpu variable) IMHO FRACTION_OF_NODE_MEM(16) is too small. after change FRACTION_OF_NODE_MEM to 256, i got following result. --------------------------------------------------- MemTotal: 7683328 kB MemFree: 5904704 kB Buffers: 28672 kB Cached: 168384 kB SwapCached: 1344 kB Active: 173376 kB Inactive: 126016 kB SwapTotal: 2031488 kB SwapFree: 2030144 kB Dirty: 64 kB Writeback: 0 kB AnonPages: 101120 kB Mapped: 34496 kB Slab: 1332608 kB SReclaimable: 21824 kB SUnreclaim: 1310784 kB PageTables: 12480 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 5873152 kB Committed_AS: 383744 kB VmallocTotal: 17592177655808 kB VmallocUsed: 28864 kB VmallocChunk: 17592177623040 kB Quicklists: 61888 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 262144 kB ----------------------------------------------- the following patch is explain to my shrink plan and change to FRACTION_OF_NODE_MEM value. of cource, this is only base of discussion. I don't tested performance regression test yet. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- fs/drop_caches.c | 3 +++ include/asm-ia64/pgalloc.h | 9 ++++++++- include/linux/quicklist.h | 7 +++++++ mm/quicklist.c | 2 +- mm/vmscan.c | 8 ++++++++ 5 files changed, 27 insertions(+), 2 deletions(-) Index: b/fs/drop_caches.c =================================================================== --- a/fs/drop_caches.c 2008-03-09 16:04:14.000000000 +0900 +++ b/fs/drop_caches.c 2008-03-09 20:38:44.000000000 +0900 @@ -8,6 +8,7 @@ #include <linux/writeback.h> #include <linux/sysctl.h> #include <linux/gfp.h> +#include <asm/pgalloc.h> /* A global variable is a bit ugly, but it keeps the code simple */ int sysctl_drop_caches; @@ -63,6 +64,8 @@ int drop_caches_sysctl_handler(ctl_table drop_pagecache(); if (sysctl_drop_caches & 2) drop_slab(); + if (sysctl_drop_caches & 4) + drop_pgtable_cache(); } return 0; } Index: b/include/linux/quicklist.h =================================================================== --- a/include/linux/quicklist.h 2008-03-09 19:45:58.000000000 +0900 +++ b/include/linux/quicklist.h 2008-03-09 21:49:07.000000000 +0900 @@ -87,6 +87,13 @@ static inline unsigned long quicklist_to return 0; } +static inline void quicklist_trim(int nr, void (*dtor)(void *), + unsigned long min_pages, + unsigned long max_free) +{ +} + + #endif #endif /* LINUX_QUICKLIST_H */ Index: b/mm/quicklist.c =================================================================== --- a/mm/quicklist.c 2008-03-09 16:04:53.000000000 +0900 +++ b/mm/quicklist.c 2008-03-09 21:35:18.000000000 +0900 @@ -21,7 +21,7 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK]; -#define FRACTION_OF_NODE_MEM 16 +#define FRACTION_OF_NODE_MEM 256 static unsigned long max_pages(unsigned long min_pages) { Index: b/include/asm-ia64/pgalloc.h =================================================================== --- a/include/asm-ia64/pgalloc.h 2008-03-09 21:53:14.000000000 +0900 +++ b/include/asm-ia64/pgalloc.h 2008-03-09 21:53:53.000000000 +0900 @@ -114,9 +114,16 @@ static inline void pte_free_kernel(struc static inline void check_pgt_cache(void) { - quicklist_trim(0, NULL, 25, 16); + quicklist_trim(0, NULL, 25, 1024); } #define __pte_free_tlb(tlb, pte) pte_free((tlb)->mm, pte) +#define HAVE_ARCH_DROP_PGTABLE_CACHE +static inline void drop_pgtable_cache(void) +{ + quicklist_trim(0, NULL, 25, ULONG_MAX); +} + + #endif /* _ASM_IA64_PGALLOC_H */ Index: b/mm/vmscan.c =================================================================== --- a/mm/vmscan.c 2008-03-09 19:42:47.000000000 +0900 +++ b/mm/vmscan.c 2008-03-09 20:45:24.000000000 +0900 @@ -42,6 +42,7 @@ #include <asm/tlbflush.h> #include <asm/div64.h> +#include <asm/pgalloc.h> #include <linux/swapops.h> @@ -1362,6 +1363,7 @@ static unsigned long do_try_to_free_page * over limit cgroups */ if (scan_global_lru(sc)) { + drop_pgtable_cache(); shrink_slab(sc->nr_scanned, gfp_mask, lru_pages); if (reclaim_state) { nr_reclaimed += reclaim_state->reclaimed_slab; @@ -1589,6 +1591,7 @@ loop_again: if (!zone_watermark_ok(zone, order, 8*zone->pages_high, end_zone, 0)) nr_reclaimed += shrink_zone(priority, zone, &sc); + drop_pgtable_cache(); reclaim_state->reclaimed_slab = 0; nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages); @@ -1831,6 +1834,7 @@ unsigned long shrink_all_memory(unsigned nr_slab = global_page_state(NR_SLAB_RECLAIMABLE); /* If slab caches are huge, it's better to hit them first */ while (nr_slab >= lru_pages) { + drop_pgtable_cache(); reclaim_state.reclaimed_slab = 0; shrink_slab(nr_pages, sc.gfp_mask, lru_pages); if (!reclaim_state.reclaimed_slab) @@ -1868,6 +1872,7 @@ unsigned long shrink_all_memory(unsigned if (ret >= nr_pages) goto out; + drop_pgtable_cache(); reclaim_state.reclaimed_slab = 0; shrink_slab(sc.nr_scanned, sc.gfp_mask, count_lru_pages()); @@ -1880,6 +1885,7 @@ unsigned long shrink_all_memory(unsigned } } + drop_pgtable_cache(); /* * If ret = 0, we could not shrink LRUs, but there may be something * in slab caches @@ -2038,6 +2044,8 @@ static int __zone_reclaim(struct zone *z } while (priority >= 0 && nr_reclaimed < nr_pages); } + drop_pgtable_cache(); + slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE); if (slab_reclaimable > zone->min_slab_pages) { /* ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2008-03-27 9:49 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-03-09 10:19 quicklists confuse meminfo Thomas Gleixner 2008-03-09 10:26 ` Bart Van Assche 2008-03-09 10:29 ` Andi Kleen 2008-03-09 10:42 ` KOSAKI Motohiro 2008-03-09 12:00 ` Thomas Gleixner 2008-03-09 11:14 ` Ingo Molnar 2008-03-09 11:56 ` Thomas Gleixner 2008-03-09 12:01 ` Ingo Molnar 2008-03-09 12:49 ` Andi Kleen 2008-03-10 15:51 ` Christoph Lameter 2008-03-09 12:03 ` Johannes Weiner 2008-03-09 12:03 ` KOSAKI Motohiro 2008-03-09 12:09 ` Ingo Molnar 2008-03-09 12:34 ` Ingo Molnar 2008-03-09 12:51 ` KOSAKI Motohiro 2008-03-09 13:20 ` Thomas Gleixner 2008-03-09 18:46 ` Andrew Morton 2008-03-09 20:21 ` Andi Kleen 2008-03-10 15:54 ` Christoph Lameter 2008-03-10 16:43 ` Andi Kleen 2008-03-10 17:19 ` Hugh Dickins 2008-03-10 17:25 ` Andi Kleen 2008-03-10 17:31 ` Jeremy Fitzhardinge 2008-03-10 17:53 ` Andi Kleen 2008-03-10 18:35 ` Jeremy Fitzhardinge 2008-03-10 19:06 ` Andi Kleen 2008-03-10 20:54 ` H. Peter Anvin 2008-03-10 21:26 ` Jeremy Fitzhardinge 2008-03-11 4:07 ` Nick Piggin 2008-03-21 12:52 ` Bart Van Assche 2008-03-21 14:45 ` Ingo Molnar 2008-03-26 7:45 ` Bart Van Assche 2008-03-26 7:53 ` Andrew Morton 2008-03-26 8:13 ` Ingo Molnar 2008-03-26 10:37 ` Bart Van Assche 2008-03-26 16:34 ` Christoph Lameter 2008-03-27 9:48 ` Bart Van Assche 2008-03-09 19:11 ` Arjan van de Ven 2008-03-09 19:25 ` Ingo Molnar 2008-03-09 19:27 ` Ingo Molnar 2008-03-09 19:31 ` Ingo Molnar 2008-03-10 15:57 ` Christoph Lameter 2008-03-10 15:55 ` Christoph Lameter 2008-03-09 12:47 ` KOSAKI Motohiro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).