* [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
@ 2011-05-30 13:13 Mel Gorman
  2011-05-30 14:31 ` Andrea Arcangeli
  ` (4 more replies)
  0 siblings, 5 replies; 63+ messages in thread
From: Mel Gorman @ 2011-05-30 13:13 UTC (permalink / raw)
  To: akpm
  Cc: Ury Stankevich, KOSAKI Motohiro, Andrea Arcangeli, linux-kernel,
	linux-mm, stable

Asynchronous compaction is used when promoting to huge pages. This is
all very nice but if there are a number of processes in compacting
memory, a large number of pages can be isolated. An "asynchronous"
process can stall for long periods of time as a result with a user
reporting that firefox can stall for 10s of seconds. This patch aborts
asynchronous compaction if too many pages are isolated as it's better
to fail a hugepage promotion than stall a process.

If accepted, this should also be considered for 2.6.39-stable. It should
also be considered for 2.6.38-stable but ideally [11bc82d6: mm:
compaction: Use async migration for __GFP_NO_KSWAPD and enforce no
writeback] would be applied to 2.6.38 before consideration.

Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c |   32 ++++++++++++++++++++++++++------
 1 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 021a296..331a2ee 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -240,11 +240,20 @@ static bool too_many_isolated(struct zone *zone)
 	return isolated > (inactive + active) / 2;
 }
 
+/* possible outcome of isolate_migratepages */
+typedef enum {
+	ISOLATE_ABORT,		/* Abort compaction now */
+	ISOLATE_NONE,		/* No pages isolated, continue scanning */
+	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
+} isolate_migrate_t;
+
 /*
  * Isolate all pages that can be migrated from the block pointed to by
  * the migrate scanner within compact_control.
+ *
+ * Returns false if compaction should abort at this point due to congestion.
  */
-static unsigned long isolate_migratepages(struct zone *zone,
+static isolate_migrate_t isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
@@ -261,7 +270,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 	/* Do not cross the free scanner or scan within a memory hole */
 	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
 		cc->migrate_pfn = end_pfn;
-		return 0;
+		return ISOLATE_NONE;
 	}
 
 	/*
@@ -270,10 +279,14 @@
 	 * delay for some time until fewer pages are isolated
 	 */
 	while (unlikely(too_many_isolated(zone))) {
+		/* async migration should just abort */
+		if (!cc->sync)
+			return ISOLATE_ABORT;
+
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		if (fatal_signal_pending(current))
-			return 0;
+			return ISOLATE_ABORT;
 	}
 
 	/* Time to isolate some pages for migration */
@@ -358,7 +371,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 
 	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
 
-	return cc->nr_migratepages;
+	return ISOLATE_SUCCESS;
 }
 
 /*
@@ -522,9 +535,15 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		unsigned long nr_migrate, nr_remaining;
 		int err;
 
-		if (!isolate_migratepages(zone, cc))
+		switch (isolate_migratepages(zone, cc)) {
+		case ISOLATE_ABORT:
+			goto out;
+		case ISOLATE_NONE:
 			continue;
-
+		case ISOLATE_SUCCESS:
+			;
+		}
+
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
 				(unsigned long)cc, false,
@@ -547,6 +566,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	}
 
+out:
 	/* Release free pages and check accounting */
 	cc->nr_freepages -= release_freepages(&cc->freepages);
 	VM_BUG_ON(cc->nr_freepages != 0);
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 13:13 [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Mel Gorman @ 2011-05-30 14:31 ` Andrea Arcangeli 2011-05-30 15:37 ` Mel Gorman 2011-05-30 14:45 ` [stable] " Greg KH ` (3 subsequent siblings) 4 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-05-30 14:31 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable Hi Mel and everyone, On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote: > Asynchronous compaction is used when promoting to huge pages. This is > all very nice but if there are a number of processes in compacting > memory, a large number of pages can be isolated. An "asynchronous" > process can stall for long periods of time as a result with a user > reporting that firefox can stall for 10s of seconds. This patch aborts > asynchronous compaction if too many pages are isolated as it's better to > fail a hugepage promotion than stall a process. > > If accepted, this should also be considered for 2.6.39-stable. It should > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > writeback] would be applied to 2.6.38 before consideration. Is this supposed to fix the stall with khugepaged in D state and other processes in D state? zoneinfo showed a nr_isolated_file = -1, I don't think that meant compaction had 4g pages isolated really considering it moves from -1,0, 1. So I'm unsure if this fix could be right if the problem is the hang with khugepaged in D state reported, so far that looked more like a bug with PREEMPT in the vmstat accounting of nr_isolated_file that trips in too_many_isolated of both vmscan.c and compaction.c with PREEMPT=y. Or are you fixing a different problem? 
Or how do you explain this -1 value out of nr_isolated_file? Clearly
when that value goes to -1, compaction.c:too_many_isolated will hang, I
think we should fix the -1 value before worrying about the rest...

grep nr_isolated_file zoneinfo-khugepaged
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
    nr_isolated_file 1
    nr_isolated_file 4294967295
    nr_isolated_file 0
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 14:31 ` Andrea Arcangeli @ 2011-05-30 15:37 ` Mel Gorman 2011-05-30 16:55 ` Mel Gorman 0 siblings, 1 reply; 63+ messages in thread From: Mel Gorman @ 2011-05-30 15:37 UTC (permalink / raw) To: Andrea Arcangeli Cc: akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, May 30, 2011 at 04:31:09PM +0200, Andrea Arcangeli wrote: > Hi Mel and everyone, > > On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote: > > Asynchronous compaction is used when promoting to huge pages. This is > > all very nice but if there are a number of processes in compacting > > memory, a large number of pages can be isolated. An "asynchronous" > > process can stall for long periods of time as a result with a user > > reporting that firefox can stall for 10s of seconds. This patch aborts > > asynchronous compaction if too many pages are isolated as it's better to > > fail a hugepage promotion than stall a process. > > > > If accepted, this should also be considered for 2.6.39-stable. It should > > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > > writeback] would be applied to 2.6.38 before consideration. > > Is this supposed to fix the stall with khugepaged in D state and other > processes in D state? > Other processes. khugepaged might be getting stuck in the same loop but I do not have a specific case in mind. > zoneinfo showed a nr_isolated_file = -1, I don't think that meant > compaction had 4g pages isolated really considering it moves from > -1,0, 1. So I'm unsure if this fix could be right if the problem is > the hang with khugepaged in D state reported, so far that looked more > like a bug with PREEMPT in the vmstat accounting of nr_isolated_file > that trips in too_many_isolated of both vmscan.c and compaction.c with > PREEMPT=y. 
Or are you fixing a different problem? > I'm not familiar with this problem. I either missed it or forgot about it entirely. I was considering only Ury's report whereby firefox was getting stalled for 10s of seconds in congestion_wait. It's possible the root cause was isolated counters being broken but I didn't pick up on it. > Or how do you explain this -1 value out of nr_isolated_file? Clearly > when that value goes to -1, compaction.c:too_many_isolated will hang, > I think we should fix the -1 value before worrying about the rest... > > grep nr_isolated_file zoneinfo-khugepaged > nr_isolated_file 1 > nr_isolated_file 4294967295 Can you point me at the thread that this file appears on and what the conditions were? If vmstat is going to -1, it is indeed a problem because it implies an imbalance in increments and decrements to the isolated counters. Even with that fixed though, this patch still makes sense as why would an asynchronous user of compaction stall on congestion_wait? -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
  2011-05-30 15:37 ` Mel Gorman
@ 2011-05-30 16:55   ` Mel Gorman
  2011-05-30 17:53     ` Andrea Arcangeli
  0 siblings, 1 reply; 63+ messages in thread
From: Mel Gorman @ 2011-05-30 16:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro,
	linux-kernel, linux-mm, stable

On Mon, May 30, 2011 at 04:37:49PM +0100, Mel Gorman wrote:
> > Or how do you explain this -1 value out of nr_isolated_file? Clearly
> > when that value goes to -1, compaction.c:too_many_isolated will hang,
> > I think we should fix the -1 value before worrying about the rest...
> > 
> > grep nr_isolated_file zoneinfo-khugepaged
> > nr_isolated_file 1
> > nr_isolated_file 4294967295
> 
> Can you point me at the thread that this file appears on and what the
> conditions were? If vmstat is going to -1, it is indeed a problem
> because it implies an imbalance in increments and decrements to the
> isolated counters.

Even with drift issues, -1 there should be "impossible". Assuming this
is a zoneinfo file, that figure is based on global_page_state() which
looks like

static inline unsigned long global_page_state(enum zone_stat_item item)
{
	long x = atomic_long_read(&vm_stat[item]);
#ifdef CONFIG_SMP
	if (x < 0)
		x = 0;
#endif
	return x;
}

So even if isolated counts were going negative for short periods of
time, the returned value should be 0. As this is an inline returning
unsigned long, and callers are using unsigned long, is there any
possibility the "if (x < 0)" is being optimised out? If you are aware
of users reporting this problem (like the users in thread "iotop:
khugepaged at 99.99% (2.6.38.3)"), do you know if they had a particular
compiler in common?

-- 
Mel Gorman
SUSE Labs
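[Editorial note: Mel's question can be illustrated with a small userspace sketch. This is hypothetical code, not the kernel's: with the CONFIG_SMP clamp compiled out (as on a UP build), a counter that has drifted to -1 passes through the `unsigned long` return type unclamped, reading back as 4294967295 on 32-bit, exactly the value in the zoneinfo dump.]

```c
/* Userspace sketch (hypothetical, not kernel code) of what
 * global_page_state() does on a UP build: the "#ifdef CONFIG_SMP ...
 * if (x < 0) x = 0;" clamp is compiled out, so a drifted counter is
 * returned via an implicit signed-to-unsigned conversion. */
static long vm_stat_counter = -1;	/* stands in for vm_stat[item],
					   drifted below zero */

static unsigned long global_page_state_up(void)
{
	long x = vm_stat_counter;
	/* no clamp here: CONFIG_SMP is not set */
	return x;			/* converted to unsigned long:
					   4294967295 on 32-bit */
}
```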
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
  2011-05-30 16:55 ` Mel Gorman
@ 2011-05-30 17:53   ` Andrea Arcangeli
  2011-05-31 12:16     ` Minchan Kim
  2011-05-31 14:34     ` Mel Gorman
  0 siblings, 2 replies; 63+ messages in thread
From: Andrea Arcangeli @ 2011-05-30 17:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro,
	linux-kernel, linux-mm, stable

On Mon, May 30, 2011 at 05:55:46PM +0100, Mel Gorman wrote:
> Even with drift issues, -1 there should be "impossible". Assuming this
> is a zoneinfo file, that figure is based on global_page_state() which
> looks like

The two cases reproducing this long hang in D state both had SMP=n
PREEMPT=y. Clearly not a common config these days. Also, it didn't seem
apparent that any task was running in a code path that kept pages
isolated.

> unsigned long, and callers are using unsigned long, is there any
> possibility the "if (x < 0)" is being optimised out? If you aware

It was eliminated by cpp.

> of users reporting this problem (like the users in thread "iotop:
> khugepaged at 99.99% (2.6.38.3)"), do you know if they had a particular
> compiler in common?

I had no reason to worry about the compiler yet, but that's always a
good idea to keep in mind. The thread where the bug is reported is the
"iotop" one you mentioned, and there's a tarball attached to one of the
last emails of the thread with the debug data I grepped. It was a
/proc/zoneinfo file, yes. That's the file I asked for when I noticed
something had to be wrong with too_many_isolated; I expected either
nr_isolated or nr_inactive going wrong, and it turned out it was
nr_isolated (apparently; I don't have the full picture of the problem
yet). I added you in CC on a few emails but you weren't in all replies.

The debug data you can find on lkml in this email: Message-Id:
<201105232005.56840.johannes.hirte@fem.tu-ilmenau.de>.
The other relevant sysrq+t is here: http://pastebin.com/raw.php?i=VG28YRbi

Better save the latter (I did) as I'm worried it has a timeout on it.

Your patch was for reports with CONFIG_SMP=y? I'd prefer to clear out
this error before improving too_many_isolated; in fact, while reviewing
this code I was not impressed by too_many_isolated. In vmscan.c, if
there's a huge nr_active* list and a tiny nr_inactive (like after a
truncate of filebacked pages or a munmap of anon memory), there's no
reason to stall; it's better to go ahead and let it refile more active
pages. The too_many_isolated in compaction.c looks a whole lot better
than the vmscan.c one, as it takes the active pages into account too...
But I refrained from making any change in this area as I don't think
the bug is in too_many_isolated itself.

I noticed the count[] array is unsigned int, but it looks ok
(especially for 32bit ;) because the isolation is limited.

Both bugs were reported on 32bit x86 UP builds with PREEMPT=y. The stat
accounting seems to use atomics on UP, so irqs on/off or PREEMPT=y/n
shouldn't matter if the increment is one instruction long (plus no irq
code should ever mess with nr_isolated)... If it weren't atomic and
irqs or preempt weren't disabled, it could be preemption. To avoid
confusion: it's not proven that PREEMPT is related; it may be an
accident that both .configs had it on. I'm also unsure why it moves
through -1, 0, 1; I wouldn't expect -1 pages to be isolated any more
than a single page, it just looks weird...
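[Editorial note: the preemption hazard Andrea is weighing can be shown with a userspace sketch. This is a hypothetical illustration of a generic lost-update race, not the actual vmstat code path; as he notes, the UP accounting appears to use single-instruction updates, so this is only one candidate explanation. If the counter update were a non-atomic read-modify-write, a preemption between the load and the store loses the other task's update and the counter can reach -1 with nothing isolated.]

```c
/* Simulate task A's putback (decrement) of nr_isolated being preempted
 * between its load and its store, while task B isolates a page and
 * later puts it back.  A's stale store loses B's increment, so B's
 * eventual decrement drives the counter to -1. */
static long nr_isolated = 1;	/* task A currently has one page isolated */

static long simulate_lost_update(void)
{
	long a_tmp = nr_isolated;	/* A loads 1, intending 1 - 1 = 0  */
	nr_isolated += 1;		/* B isolates a page: counter -> 2 */
	nr_isolated = a_tmp - 1;	/* A's stale store: counter -> 0,
					   B's increment is lost          */
	nr_isolated -= 1;		/* B puts its page back: -> -1     */
	return nr_isolated;
}
```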
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 17:53 ` Andrea Arcangeli @ 2011-05-31 12:16 ` Minchan Kim 2011-05-31 12:24 ` Andrea Arcangeli 2011-05-31 14:34 ` Mel Gorman 1 sibling, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-05-31 12:16 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable Hi Andrea, On Mon, May 30, 2011 at 07:53:34PM +0200, Andrea Arcangeli wrote: > On Mon, May 30, 2011 at 05:55:46PM +0100, Mel Gorman wrote: > > Even with drift issues, -1 there should be "impossible". Assuming this > > is a zoneinfo file, that figure is based on global_page_state() which > > looks like > > The two cases reproducing this long hang in D state, had from SMP=n > PREEMPT=y. Clearly not common config these days. Also it didn't seem > apparent that any task was running in a code path that kept pages > isolated. > > > unsigned long, and callers are using unsigned long, is there any > > possibility the "if (x < 0)" is being optimised out? If you aware > > It was eliminated by cpp. > > > of users reporting this problem (like the users in thread "iotop: > > khugepaged at 99.99% (2.6.38.3)"), do you know if they had a particular > > compiler in common? > > I had no reason to worry about the compiler yet but that's always good > idea to keep in mind. The thread were the bug is reported is the > "iotop" one you mentioned, and there's a tarball attached to one of > the last emails of the thread with the debug data I grepped. It was > /proc/zoneinfo file yes. That's the file I asked when I noticed > something had to be wrong with too_many_isolated and I expected either > nr_isolated or nr_inactive going wrong, it turned out it was > nr_isolated (apparently, I don't have full picture on the problem > yet). I added you in CC to a few emails but you weren't in all > replies. 
> 
> The debug data you can find on lkml in this email: Message-Id:
> <201105232005.56840.johannes.hirte@fem.tu-ilmenau.de>.
> 
> The other relevant sysrq+t here http://pastebin.com/raw.php?i=VG28YRbi
> 
> better save the latter (I did) as I'm worried it has a timeout on it.
> 
> Your patch was for reports with CONFIG_SMP=y? I'd prefer to clear out
> this error before improving the too_many_isolated, in fact while
> reviewing this code I was not impressed by too_many_isolated. For
> vmscan.c if there's an huge nr_active* list and a tiny nr_inactive
> (like after a truncate of filebacked pages or munmap of anon memory)
> there's no reason to stall, it's better to go ahead and let it refile
> more active pages. The too_many_isolated in compaction.c looks a whole
> lot better than the vmscan.c one as that takes into account the active
> pages too... But I refrained to make any change in this area as I
> don't think the bug is in too_many_isolated itself.
> 
> I noticed the count[] array is unsigned int, but it looks ok
> (especially for 32bit ;) because the isolation is limited.
> 
> Both bugs were reported on 32bit x86 UP builds with PREEMPT=y. The
> stat accounting seem to use atomics on UP so irqs on off or
> PREEMPT=y/n shouldn't matter if the increment is 1 insn long (plus no
> irq code should ever mess with nr_isolated)... If it wasn't atomic and
> irqs or preempt aren't disabled it could be preempt. To avoid
> confusion: it's not proven that PREEMPT is related, it may be an
> accident both .config had it on. I'm also unsure why it moves from
> -1,0,1 I wouldn't expect a single page to be isolated like -1 pages to
> be isolated, it just looks weird...

I am not sure this is related to the problem you have seen.
If he used hwpoison via madvise, it is possible.

Anyway, we can see a negative value from a count mismatch in a UP
build. Let's fix it.
>From 1d3ebce2e8aa79dcc912da16b7a8d0611b6f9f1a Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan.kim@gmail.com>
Date: Tue, 31 May 2011 21:11:58 +0900
Subject: [PATCH] Fix page isolated count mismatch

If migration fails, we normally call putback_lru_pages(), which
decreases NR_ISOLATE_[ANON|FILE]. That means we should increase
NR_ISOLATE_[ANON|FILE] before calling putback_lru_pages(), but
soft_offline_page() doesn't do so.

This can leave NR_ISOLATE_[ANON|FILE] with a negative value, and in a
UP build zone_page_state() then reports a huge number of isolated
pages, so the too_many_isolated() functions are deceived completely.
In the end, some process gets stuck in D state, expecting the while
loop around congestion_wait() to terminate; but it is a never-ending
story.

If this is right, it would be -stable material.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/memory-failure.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5c8f7e0..eac0ba5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -52,6 +52,7 @@
 #include <linux/swapops.h>
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -1468,7 +1469,8 @@ int soft_offline_page(struct page *page, int flags)
 	put_page(page);
 	if (!ret) {
 		LIST_HEAD(pagelist);
-
+		inc_zone_page_state(page, NR_ISOLATED_ANON +
+					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
 								0, true);
-- 
1.7.0.4

-- 
Kind regards
Minchan Kim
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 12:16 ` Minchan Kim @ 2011-05-31 12:24 ` Andrea Arcangeli 2011-05-31 13:33 ` Minchan Kim 0 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-05-31 12:24 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 09:16:20PM +0900, Minchan Kim wrote: > I am not sure this is related to the problem you have seen. > If he used hwpoison by madivse, it is possible. CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y # CONFIG_MEMORY_FAILURE is not set > Anyway, we can see negative value by count mismatch in UP build. > Let's fix it. Definitely let's fix it, but it's probably not related to this one. > > From 1d3ebce2e8aa79dcc912da16b7a8d0611b6f9f1a Mon Sep 17 00:00:00 2001 > From: Minchan Kim <minchan.kim@gmail.com> > Date: Tue, 31 May 2011 21:11:58 +0900 > Subject: [PATCH] Fix page isolated count mismatch > > If migration is failed, normally we call putback_lru_pages which > decreases NR_ISOLATE_[ANON|FILE]. > It means we should increase NR_ISOLATE_[ANON|FILE] before calling > putback_lru_pages. But soft_offline_page dosn't it. > > It can make NR_ISOLATE_[ANON|FILE] with negative value and in UP build, > zone_page_state will say huge isolated pages so too_many_isolated > functions be deceived completely. At last, some process stuck in D state > as it expect while loop ending with congestion_wait. > But it's never ending story. > > If it is right, it would be -stable stuff. 
> > Cc: Mel Gorman <mel@csn.ul.ie> > Cc: Andrea Arcangeli <aarcange@redhat.com> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com> > --- > mm/memory-failure.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 5c8f7e0..eac0ba5 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -52,6 +52,7 @@ > #include <linux/swapops.h> > #include <linux/hugetlb.h> > #include <linux/memory_hotplug.h> > +#include <linux/mm_inline.h> > #include "internal.h" > > int sysctl_memory_failure_early_kill __read_mostly = 0; > @@ -1468,7 +1469,8 @@ int soft_offline_page(struct page *page, int flags) > put_page(page); > if (!ret) { > LIST_HEAD(pagelist); > - > + inc_zone_page_state(page, NR_ISOLATED_ANON + > + page_is_file_cache(page)); > list_add(&page->lru, &pagelist); > ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, > 0, true); Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Let's check all other migrate_pages callers too... ^ permalink raw reply [flat|nested] 63+ messages in thread
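[Editorial note: the accounting invariant that Minchan's soft_offline_page fix restores can be sketched as a toy userspace model. The names are hypothetical stand-ins, not kernel code: every page handed to migrate_pages() on a private list must bump NR_ISOLATED_* first, because putback_lru_pages() unconditionally decrements it for each page it returns to the LRU on failure.]

```c
/* Toy model of the isolation accounting around a failed migration.
 * nr_isolated_model stands in for NR_ISOLATED_[ANON|FILE]. */
static long nr_isolated_model;

static long soft_offline_model(int account_isolation)
{
	nr_isolated_model = 0;
	if (account_isolation)
		nr_isolated_model++;	/* the added inc_zone_page_state() */
	/* migration fails: putback_lru_pages() decrements unconditionally */
	nr_isolated_model--;
	return nr_isolated_model;
}
```

Without the isolate-side increment the model drifts to -1, matching the wraparound seen in the zoneinfo dump; with it, the counter balances back to 0.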
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 12:24 ` Andrea Arcangeli @ 2011-05-31 13:33 ` Minchan Kim 2011-05-31 14:14 ` Andrea Arcangeli 0 siblings, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-05-31 13:33 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 02:24:37PM +0200, Andrea Arcangeli wrote: > On Tue, May 31, 2011 at 09:16:20PM +0900, Minchan Kim wrote: > > I am not sure this is related to the problem you have seen. > > If he used hwpoison by madivse, it is possible. > > CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y > # CONFIG_MEMORY_FAILURE is not set > > > Anyway, we can see negative value by count mismatch in UP build. > > Let's fix it. > > Definitely let's fix it, but it's probably not related to this one. > > > > > From 1d3ebce2e8aa79dcc912da16b7a8d0611b6f9f1a Mon Sep 17 00:00:00 2001 > > From: Minchan Kim <minchan.kim@gmail.com> > > Date: Tue, 31 May 2011 21:11:58 +0900 > > Subject: [PATCH] Fix page isolated count mismatch > > > > If migration is failed, normally we call putback_lru_pages which > > decreases NR_ISOLATE_[ANON|FILE]. > > It means we should increase NR_ISOLATE_[ANON|FILE] before calling > > putback_lru_pages. But soft_offline_page dosn't it. > > > > It can make NR_ISOLATE_[ANON|FILE] with negative value and in UP build, > > zone_page_state will say huge isolated pages so too_many_isolated > > functions be deceived completely. At last, some process stuck in D state > > as it expect while loop ending with congestion_wait. > > But it's never ending story. > > > > If it is right, it would be -stable stuff. 
> > 
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > ---
> >  mm/memory-failure.c |    4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 5c8f7e0..eac0ba5 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -52,6 +52,7 @@
> >  #include <linux/swapops.h>
> >  #include <linux/hugetlb.h>
> >  #include <linux/memory_hotplug.h>
> > +#include <linux/mm_inline.h>
> >  #include "internal.h"
> > 
> >  int sysctl_memory_failure_early_kill __read_mostly = 0;
> > @@ -1468,7 +1469,8 @@ int soft_offline_page(struct page *page, int flags)
> >  	put_page(page);
> >  	if (!ret) {
> >  		LIST_HEAD(pagelist);
> > -
> > +		inc_zone_page_state(page, NR_ISOLATED_ANON +
> > +					page_is_file_cache(page));
> >  		list_add(&page->lru, &pagelist);
> >  		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
> >  								0, true);
> 
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks, Andrea.

> 
> Let's check all other migrate_pages callers too...

I checked them before sending the patch, but I failed to find anything
strange. :(

Now I am checking whether the page's SwapBacked flag can change between
entering and leaving migrate_pages, so that the NR_ISOLATED_XX
accounting could go wrong. I am getting closer to the failure, too.
Hmm.

-- 
Kind regards
Minchan Kim
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
  2011-05-31 13:33 ` Minchan Kim
@ 2011-05-31 14:14   ` Andrea Arcangeli
  2011-05-31 14:37     ` Minchan Kim
  2011-06-01  0:57     ` Mel Gorman
  0 siblings, 2 replies; 63+ messages in thread
From: Andrea Arcangeli @ 2011-05-31 14:14 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro,
	linux-kernel, linux-mm, stable

On Tue, May 31, 2011 at 10:33:40PM +0900, Minchan Kim wrote:
> I checked them before sending patch but I got failed to find strange things. :(

My review also doesn't show other bugs in migrate_pages callers like
that one.

> Now I am checking the page's SwapBacked flag can be changed
> between before and after of migrate_pages so accounting of NR_ISOLATED_XX can
> make mistake. I am approaching the failure, too. Hmm.

When I checked that, I noticed the ClearPageSwapBacked in swapcache if
radix insertion fails, but that happens before adding the page to the
LRU, so it shouldn't have a chance to be isolated.

So far I have only noticed an unsafe page_count in
vmscan.c:isolate_lru_pages, but that should at worst result in an
invalid pointer dereference, as a random result from that page_count
is not going to hurt, and I think it's only a theoretical issue.
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 14:14 ` Andrea Arcangeli @ 2011-05-31 14:37 ` Minchan Kim 2011-05-31 14:38 ` Minchan Kim 2011-06-01 0:57 ` Mel Gorman 1 sibling, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-05-31 14:37 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 04:14:02PM +0200, Andrea Arcangeli wrote: > On Tue, May 31, 2011 at 10:33:40PM +0900, Minchan Kim wrote: > > I checked them before sending patch but I got failed to find strange things. :( > > My review also doesn't show other bugs in migrate_pages callers like > that one. > > > Now I am checking the page's SwapBacked flag can be changed > > between before and after of migrate_pages so accounting of NR_ISOLATED_XX can > > make mistake. I am approaching the failure, too. Hmm. > > When I checked that, I noticed the ClearPageSwapBacked in swapcache if > radix insertion fails, but that happens before adding the page in the > LRU so it shouldn't have a chance to be isolated. True. > > So far I only noticed an unsafe page_count in > vmscan.c:isolate_lru_pages but that should at worst result in a > invalid pointer dereference as random result from that page_count is > not going to hurt and I think it's only a theoretical issue. Yes. You find a new BUG. It seems to be related to this problem but it should be solved although it's very rare case. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 14:37 ` Minchan Kim @ 2011-05-31 14:38 ` Minchan Kim 2011-06-02 18:23 ` Andrea Arcangeli 0 siblings, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-05-31 14:38 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 11:37:35PM +0900, Minchan Kim wrote: > On Tue, May 31, 2011 at 04:14:02PM +0200, Andrea Arcangeli wrote: > > On Tue, May 31, 2011 at 10:33:40PM +0900, Minchan Kim wrote: > > > I checked them before sending patch but I got failed to find strange things. :( > > > > My review also doesn't show other bugs in migrate_pages callers like > > that one. > > > > > Now I am checking the page's SwapBacked flag can be changed > > > between before and after of migrate_pages so accounting of NR_ISOLATED_XX can > > > make mistake. I am approaching the failure, too. Hmm. > > > > When I checked that, I noticed the ClearPageSwapBacked in swapcache if > > radix insertion fails, but that happens before adding the page in the > > LRU so it shouldn't have a chance to be isolated. > > True. > > > > > So far I only noticed an unsafe page_count in > > vmscan.c:isolate_lru_pages but that should at worst result in a > > invalid pointer dereference as random result from that page_count is > > not going to hurt and I think it's only a theoretical issue. > > > Yes. You find a new BUG. > It seems to be related to this problem but it should be solved although typo : It doesn't seem to be. - Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
  2011-05-31 14:38 ` Minchan Kim
@ 2011-06-02 18:23   ` Andrea Arcangeli
  2011-06-02 20:21     ` Minchan Kim
  2011-06-02 23:02     ` Minchan Kim
  0 siblings, 2 replies; 63+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 18:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro,
	linux-kernel, linux-mm

On Tue, May 31, 2011 at 11:38:30PM +0900, Minchan Kim wrote:
> > Yes. You find a new BUG.
> > It seems to be related to this problem but it should be solved although
> 
> typo : It doesn't seem to be.

This should fix it, but I doubt it matters for this problem.

===
Subject: mm: no page_count without a page pin

From: Andrea Arcangeli <aarcange@redhat.com>

It's unsafe to run page_count during the physical pfn scan.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index faa0a08..e41e78a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1124,8 +1124,18 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 				nr_lumpy_dirty++;
 			scan++;
 		} else {
-			/* the page is freed already. */
-			if (!page_count(cursor_page))
+			/*
+			 * We can't use page_count() as that
+			 * requires compound_head and we don't
+			 * have a pin on the page here. If a
+			 * page is tail, we may or may not
+			 * have isolated the head, so assume
+			 * it's not free, it'd be tricky to
+			 * track the head status without a
+			 * page pin.
+			 */
+			if (!PageTail(cursor_page) &&
+			    !atomic_read(&cursor_page->_count))
 				continue;
 			break;
 		}
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 18:23 ` Andrea Arcangeli @ 2011-06-02 20:21 ` Minchan Kim 2011-06-02 20:59 ` Minchan Kim 2011-06-02 21:40 ` Andrea Arcangeli 2011-06-02 23:02 ` Minchan Kim 1 sibling, 2 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-02 20:21 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Thu, Jun 02, 2011 at 08:23:02PM +0200, Andrea Arcangeli wrote: > On Tue, May 31, 2011 at 11:38:30PM +0900, Minchan Kim wrote: > > > Yes. You find a new BUG. > > > It seems to be related to this problem but it should be solved although > > > > typo : It doesn't seem to be. > > This should fix it, but I doubt it matters for this problem. > > === > Subject: mm: no page_count without a page pin > > From: Andrea Arcangeli <aarcange@redhat.com> > > It's unsafe to run page_count during the physical pfn scan. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index faa0a08..e41e78a 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1124,8 +1124,18 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > nr_lumpy_dirty++; > scan++; > } else { > - /* the page is freed already. */ > - if (!page_count(cursor_page)) > + /* > + * We can't use page_count() as that > + * requires compound_head and we don't > + * have a pin on the page here. If a > + * page is tail, we may or may not > + * have isolated the head, so assume > + * it's not free, it'd be tricky to Isn't it rather aggressive? I think cursor page is likely to be PageTail rather than PageHead. Could we handle it simply with below code? get_page(cursor_page) /* The page is freed already */ if (1 == page_count(cursor_page)) { put_page(cursor_page) continue; } put_page(cursor_page); > + * track the head status without a > + * page pin. 
> + */ > + if (!PageTail(cursor_page) && > + !atomic_read(&cursor_page->_count)) > continue; > break; > } -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 20:21 ` Minchan Kim @ 2011-06-02 20:59 ` Minchan Kim 2011-06-02 22:03 ` Andrea Arcangeli 2011-06-02 21:40 ` Andrea Arcangeli 1 sibling, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-06-02 20:59 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 05:21:56AM +0900, Minchan Kim wrote: > On Thu, Jun 02, 2011 at 08:23:02PM +0200, Andrea Arcangeli wrote: > > On Tue, May 31, 2011 at 11:38:30PM +0900, Minchan Kim wrote: > > > > Yes. You find a new BUG. > > > > It seems to be related to this problem but it should be solved although > > > > > > typo : It doesn't seem to be. > > > > This should fix it, but I doubt it matters for this problem. > > > > === > > Subject: mm: no page_count without a page pin > > > > From: Andrea Arcangeli <aarcange@redhat.com> > > > > It's unsafe to run page_count during the physical pfn scan. > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > --- > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index faa0a08..e41e78a 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1124,8 +1124,18 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > > nr_lumpy_dirty++; > > scan++; > > } else { > > - /* the page is freed already. */ > > - if (!page_count(cursor_page)) > > + /* > > + * We can't use page_count() as that > > + * requires compound_head and we don't > > + * have a pin on the page here. If a > > + * page is tail, we may or may not > > + * have isolated the head, so assume > > + * it's not free, it'd be tricky to > > Isn't it rather aggressive? > I think cursor page is likely to be PageTail rather than PageHead. > Could we handle it simply with below code? 
> > get_page(cursor_page) > /* The page is freed already */ > if (1 == page_count(cursor_page)) { > put_page(cursor_page) > continue; > } > put_page(cursor_page); > Now that I look at the code more, it would trip the VM_BUG_ON in get_page if the page is really freed. I think if we hold zone->lock to prevent prep_new_page from racing, it would be okay. But that's rather overkill, so I will add my sign-off to your patch if we don't have a better idea by tomorrow. :) -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 20:59 ` Minchan Kim @ 2011-06-02 22:03 ` Andrea Arcangeli 0 siblings, 0 replies; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-02 22:03 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 05:59:13AM +0900, Minchan Kim wrote: > Now that I look code more, it would meet VM_BUG_ON of get_page if the page is really > freed. I think if we hold zone->lock to prevent prep_new_page racing, it would be okay. There would be problems with split_huge_page too; we can't even use get_page_unless_zero unless it's a lru page and we hold the lru_lock, and that's a hot lock too. > But it's rather overkill so I will add my sign to your patch if we don't have better idea > until tomorrow. :) Things like compound_trans_head are made to protect against split_huge_page like in ksm, not exactly to get to the head page when the page is being freed, so it's a little tricky. If we could get to the head page safely starting from a tail page it'd solve some issues for memory-failure too, which is currently using compound_head unsafely too, but at least that's running after a catastrophic hardware failure so the safer the better, and the little race is unlikely to ever be an issue for memory-failure (and it's the same issue for hugetlbfs and slub order 3). ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 20:21 ` Minchan Kim 2011-06-02 20:59 ` Minchan Kim @ 2011-06-02 21:40 ` Andrea Arcangeli 2011-06-02 22:23 ` Minchan Kim 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-02 21:40 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm Hello Minchan, On Fri, Jun 03, 2011 at 05:21:56AM +0900, Minchan Kim wrote: > Isn't it rather aggressive? > I think cursor page is likely to be PageTail rather than PageHead. > Could we handle it simply with below code? It's not so likely; there is a small percentage of compound pages that aren't THP compared to the rest that is either regular pagecache or anon regular or anon THP or regular shm. If it's THP, chances are we isolated the head and it's useless to insist on more tail pages (at least for large page sizes like on x86). Plus we've got compaction, so insisting and screwing lru ordering isn't worth it; better to be permissive and abort... in fact I wouldn't dislike removing the entire lumpy logic when COMPACTION_BUILD is true, but that alters the trace too... > get_page(cursor_page) > /* The page is freed already */ > if (1 == page_count(cursor_page)) { > put_page(cursor_page) > continue; > } > put_page(cursor_page); We can't call get_page on a tail page or we break split_huge_page; only an isolated lru page can have its refcount boosted: if we take the lru_lock and we check the page is in the lru, then we can isolate and pin it safely. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 21:40 ` Andrea Arcangeli @ 2011-06-02 22:23 ` Minchan Kim 2011-06-02 22:32 ` Andrea Arcangeli 0 siblings, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-06-02 22:23 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm Hi Andrea, On Fri, Jun 3, 2011 at 6:40 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > Hello Minchan, > > On Fri, Jun 03, 2011 at 05:21:56AM +0900, Minchan Kim wrote: >> Isn't it rather aggressive? >> I think cursor page is likely to be PageTail rather than PageHead. >> Could we handle it simply with below code? > > It's not so likely, there is small percentage of compound pages that > aren't THP compared to the rest that is either regular pagecache or > anon regular or anon THP or regular shm. If it's THP chances are we I mean we have more tail pages than head pages. So I think we are likely to meet tail pages. Of course, compared to all pages (page cache, anon and so on), compound pages would be a very small percentage. > isolated the head and it's useless to insist on more tail pages (at > least for large page size like on x86). Plus we've compaction so I can't understand your point. Could you elaborate? > insisting and screwing lru ordering isn't worth it, better to be > permissive and abort... in fact I wouldn't dislike to remove the > entire lumpy logic when COMPACTION_BUILD is true, but that alters the > trace too... AFAIK, that's the final destination, as compaction will not break lru ordering once my patch (inorder-putback) is merged. > >> get_page(cursor_page) >> /* The page is freed already */ >> if (1 == page_count(cursor_page)) { >> put_page(cursor_page) >> continue; >> } >> put_page(cursor_page); > > We can't call get_page on an tail page or we break split_huge_page, Why can't we call get_page on a tail page if the tail page isn't free?
Maybe I need to investigate split_huge_page. > only an isolated lru can be boosted, if we take the lru_lock and we > check the page is in lru, then we can isolate and pin it safely. > Thanks. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 22:23 ` Minchan Kim @ 2011-06-02 22:32 ` Andrea Arcangeli 2011-06-02 23:01 ` Minchan Kim 0 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-02 22:32 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: > I mean we have more tail pages than head pages. So I think we are likely to > meet tail pages. Of course, compared to all pages(page cache, anon and > so on), compound pages would be very small percentage. Yes, that's my point: being a small percentage, it's no big deal to break the loop early. > > isolated the head and it's useless to insist on more tail pages (at > > least for large page size like on x86). Plus we've compaction so > > I can't understand your point. Could you elaborate it? What I meant is that if we already isolated the head page of the THP, we don't need to try to free the tail pages, and breaking the loop early will still give us a chance to free a whole 2m because we isolated the head page (it'll involve some work and swapping, but if it was a compoundtranspage we're ok to break the loop and we're not making the logic any worse). Provided the PMD_SIZE is quite large, like 2/4m... The only way this patch makes things worse is for slub order 3 in the process of being freed. But tail pages aren't generally free anyway, so I doubt this really makes any difference; plus the tail is getting cleared as soon as the page reaches the buddy, so it's probably
in fact I wouldn't dislike to remove the > > entire lumpy logic when COMPACTION_BUILD is true, but that alters the > > trace too... > > AFAIK, it's final destination to go as compaction will not break lru > ordering if my patch(inorder-putback) is merged. Agreed. I like your patchset, sorry for not having reviewed it in detail yet but there were other issues popping up in the last few days. > >> get_page(cursor_page) > >> /* The page is freed already */ > >> if (1 == page_count(cursor_page)) { > >> put_page(cursor_page) > >> continue; > >> } > >> put_page(cursor_page); > > > > We can't call get_page on an tail page or we break split_huge_page, > > Why don't we call get_page on tail page if tail page isn't free? > Maybe I need investigating split_huge_page. Yes it's split_huge_page, only gup is allowed to increase the tail page because we're guaranteed while gup_fast does it, split_huge_page_refcount isn't running yet, because the pmd wasn't set as splitting and the irqs were disabled (or we'd be holding the page_table_lock for gup slow version after checking again the pmd wasn't splitting and so __split_huge_page_refcount will wait). Thanks, Andrea ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 22:32 ` Andrea Arcangeli @ 2011-06-02 23:01 ` Minchan Kim 2011-06-03 17:37 ` Andrea Arcangeli 2011-06-06 10:15 ` Mel Gorman 0 siblings, 2 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-02 23:01 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 3, 2011 at 7:32 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: >> I mean we have more tail pages than head pages. So I think we are likely to >> meet tail pages. Of course, compared to all pages(page cache, anon and >> so on), compound pages would be very small percentage. > > Yes that's my point, that being a small percentage it's no big deal to > break the loop early. Indeed. > >> > isolated the head and it's useless to insist on more tail pages (at >> > least for large page size like on x86). Plus we've compaction so >> >> I can't understand your point. Could you elaborate it? > > What I meant is that if we already isolated the head page of the THP, > we don't need to try to free the tail pages and breaking the loop > early, will still give us a chance to free a whole 2m because we > isolated the head page (it'll involve some work and swapping but if it > was a compoundtranspage we're ok to break the loop and we're not > making the logic any worse). Provided the PMD_SIZE is quite large like > 2/4m... Do you want this? 
(it's almost pseudo-code) diff --git a/mm/vmscan.c b/mm/vmscan.c index 7a4469b..9d7609f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1017,7 +1017,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { struct page *page; unsigned long pfn; - unsigned long end_pfn; + unsigned long start_pfn, end_pfn; unsigned long page_pfn; int zone_id; @@ -1057,9 +1057,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, */ zone_id = page_zone_id(page); page_pfn = page_to_pfn(page); - pfn = page_pfn & ~((1 << order) - 1); + start_pfn = pfn = page_pfn & ~((1 << order) - 1); end_pfn = pfn + (1 << order); - for (; pfn < end_pfn; pfn++) { + while (pfn < end_pfn) { struct page *cursor_page; /* The target page is in the block, ignore it. */ @@ -1086,17 +1086,25 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, break; if (__isolate_lru_page(cursor_page, mode, file) == 0) { + int isolated_pages; list_move(&cursor_page->lru, dst); mem_cgroup_del_lru(cursor_page); - nr_taken += hpage_nr_pages(page); + isolated_pages = hpage_nr_pages(page); + nr_taken += isolated_pages; + /* if we isolated pages enough, let's break early */ + if (nr_taken > end_pfn - start_pfn) + break; + pfn += isolated_pages; nr_lumpy_taken++; if (PageDirty(cursor_page)) nr_lumpy_dirty++; scan++; } else { /* the page is freed already. */ - if (!page_count(cursor_page)) + if (!page_count(cursor_page)) { + pfn++; continue; + } break; } } > > The only way this patch makes things worse is for slub order 3 in the > process of being freed. But tail pages aren't generally free anyway so > I doubt this really makes any difference plus the tail is getting > cleared as soon as the page reaches the buddy so it's probably Okay. Considering getting clear PG_tail as soon as slub order 3 is freed, it would be very rare case. 
> unnoticeable as this then makes a difference only during a race (plus > the tail page can't be isolated, only head page can be part of lrus > and only if they're THP). > >> > insisting and screwing lru ordering isn't worth it, better to be >> > permissive and abort... in fact I wouldn't dislike to remove the >> > entire lumpy logic when COMPACTION_BUILD is true, but that alters the >> > trace too... >> >> AFAIK, it's final destination to go as compaction will not break lru >> ordering if my patch(inorder-putback) is merged. > > Agreed. I like your patchset, sorry for not having reviewed it in > detail yet but there were other issues popping up in the last few > days. No problem. It's more urgent than mine. :) > >> >> get_page(cursor_page) >> >> /* The page is freed already */ >> >> if (1 == page_count(cursor_page)) { >> >> put_page(cursor_page) >> >> continue; >> >> } >> >> put_page(cursor_page); >> > >> > We can't call get_page on an tail page or we break split_huge_page, >> >> Why don't we call get_page on tail page if tail page isn't free? >> Maybe I need investigating split_huge_page. > > Yes it's split_huge_page, only gup is allowed to increase the tail > page because we're guaranteed while gup_fast does it, > split_huge_page_refcount isn't running yet, because the pmd wasn't > set as splitting and the irqs were disabled (or we'd be holding the > page_table_lock for gup slow version after checking again the pmd > wasn't splitting and so __split_huge_page_refcount will wait). Thanks. I will take some time to understand your point by reviewing split_huge_page and this comment of yours. You convinced me and made me think about things I hadn't considered, which are good points. Thanks, Andrea. > > Thanks, > Andrea > -- Kind regards, Minchan Kim ^ permalink raw reply related [flat|nested] 63+ messages in thread
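[Editorial note: the pfn-window arithmetic that the isolate_lru_pages() hunks in this thread rely on can be sanity-checked in isolation. The function names below are illustrative, not the kernel's; they reproduce the mask that rounds a pfn down to its order-aligned block and the end-of-block computation.]

```c
#include <assert.h>

/* Start of the order-aligned block containing pfn, as computed by
 * isolate_lru_pages(): page_pfn & ~((1 << order) - 1). */
static unsigned long block_start_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & ~((1UL << order) - 1);
}

/* One past the last pfn of that block. */
static unsigned long block_end_pfn(unsigned long pfn, unsigned int order)
{
	return block_start_pfn(pfn, order) + (1UL << order);
}
```

[For a 2M THP (order 9 with 4k pages) whose head sits at the start of the block, hpage_nr_pages() is 512, so the proposed `pfn += isolated_pages` skip covers the whole window at once and the scan stops without probing each tail page.]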
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 23:01 ` Minchan Kim @ 2011-06-03 17:37 ` Andrea Arcangeli 2011-06-03 18:07 ` Andrea Arcangeli 2011-06-06 10:15 ` Mel Gorman 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-03 17:37 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > Do you want this? (it's almost pseudo-code) Yes, that's a good idea, so we at least take into account if we isolated something big, and it's pointless to insist on wasting CPU on the tail pages and even trace a fail because of tail pages after it. I introduced a __page_count to increase readability. It's still hackish to work on subpages in vmscan.c but at least I added a comment, and until we serialize destroy_compound_page vs compound_head, I guess there's no better way. I didn't attempt to add out of order serialization similar to what exists for split_huge_page vs compound_trans_head yet, as the page can be allocated or go away from under us; in split_huge_page vs compound_trans_head it's simpler because both callers are required to hold a pin on the page so the page can't be reallocated and destroyed under it. === Subject: mm: no page_count without a page pin From: Andrea Arcangeli <aarcange@redhat.com> It's unsafe to run page_count during the physical pfn scan because compound_head could trip on a dangling pointer when reading page->first_page if the compound page is being freed by another CPU. Also properly take into account if we isolated a compound page during the scan and break the loop if we've isolated enough. Introduce __page_count to clean up some of the atomic_read(&page->_count) usage in common code.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- arch/powerpc/mm/gup.c | 2 - arch/powerpc/platforms/512x/mpc512x_shared.c | 2 - arch/x86/mm/gup.c | 2 - fs/nilfs2/page.c | 2 - include/linux/mm.h | 13 +++++++--- mm/huge_memory.c | 4 +-- mm/internal.h | 2 - mm/page_alloc.c | 6 ++-- mm/swap.c | 4 +-- mm/vmscan.c | 33 ++++++++++++++++++++------- 10 files changed, 46 insertions(+), 24 deletions(-) --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { struct page *page; unsigned long pfn; - unsigned long end_pfn; + unsigned long start_pfn, end_pfn; unsigned long page_pfn; int zone_id; @@ -1087,9 +1087,9 @@ static unsigned long isolate_lru_pages(u */ zone_id = page_zone_id(page); page_pfn = page_to_pfn(page); - pfn = page_pfn & ~((1 << order) - 1); + start_pfn = page_pfn & ~((1 << order) - 1); end_pfn = pfn + (1 << order); - for (; pfn < end_pfn; pfn++) { + for (pfn = start_pfn; pfn < end_pfn; pfn++) { struct page *cursor_page; /* The target page is in the block, ignore it. */ @@ -1116,16 +1116,33 @@ static unsigned long isolate_lru_pages(u break; if (__isolate_lru_page(cursor_page, mode, file) == 0) { + unsigned int isolated_pages; list_move(&cursor_page->lru, dst); mem_cgroup_del_lru(cursor_page); - nr_taken += hpage_nr_pages(page); - nr_lumpy_taken++; + isolated_pages = hpage_nr_pages(page); + nr_taken += isolated_pages; + nr_lumpy_taken += isolated_pages; if (PageDirty(cursor_page)) - nr_lumpy_dirty++; + nr_lumpy_dirty += isolated_pages; scan++; + pfn += isolated_pages-1; + VM_BUG_ON(!isolated_pages); + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); } else { - /* the page is freed already. */ - if (!page_count(cursor_page)) + /* + * Check if the page is freed already. + * + * We can't use page_count() as that + * requires compound_head and we don't + * have a pin on the page here. 
If a + * page is tail, we may or may not + * have isolated the head, so assume + * it's not free, it'd be tricky to + * track the head status without a + * page pin. + */ + if (!PageTail(cursor_page) && + !__page_count(&cursor_page)) continue; break; } --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -271,7 +271,7 @@ struct inode; */ static inline int put_page_testzero(struct page *page) { - VM_BUG_ON(atomic_read(&page->_count) == 0); + VM_BUG_ON(__page_count(page) == 0); return atomic_dec_and_test(&page->_count); } @@ -355,9 +355,14 @@ static inline struct page *compound_head return page; } +static inline int __page_count(struct page *page) +{ + return atomic_read(&page->_count); +} + static inline int page_count(struct page *page) { - return atomic_read(&compound_head(page)->_count); + return __page_count(compound_head(page)); } static inline void get_page(struct page *page) @@ -370,7 +375,7 @@ static inline void get_page(struct page * bugcheck only verifies that the page->_count isn't * negative. */ - VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page)); + VM_BUG_ON(__page_count(page) < !PageTail(page)); atomic_inc(&page->_count); /* * Getting a tail page will elevate both the head and tail @@ -382,7 +387,7 @@ static inline void get_page(struct page * __split_huge_page_refcount can't run under * get_page(). 
*/ - VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0); + VM_BUG_ON(__page_count(page->first_page) <= 0); atomic_inc(&page->first_page->_count); } } --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1203,10 +1203,10 @@ static void __split_huge_page_refcount(s struct page *page_tail = page + i; /* tail_page->_count cannot change */ - atomic_sub(atomic_read(&page_tail->_count), &page->_count); + atomic_sub(__page_count(page_tail), &page->_count); BUG_ON(page_count(page) <= 0); atomic_add(page_mapcount(page) + 1, &page_tail->_count); - BUG_ON(atomic_read(&page_tail->_count) <= 0); + BUG_ON(__page_count(page_tail) <= 0); /* after clearing PageTail the gup refcount can be released */ smp_mb(); --- a/arch/powerpc/mm/gup.c +++ b/arch/powerpc/mm/gup.c @@ -22,7 +22,7 @@ static inline void get_huge_page_tail(st * __split_huge_page_refcount() cannot run * from under us. */ - VM_BUG_ON(atomic_read(&page->_count) < 0); + VM_BUG_ON(__page_count(page) < 0); atomic_inc(&page->_count); } --- a/arch/powerpc/platforms/512x/mpc512x_shared.c +++ b/arch/powerpc/platforms/512x/mpc512x_shared.c @@ -200,7 +200,7 @@ static inline void mpc512x_free_bootmem( { __ClearPageReserved(page); BUG_ON(PageTail(page)); - BUG_ON(atomic_read(&page->_count) > 1); + BUG_ON(__page_count(page) > 1); atomic_set(&page->_count, 1); __free_page(page); totalram_pages++; --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -114,7 +114,7 @@ static inline void get_huge_page_tail(st * __split_huge_page_refcount() cannot run * from under us. 
*/ - VM_BUG_ON(atomic_read(&page->_count) < 0); + VM_BUG_ON(__page_count(page) < 0); atomic_inc(&page->_count); } --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -181,7 +181,7 @@ void nilfs_page_bug(struct page *page) printk(KERN_CRIT "NILFS_PAGE_BUG(%p): cnt=%d index#=%llu flags=0x%lx " "mapping=%p ino=%lu\n", - page, atomic_read(&page->_count), + page, __page_count(page), (unsigned long long)page->index, page->flags, m, ino); if (page_has_buffers(page)) { --- a/mm/internal.h +++ b/mm/internal.h @@ -28,7 +28,7 @@ static inline void set_page_count(struct static inline void set_page_refcounted(struct page *page) { VM_BUG_ON(PageTail(page)); - VM_BUG_ON(atomic_read(&page->_count)); + VM_BUG_ON(__page_count(page)); set_page_count(page, 1); } --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -568,7 +568,7 @@ static inline int free_pages_check(struc { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | + (__page_count(page) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_FREE) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); @@ -758,7 +758,7 @@ static inline int check_new_page(struct { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | + (__page_count(page) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); @@ -5739,7 +5739,7 @@ void dump_page(struct page *page) { printk(KERN_ALERT "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n", - page, atomic_read(&page->_count), page_mapcount(page), + page, __page_count(page), page_mapcount(page), page->mapping, page->index); dump_page_flags(page->flags); mem_cgroup_print_bad_page(page); --- a/mm/swap.c +++ b/mm/swap.c @@ -128,9 +128,9 @@ static void put_compound_page(struct pag if (put_page_testzero(page_head)) VM_BUG_ON(1); /* __split_huge_page_refcount will wait now */ - VM_BUG_ON(atomic_read(&page->_count) <= 0); + VM_BUG_ON(__page_count(page) <= 0); atomic_dec(&page->_count); 
- VM_BUG_ON(atomic_read(&page_head->_count) <= 0); + VM_BUG_ON(__page_count(page_head) <= 0); compound_unlock_irqrestore(page_head, flags); if (put_page_testzero(page_head)) { if (PageHead(page_head)) ^ permalink raw reply [flat|nested] 63+ messages in thread
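[Editorial note: put_page_testzero(), which the hunks above rewrite to use __page_count in its VM_BUG_ON, is just an atomic decrement that reports whether the caller dropped the last reference. A rough userspace model with C11 atomics, names illustrative only:]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Models put_page_testzero(): drop one reference, return true iff the
 * count reached zero, i.e. the caller now owns freeing the page.
 * The assert mirrors the VM_BUG_ON against dropping a 0-count page.
 */
static bool put_page_testzero_model(atomic_int *count)
{
	assert(atomic_load(count) != 0);
	/* atomic_fetch_sub returns the old value: old == 1 means now 0 */
	return atomic_fetch_sub(count, 1) == 1;
}
```

[The kernel's atomic_dec_and_test checks the new value instead of the old one, but the contract is the same: exactly one caller sees "true" for a given page.]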
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 17:37 ` Andrea Arcangeli @ 2011-06-03 18:07 ` Andrea Arcangeli 2011-06-04 7:59 ` Minchan Kim 2011-06-06 10:32 ` Mel Gorman 0 siblings, 2 replies; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-03 18:07 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 07:37:07PM +0200, Andrea Arcangeli wrote: > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > Do you want this? (it's almost pseudo-code) > > Yes that's good idea so we at least take into account if we isolated > something big, and it's pointless to insist wasting CPU on the tail > pages and even trace a fail because of tail pages after it. > > I introduced a __page_count to increase readability. It's still > hackish to work on subpages in vmscan.c but at least I added a comment > and until we serialize destroy_compound_page vs compound_head, I guess > there's no better way. I didn't attempt to add out of order > serialization similar to what exists for split_huge_page vs > compound_trans_head yet, as the page can be allocated or go away from > under us, in split_huge_page vs compound_trans_head it's simpler > because both callers are required to hold a pin on the page so the > page can't go be reallocated and destroyed under it. Sent too fast... had to shuffle a few things around... trying again. === Subject: mm: no page_count without a page pin From: Andrea Arcangeli <aarcange@redhat.com> It's unsafe to run page_count during the physical pfn scan because compound_head could trip on a dangling pointer when reading page->first_page if the compound page is being freed by another CPU. Also properly take into account if we isolated a compound page during the scan and break the loop if we've isolated enough.
Introduce __page_count to clean up some of the atomic_read(&page->_count) usage in common code. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- arch/powerpc/mm/gup.c | 2 - arch/powerpc/platforms/512x/mpc512x_shared.c | 2 - arch/x86/mm/gup.c | 2 - fs/nilfs2/page.c | 2 - include/linux/mm.h | 13 ++++++---- mm/huge_memory.c | 4 +-- mm/internal.h | 2 - mm/page_alloc.c | 6 ++-- mm/swap.c | 4 +-- mm/vmscan.c | 35 ++++++++++++++++++++------- 10 files changed, 47 insertions(+), 25 deletions(-) --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { struct page *page; unsigned long pfn; - unsigned long end_pfn; + unsigned long start_pfn, end_pfn; unsigned long page_pfn; int zone_id; @@ -1087,9 +1087,9 @@ */ zone_id = page_zone_id(page); page_pfn = page_to_pfn(page); - pfn = page_pfn & ~((1 << order) - 1); - end_pfn = pfn + (1 << order); - for (; pfn < end_pfn; pfn++) { + start_pfn = page_pfn & ~((1 << order) - 1); + end_pfn = start_pfn + (1 << order); + for (pfn = start_pfn; pfn < end_pfn; pfn++) { struct page *cursor_page; /* The target page is in the block, ignore it. */ @@ -1116,16 +1116,33 @@ break; if (__isolate_lru_page(cursor_page, mode, file) == 0) { + unsigned int isolated_pages; list_move(&cursor_page->lru, dst); mem_cgroup_del_lru(cursor_page); - nr_taken += hpage_nr_pages(page); - nr_lumpy_taken++; + isolated_pages = hpage_nr_pages(page); + nr_taken += isolated_pages; + nr_lumpy_taken += isolated_pages; if (PageDirty(cursor_page)) - nr_lumpy_dirty++; + nr_lumpy_dirty += isolated_pages; scan++; + pfn += isolated_pages-1; + VM_BUG_ON(!isolated_pages); + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); } else { - /* the page is freed already. */ - if (!page_count(cursor_page)) + /* + * Check if the page is freed already. 
+ * + * We can't use page_count() as that + * requires compound_head and we don't + * have a pin on the page here. If a + * page is tail, we may or may not + * have isolated the head, so assume + * it's not free, it'd be tricky to + * track the head status without a + * page pin. + */ + if (!PageTail(cursor_page) && + !__page_count(cursor_page)) continue; break; } --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -266,12 +266,17 @@ struct inode; * routine so they can be sure the page doesn't go away from under them. */ +static inline int __page_count(struct page *page) +{ + return atomic_read(&page->_count); +} + /* * Drop a ref, return true if the refcount fell to zero (the page has no users) */ static inline int put_page_testzero(struct page *page) { - VM_BUG_ON(atomic_read(&page->_count) == 0); + VM_BUG_ON(__page_count(page) == 0); return atomic_dec_and_test(&page->_count); } @@ -357,7 +362,7 @@ static inline struct page *compound_head static inline int page_count(struct page *page) { - return atomic_read(&compound_head(page)->_count); + return __page_count(compound_head(page)); } static inline void get_page(struct page *page) @@ -370,7 +375,7 @@ static inline void get_page(struct page * bugcheck only verifies that the page->_count isn't * negative. */ - VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page)); + VM_BUG_ON(__page_count(page) < !PageTail(page)); atomic_inc(&page->_count); /* * Getting a tail page will elevate both the head and tail @@ -382,7 +387,7 @@ static inline void get_page(struct page * __split_huge_page_refcount can't run under * get_page(). 
*/ - VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0); + VM_BUG_ON(__page_count(page->first_page) <= 0); atomic_inc(&page->first_page->_count); } } --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1203,10 +1203,10 @@ static void __split_huge_page_refcount(s struct page *page_tail = page + i; /* tail_page->_count cannot change */ - atomic_sub(atomic_read(&page_tail->_count), &page->_count); + atomic_sub(__page_count(page_tail), &page->_count); BUG_ON(page_count(page) <= 0); atomic_add(page_mapcount(page) + 1, &page_tail->_count); - BUG_ON(atomic_read(&page_tail->_count) <= 0); + BUG_ON(__page_count(page_tail) <= 0); /* after clearing PageTail the gup refcount can be released */ smp_mb(); --- a/arch/powerpc/mm/gup.c +++ b/arch/powerpc/mm/gup.c @@ -22,7 +22,7 @@ static inline void get_huge_page_tail(st * __split_huge_page_refcount() cannot run * from under us. */ - VM_BUG_ON(atomic_read(&page->_count) < 0); + VM_BUG_ON(__page_count(page) < 0); atomic_inc(&page->_count); } --- a/arch/powerpc/platforms/512x/mpc512x_shared.c +++ b/arch/powerpc/platforms/512x/mpc512x_shared.c @@ -200,7 +200,7 @@ static inline void mpc512x_free_bootmem( { __ClearPageReserved(page); BUG_ON(PageTail(page)); - BUG_ON(atomic_read(&page->_count) > 1); + BUG_ON(__page_count(page) > 1); atomic_set(&page->_count, 1); __free_page(page); totalram_pages++; --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -114,7 +114,7 @@ static inline void get_huge_page_tail(st * __split_huge_page_refcount() cannot run * from under us. 
*/ - VM_BUG_ON(atomic_read(&page->_count) < 0); + VM_BUG_ON(__page_count(page) < 0); atomic_inc(&page->_count); } --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -181,7 +181,7 @@ void nilfs_page_bug(struct page *page) printk(KERN_CRIT "NILFS_PAGE_BUG(%p): cnt=%d index#=%llu flags=0x%lx " "mapping=%p ino=%lu\n", - page, atomic_read(&page->_count), + page, __page_count(page), (unsigned long long)page->index, page->flags, m, ino); if (page_has_buffers(page)) { --- a/mm/internal.h +++ b/mm/internal.h @@ -28,7 +28,7 @@ static inline void set_page_count(struct static inline void set_page_refcounted(struct page *page) { VM_BUG_ON(PageTail(page)); - VM_BUG_ON(atomic_read(&page->_count)); + VM_BUG_ON(__page_count(page)); set_page_count(page, 1); } --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -568,7 +568,7 @@ static inline int free_pages_check(struc { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | + (__page_count(page) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_FREE) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); @@ -758,7 +758,7 @@ static inline int check_new_page(struct { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | + (__page_count(page) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); @@ -5739,7 +5739,7 @@ void dump_page(struct page *page) { printk(KERN_ALERT "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n", - page, atomic_read(&page->_count), page_mapcount(page), + page, __page_count(page), page_mapcount(page), page->mapping, page->index); dump_page_flags(page->flags); mem_cgroup_print_bad_page(page); --- a/mm/swap.c +++ b/mm/swap.c @@ -128,9 +128,9 @@ static void put_compound_page(struct pag if (put_page_testzero(page_head)) VM_BUG_ON(1); /* __split_huge_page_refcount will wait now */ - VM_BUG_ON(atomic_read(&page->_count) <= 0); + VM_BUG_ON(__page_count(page) <= 0); atomic_dec(&page->_count); 
- VM_BUG_ON(atomic_read(&page_head->_count) <= 0); + VM_BUG_ON(__page_count(page_head) <= 0); compound_unlock_irqrestore(page_head, flags); if (put_page_testzero(page_head)) { if (PageHead(page_head)) ^ permalink raw reply [flat|nested] 63+ messages in thread
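The distinction the patch introduces is easiest to see outside the kernel. The sketch below is a userspace model, not kernel code: `struct page`, `compound_head()` and the helpers are simplified stand-ins, but they show why `__page_count()` (a raw read of `_count`) is safe without a pin, while `page_count()` is not, since the latter may chase `first_page` through `compound_head()` and trip on a dangling head pointer if the compound page is concurrently destroyed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct page. */
struct page {
    atomic_int _count;        /* page->_count */
    bool tail;                /* PageTail(page) */
    struct page *first_page;  /* head page; meaningful only for tails */
};

static void page_init(struct page *p, int count, bool tail, struct page *head)
{
    atomic_store(&p->_count, count);
    p->tail = tail;
    p->first_page = head;
}

/* Raw refcount read: never dereferences first_page, so it cannot
 * trip on a dangling head pointer even without a page pin. */
static int __page_count(struct page *page)
{
    return atomic_load(&page->_count);
}

/* Tail pages resolve to their head. */
static struct page *compound_head(struct page *page)
{
    return page->tail ? page->first_page : page;
}

/* page_count() goes through compound_head(), so for a tail page it
 * reads first_page -- only safe while the caller holds a pin. */
static int page_count(struct page *page)
{
    return __page_count(compound_head(page));
}

/* Drop a reference; true when the count has fallen to zero. */
static bool put_page_testzero(struct page *page)
{
    assert(__page_count(page) != 0);   /* VM_BUG_ON in the kernel */
    return atomic_fetch_sub(&page->_count, 1) == 1;
}
```

In this model, reading the tail's own `_count` never touches the head, which is exactly the property the vmscan hunk relies on.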
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 18:07 ` Andrea Arcangeli @ 2011-06-04 7:59 ` Minchan Kim 2011-06-06 10:32 ` Mel Gorman 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-04 7:59 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 08:07:30PM +0200, Andrea Arcangeli wrote: > On Fri, Jun 03, 2011 at 07:37:07PM +0200, Andrea Arcangeli wrote: > > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > > Do you want this? (it's almost pseudo-code) > > > > Yes that's good idea so we at least take into account if we isolated > > something big, and it's pointless to insist wasting CPU on the tail > > pages and even trace a fail because of tail pages after it. > > > > I introduced a __page_count to increase readability. It's still > > hackish to work on subpages in vmscan.c but at least I added a comment > > and until we serialize destroy_compound_page vs compound_head, I guess > > there's no better way. I didn't attempt to add out of order > > serialization similar to what exists for split_huge_page vs > > compound_trans_head yet, as the page can be allocated or go away from > > under us, in split_huge_page vs compound_trans_head it's simpler > > because both callers are required to hold a pin on the page so the > > page can't go be reallocated and destroyed under it. > > Sent too fast... had to shuffle a few things around... trying again. > > === > Subject: mm: no page_count without a page pin > > From: Andrea Arcangeli <aarcange@redhat.com> > > It's unsafe to run page_count during the physical pfn scan because > compound_head could trip on a dangling pointer when reading page->first_page if > the compound page is being freed by another CPU. 
Also properly take into > account if we isolated a compound page during the scan and break the loop if > we've isolated enough. Introduce __page_count to clean up some of the atomic_read calls on > &page->_count in common code. > Patch looks good to me. I have a question; please see the bottom. In addition, I think this patch has to be divided into 4 patches: 1. fix accounting of nr_lumpy_taken, nr_lumpy_dirty on hpage 2. early breaking of isolate_lru_pages if we had enough isolated pages 3. introduce __page_count and cleanup 4. fix page_count usage of subpage in vmscan.c > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > arch/powerpc/mm/gup.c | 2 - > arch/powerpc/platforms/512x/mpc512x_shared.c | 2 - > arch/x86/mm/gup.c | 2 - > fs/nilfs2/page.c | 2 - > include/linux/mm.h | 13 ++++++---- > mm/huge_memory.c | 4 +-- > mm/internal.h | 2 - > mm/page_alloc.c | 6 ++-- > mm/swap.c | 4 +-- > mm/vmscan.c | 35 ++++++++++++++++++++------- > 10 files changed, 47 insertions(+), 25 deletions(-) > > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > struct page *page; > unsigned long pfn; > - unsigned long end_pfn; > + unsigned long start_pfn, end_pfn; > unsigned long page_pfn; > int zone_id; > > @@ -1087,9 +1087,9 @@ static unsigned long isolate_lru_pages(u > */ > zone_id = page_zone_id(page); > page_pfn = page_to_pfn(page); > - pfn = page_pfn & ~((1 << order) - 1); > - end_pfn = pfn + (1 << order); > - for (; pfn < end_pfn; pfn++) { > + start_pfn = page_pfn & ~((1 << order) - 1); > + end_pfn = start_pfn + (1 << order); > + for (pfn = start_pfn; pfn < end_pfn; pfn++) { > struct page *cursor_page; > > /* The target page is in the block, ignore it. 
*/ > @@ -1116,16 +1116,33 @@ static unsigned long isolate_lru_pages(u > break; > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > + unsigned int isolated_pages; > list_move(&cursor_page->lru, dst); > mem_cgroup_del_lru(cursor_page); > - nr_taken += hpage_nr_pages(page); > - nr_lumpy_taken++; > + isolated_pages = hpage_nr_pages(page); > + nr_taken += isolated_pages; > + nr_lumpy_taken += isolated_pages; > if (PageDirty(cursor_page)) > - nr_lumpy_dirty++; > + nr_lumpy_dirty += isolated_pages; > scan++; > + pfn += isolated_pages-1; > + VM_BUG_ON(!isolated_pages); > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); What's the point of these VM_BUG_ONs? Could you explain what you expect them to catch? -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
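The pfn arithmetic in the hunk above can be checked in isolation. Below is a standalone sketch (the function names are mine, not the kernel's) of the align-down that picks the order-sized scan window, and of the `isolated_pages - 1` skip, where the for-loop's own `pfn++` supplies the final step past the compound page:

```c
#include <assert.h>

/* Align pfn down to the start of its order-aligned block, as the
 * start_pfn computation in isolate_lru_pages() does. */
static unsigned long block_start_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~((1UL << order) - 1);
}

/* One past the last pfn of that block. */
static unsigned long block_end_pfn(unsigned long pfn, unsigned int order)
{
    return block_start_pfn(pfn, order) + (1UL << order);
}

/* After isolating a compound page of isolated_pages subpages at pfn,
 * the value to leave in the loop variable: the loop's pfn++ then
 * lands on the first pfn past the compound page, which is why the
 * patch adds isolated_pages - 1, not isolated_pages. */
static unsigned long skip_isolated(unsigned long pfn, unsigned int isolated_pages)
{
    return pfn + isolated_pages - 1;
}
```

With order 9 (a 2MB huge page on x86 with 4K pages), any pfn inside the block maps to the same [start, end) window, and skipping 512 isolated subpages lands the next iteration exactly at the end of the block.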
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 18:07 ` Andrea Arcangeli 2011-06-04 7:59 ` Minchan Kim @ 2011-06-06 10:32 ` Mel Gorman 2011-06-06 12:49 ` Andrea Arcangeli 2011-06-06 14:07 ` Minchan Kim 1 sibling, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 10:32 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 08:07:30PM +0200, Andrea Arcangeli wrote: > On Fri, Jun 03, 2011 at 07:37:07PM +0200, Andrea Arcangeli wrote: > > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > > Do you want this? (it's almost pseudo-code) > > > > Yes that's good idea so we at least take into account if we isolated > > something big, and it's pointless to insist wasting CPU on the tail > > pages and even trace a fail because of tail pages after it. > > > > I introduced a __page_count to increase readability. It's still > > hackish to work on subpages in vmscan.c but at least I added a comment > > and until we serialize destroy_compound_page vs compound_head, I guess > > there's no better way. I didn't attempt to add out of order > > serialization similar to what exists for split_huge_page vs > > compound_trans_head yet, as the page can be allocated or go away from > > under us, in split_huge_page vs compound_trans_head it's simpler > > because both callers are required to hold a pin on the page so the > > page can't go be reallocated and destroyed under it. > > Sent too fast... had to shuffle a few things around... trying again. > > === > Subject: mm: no page_count without a page pin > > From: Andrea Arcangeli <aarcange@redhat.com> > > It's unsafe to run page_count during the physical pfn scan because > compound_head could trip on a dangling pointer when reading page->first_page if > the compound page is being freed by another CPU. 
Also properly take into > account if we isolated a compound page during the scan and break the loop if > we've isolated enoguh. Introduce __page_count to cleanup some atomic_read from > &page->_count in common code to cleanup. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> This patch is pulling in stuff from Minchan. Minimally his patch should be kept separate to preserve history or his Signed-off should be included on this patch. > --- > arch/powerpc/mm/gup.c | 2 - > arch/powerpc/platforms/512x/mpc512x_shared.c | 2 - > arch/x86/mm/gup.c | 2 - > fs/nilfs2/page.c | 2 - > include/linux/mm.h | 13 ++++++---- > mm/huge_memory.c | 4 +-- > mm/internal.h | 2 - > mm/page_alloc.c | 6 ++-- > mm/swap.c | 4 +-- > mm/vmscan.c | 35 ++++++++++++++++++++------- > 10 files changed, 47 insertions(+), 25 deletions(-) > > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > struct page *page; > unsigned long pfn; > - unsigned long end_pfn; > + unsigned long start_pfn, end_pfn; > unsigned long page_pfn; > int zone_id; > > @@ -1087,9 +1087,9 @@ static unsigned long isolate_lru_pages(u > */ > zone_id = page_zone_id(page); > page_pfn = page_to_pfn(page); > - pfn = page_pfn & ~((1 << order) - 1); > - end_pfn = pfn + (1 << order); > - for (; pfn < end_pfn; pfn++) { > + start_pfn = page_pfn & ~((1 << order) - 1); > + end_pfn = start_pfn + (1 << order); > + for (pfn = start_pfn; pfn < end_pfn; pfn++) { > struct page *cursor_page; > > /* The target page is in the block, ignore it. 
*/ > @@ -1116,16 +1116,33 @@ static unsigned long isolate_lru_pages(u > break; > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > + unsigned int isolated_pages; > list_move(&cursor_page->lru, dst); > mem_cgroup_del_lru(cursor_page); > - nr_taken += hpage_nr_pages(page); > - nr_lumpy_taken++; > + isolated_pages = hpage_nr_pages(page); > + nr_taken += isolated_pages; > + nr_lumpy_taken += isolated_pages; > if (PageDirty(cursor_page)) > - nr_lumpy_dirty++; > + nr_lumpy_dirty += isolated_pages; > scan++; > + pfn += isolated_pages-1; Ah, here is the isolated_pages - 1 which is necessary. Should have read the whole thread before responding to anything :). I still think this optimisation is rare and only applies if we are encountering huge pages during the linear scan. How often are we doing that really? > + VM_BUG_ON(!isolated_pages); This BUG_ON is overkill. hpage_nr_pages would have to return 0. > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); This would require order > MAX_ORDER_NR_PAGES to be passed into isolate_lru_pages or for a huge page to be unaligned to a power of two. The former is very unlikely and the latter is not supported by any CPU. > } else { > - /* the page is freed already. */ > - if (!page_count(cursor_page)) > + /* > + * Check if the page is freed already. > + * > + * We can't use page_count() as that > + * requires compound_head and we don't > + * have a pin on the page here. If a > + * page is tail, we may or may not > + * have isolated the head, so assume > + * it's not free, it'd be tricky to > + * track the head status without a > + * page pin. > + */ > + if (!PageTail(cursor_page) && > + !__page_count(cursor_page)) > continue; > break; Ack to this part. I'm not keen on __page_count() as __ normally means the "unlocked" version of a function although I realise that rule isn't universal either. I can't think of a better name though. 
> } > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -266,12 +266,17 @@ struct inode; > * routine so they can be sure the page doesn't go away from under them. > */ > > +static inline int __page_count(struct page *page) > +{ > + return atomic_read(&page->_count); > +} > + > /* > * Drop a ref, return true if the refcount fell to zero (the page has no users) > */ > static inline int put_page_testzero(struct page *page) > { > - VM_BUG_ON(atomic_read(&page->_count) == 0); > + VM_BUG_ON(__page_count(page) == 0); > return atomic_dec_and_test(&page->_count); > } > > @@ -357,7 +362,7 @@ static inline struct page *compound_head > > static inline int page_count(struct page *page) > { > - return atomic_read(&compound_head(page)->_count); > + return __page_count(compound_head(page)); > } > > static inline void get_page(struct page *page) > @@ -370,7 +375,7 @@ static inline void get_page(struct page > * bugcheck only verifies that the page->_count isn't > * negative. > */ > - VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page)); > + VM_BUG_ON(__page_count(page) < !PageTail(page)); > atomic_inc(&page->_count); > /* > * Getting a tail page will elevate both the head and tail > @@ -382,7 +387,7 @@ static inline void get_page(struct page > * __split_huge_page_refcount can't run under > * get_page(). 
> */ > - VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0); > + VM_BUG_ON(__page_count(page->first_page) <= 0); > atomic_inc(&page->first_page->_count); > } > } > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1203,10 +1203,10 @@ static void __split_huge_page_refcount(s > struct page *page_tail = page + i; > > /* tail_page->_count cannot change */ > - atomic_sub(atomic_read(&page_tail->_count), &page->_count); > + atomic_sub(__page_count(page_tail), &page->_count); > BUG_ON(page_count(page) <= 0); > atomic_add(page_mapcount(page) + 1, &page_tail->_count); > - BUG_ON(atomic_read(&page_tail->_count) <= 0); > + BUG_ON(__page_count(page_tail) <= 0); > > /* after clearing PageTail the gup refcount can be released */ > smp_mb(); > --- a/arch/powerpc/mm/gup.c > +++ b/arch/powerpc/mm/gup.c > @@ -22,7 +22,7 @@ static inline void get_huge_page_tail(st > * __split_huge_page_refcount() cannot run > * from under us. > */ > - VM_BUG_ON(atomic_read(&page->_count) < 0); > + VM_BUG_ON(__page_count(page) < 0); > atomic_inc(&page->_count); > } > > --- a/arch/powerpc/platforms/512x/mpc512x_shared.c > +++ b/arch/powerpc/platforms/512x/mpc512x_shared.c > @@ -200,7 +200,7 @@ static inline void mpc512x_free_bootmem( > { > __ClearPageReserved(page); > BUG_ON(PageTail(page)); > - BUG_ON(atomic_read(&page->_count) > 1); > + BUG_ON(__page_count(page) > 1); > atomic_set(&page->_count, 1); > __free_page(page); > totalram_pages++; > --- a/arch/x86/mm/gup.c > +++ b/arch/x86/mm/gup.c > @@ -114,7 +114,7 @@ static inline void get_huge_page_tail(st > * __split_huge_page_refcount() cannot run > * from under us. 
> */ > - VM_BUG_ON(atomic_read(&page->_count) < 0); > + VM_BUG_ON(__page_count(page) < 0); > atomic_inc(&page->_count); > } > > --- a/fs/nilfs2/page.c > +++ b/fs/nilfs2/page.c > @@ -181,7 +181,7 @@ void nilfs_page_bug(struct page *page) > > printk(KERN_CRIT "NILFS_PAGE_BUG(%p): cnt=%d index#=%llu flags=0x%lx " > "mapping=%p ino=%lu\n", > - page, atomic_read(&page->_count), > + page, __page_count(page), > (unsigned long long)page->index, page->flags, m, ino); > > if (page_has_buffers(page)) { > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -28,7 +28,7 @@ static inline void set_page_count(struct > static inline void set_page_refcounted(struct page *page) > { > VM_BUG_ON(PageTail(page)); > - VM_BUG_ON(atomic_read(&page->_count)); > + VM_BUG_ON(__page_count(page)); > set_page_count(page, 1); > } > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -568,7 +568,7 @@ static inline int free_pages_check(struc > { > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > - (atomic_read(&page->_count) != 0) | > + (__page_count(page) != 0) | > (page->flags & PAGE_FLAGS_CHECK_AT_FREE) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > @@ -758,7 +758,7 @@ static inline int check_new_page(struct > { > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > - (atomic_read(&page->_count) != 0) | > + (__page_count(page) != 0) | > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > @@ -5739,7 +5739,7 @@ void dump_page(struct page *page) > { > printk(KERN_ALERT > "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n", > - page, atomic_read(&page->_count), page_mapcount(page), > + page, __page_count(page), page_mapcount(page), > page->mapping, page->index); > dump_page_flags(page->flags); > mem_cgroup_print_bad_page(page); > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -128,9 +128,9 @@ static void put_compound_page(struct pag > if (put_page_testzero(page_head)) > VM_BUG_ON(1); > /* __split_huge_page_refcount 
will wait now */ > - VM_BUG_ON(atomic_read(&page->_count) <= 0); > + VM_BUG_ON(__page_count(page) <= 0); > atomic_dec(&page->_count); > - VM_BUG_ON(atomic_read(&page_head->_count) <= 0); > + VM_BUG_ON(__page_count(page_head) <= 0); > compound_unlock_irqrestore(page_head, flags); > if (put_page_testzero(page_head)) { > if (PageHead(page_head)) -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
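One hunk that is easy to misread is the refcount transfer in `__split_huge_page_refcount()`. The sketch below is a userspace model under assumed semantics (gup pins parked on the tail are subtracted from the head's count, then the tail gets one reference per mapping plus its own base reference); the struct and function names are stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-ins for just the fields this hunk touches. */
struct hpage {
    atomic_int count;   /* page->_count */
    int mapcount;       /* page_mapcount(page) */
};

/* Model of the per-tail work in __split_huge_page_refcount():
 * remove the tail's pinned references from the head's count, then
 * give the tail one ref per mapping plus one base ref of its own. */
static void split_transfer(struct hpage *head, struct hpage *tail)
{
    atomic_fetch_sub(&head->count, atomic_load(&tail->count));
    assert(atomic_load(&head->count) > 0);  /* BUG_ON(page_count(page) <= 0) */
    atomic_fetch_add(&tail->count, head->mapcount + 1);
    assert(atomic_load(&tail->count) > 0);  /* BUG_ON on the tail count */
}
```

Both reads here are the raw `__page_count()`-style reads the patch switches to; the splitter holds the compound lock, so it does not need `compound_head()` at all.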
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:32 ` Mel Gorman @ 2011-06-06 12:49 ` Andrea Arcangeli 2011-06-06 14:47 ` Mel Gorman 2011-06-06 14:07 ` Minchan Kim 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-06 12:49 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 11:32:16AM +0100, Mel Gorman wrote: > This patch is pulling in stuff from Minchan. Minimally his patch should > be kept separate to preserve history or his Signed-off should be > included on this patch. Well I didn't apply Minchan's patch, just improved it as he suggested from pseudocode, but I can add his signed-off-by no prob. > I still think this optimisation is rare and only applies if we are > encountering huge pages during the linear scan. How often are we doing > that really? Well it's so fast to do it, that it looks worthwhile. You probably noticed initially I suggested only the fix for page_count (theoretical) oops, and I argued we could improve some more bits, but then it was kind of obvious to improve the upper side of the loop too according to pseudocode. > > > + VM_BUG_ON(!isolated_pages); > > This BUG_ON is overkill. hpage_nr_pages would have to return 0. > > > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); > > This would require order > MAX_ORDER_NR_PAGES to be passed into > isolate_lru_pages or for a huge page to be unaligned to a power of > two. The former is very unlikely and the latter is not supported by > any CPU. 
Minchan also disliked the VM_BUG_ON; it's clearly way overkill. But frankly the pfn physical scans are tricky enough things, and if there's a race and the order is wrong for whatever reason (no compound page, or a driver messing with subpages overwrote it) we'll just trip on some weird pointer next iteration (or maybe not, and it'll go ahead unnoticed if it's not beyond the range), and in that case I'd like to notice immediately. But probably it's too paranoid even for a VM_BUG_ON, so I surely can remove it... > > > } else { > > - /* the page is freed already. */ > > - if (!page_count(cursor_page)) > > + /* > > + * Check if the page is freed already. > > + * > > + * We can't use page_count() as that > > + * requires compound_head and we don't > > + * have a pin on the page here. If a > > + * page is tail, we may or may not > > + * have isolated the head, so assume > > + * it's not free, it'd be tricky to > > + * track the head status without a > > + * page pin. > > + */ > > + if (!PageTail(cursor_page) && > > + !__page_count(cursor_page)) > > continue; > > break; > > Ack to this part. This is also the only important part that fixes the potential oops. > I'm not keen on __page_count() as __ normally means the "unlocked" > version of a function although I realise that rule isn't universal > either. I can't think of a better name though. If better suggestions come to mind I can change it... Or I can also use atomic_read like in the first patch... it's up to you. I figured it wasn't so nice to call atomic_read, and there are other places in huge_memory.c that used that for bugchecks and can be cleaned up with __page_count. The _ prefix on _count is what makes it look like a more private field not to be used in generic VM code, so the raw value can be altered without changing all callers of __page_count, similar to _mapcount. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 12:49 ` Andrea Arcangeli @ 2011-06-06 14:47 ` Mel Gorman 0 siblings, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 14:47 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 02:49:54PM +0200, Andrea Arcangeli wrote: > On Mon, Jun 06, 2011 at 11:32:16AM +0100, Mel Gorman wrote: > > This patch is pulling in stuff from Minchan. Minimally his patch should > > be kept separate to preserve history or his Signed-off should be > > included on this patch. > > Well I didn't apply Minchan's patch, just improved it as he suggested > from pseudocode, but I can add his signed-off-by no prob. > My bad, the pseudo-code was close enough to being a patch I felt it at least merited a mention in the patch. > > I still think this optimisation is rare and only applies if we are > > encountering huge pages during the linear scan. How often are we doing > > that really? > > Well it's so fast to do it, that it looks worthwhile. You probably > noticed initially I suggested only the fix for page_count > (theoretical) oops, and I argued we could improve some more bits, but > then it was kind of obvious to improve the upper side of the loop too > according to pseudocode. > I don't feel very strongly about it. I don't think there is much of a boost because of how rarely we'll encounter this situation but there is no harm either. I think the page_count fix is more important. > > > > > + VM_BUG_ON(!isolated_pages); > > > > This BUG_ON is overkill. hpage_nr_pages would have to return 0. > > > > > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); > > > > This would require order > MAX_ORDER_NR_PAGES to be passed into > > isolate_lru_pages or for a huge page to be unaligned to a power of > > two. 
The former is very unlikely and the latter is not supported by > > any CPU. > > Minchan also disliked the VM_BUG_ON, it's clearly way overkill, but > frankly the pfn physical scans are tricky enough things and if there's > a race and the order is wrong for whatever reason (no compound page or > overwritten by driver messing with subpages) we'll just trip into some > weird pointer next iteration (or maybe not and it'll go ahead > unnoticed if it's not beyond the range) and in that case I'd like to > notice immediately. > I guess there is always the chance that an out-of-tree driver will do something utterly insane with a transparent hugepage and while this BUG_ON is "impossible", it doesn't hurt either. > But probably it's too paranoid even of a VM_BUG_ON so I surely can > remove it... > > > > > > } else { > > > - /* the page is freed already. */ > > > - if (!page_count(cursor_page)) > > > + /* > > > + * Check if the page is freed already. > > > + * > > > + * We can't use page_count() as that > > > + * requires compound_head and we don't > > > + * have a pin on the page here. If a > > > + * page is tail, we may or may not > > > + * have isolated the head, so assume > > > + * it's not free, it'd be tricky to > > > + * track the head status without a > > > + * page pin. > > > + */ > > > + if (!PageTail(cursor_page) && > > > + !__page_count(cursor_page)) > > > continue; > > > break; > > > > Ack to this part. > > This is also the only important part that fixes the potential oops. > Agreed. > > I'm not keen on __page_count() as __ normally means the "unlocked" > > version of a function although I realise that rule isn't universal > > either. I can't think of a better name though. > > If better suggestions comes to mind I can change it... Or I can also > use atomic_read like in the first patch... it's up to you. The atomic_read is not an improvement. What you have is better than adding another atomic_read to page count. 
> I figured > it wasn't so nice to call atomic_read and there are other places in > huge_memory.c that used that for bugchecks and it can be cleaned up > with __page_count. The _count having _ prefix is the thing that makes > it look like a more private field not to use in generic VM code so the > raw value can be altered without changing all callers of __page_count > similar to _mapcount. There is that. Go with __page_count because it's better than an atomic_read. I'm happy to ack the __page_count part of this patch so please split it out because it is a -stable candidate where as the potential optimisation and VM_BUG_ONs are not. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
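The acked else-branch reduces to a small predicate. This is a model, not kernel code (`page_model` and its field names are invented for illustration): a non-tail page whose raw count is zero can be skipped as already free, but a tail page is never treated as free because, without a pin, its head cannot be examined safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in carrying just what the check reads. */
struct page_model {
    int raw_count;    /* what __page_count() would return */
    bool page_tail;   /* PageTail() */
};

/* true  -> cursor page looks already free, keep scanning the block
 * false -> page is (or may be) in use, give up on this block */
static bool cursor_page_skippable(const struct page_model *p)
{
    /* A tail page may or may not have been isolated with its head;
     * assume it is not free rather than chase an unpinned head. */
    return !p->page_tail && p->raw_count == 0;
}
```

This matches the `continue`/`break` split in the patch: `continue` when the predicate holds, `break` otherwise.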
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:32 ` Mel Gorman 2011-06-06 12:49 ` Andrea Arcangeli @ 2011-06-06 14:07 ` Minchan Kim 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 14:07 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 11:32:16AM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 08:07:30PM +0200, Andrea Arcangeli wrote: > > On Fri, Jun 03, 2011 at 07:37:07PM +0200, Andrea Arcangeli wrote: > > > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > > > Do you want this? (it's almost pseudo-code) > > > > > > Yes that's good idea so we at least take into account if we isolated > > > something big, and it's pointless to insist wasting CPU on the tail > > > pages and even trace a fail because of tail pages after it. > > > > > > I introduced a __page_count to increase readability. It's still > > > hackish to work on subpages in vmscan.c but at least I added a comment > > > and until we serialize destroy_compound_page vs compound_head, I guess > > > there's no better way. I didn't attempt to add out of order > > > serialization similar to what exists for split_huge_page vs > > > compound_trans_head yet, as the page can be allocated or go away from > > > under us, in split_huge_page vs compound_trans_head it's simpler > > > because both callers are required to hold a pin on the page so the > > > page can't go be reallocated and destroyed under it. > > > > Sent too fast... had to shuffle a few things around... trying again. 
> > > > === > > Subject: mm: no page_count without a page pin > > > > From: Andrea Arcangeli <aarcange@redhat.com> > > > > It's unsafe to run page_count during the physical pfn scan because > > compound_head could trip on a dangling pointer when reading page->first_page if > > the compound page is being freed by another CPU. Also properly take into > > account if we isolated a compound page during the scan and break the loop if > > we've isolated enoguh. Introduce __page_count to cleanup some atomic_read from > > &page->_count in common code to cleanup. > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > This patch is pulling in stuff from Minchan. Minimally his patch should > be kept separate to preserve history or his Signed-off should be > included on this patch. > > > --- > > arch/powerpc/mm/gup.c | 2 - > > arch/powerpc/platforms/512x/mpc512x_shared.c | 2 - > > arch/x86/mm/gup.c | 2 - > > fs/nilfs2/page.c | 2 - > > include/linux/mm.h | 13 ++++++---- > > mm/huge_memory.c | 4 +-- > > mm/internal.h | 2 - > > mm/page_alloc.c | 6 ++-- > > mm/swap.c | 4 +-- > > mm/vmscan.c | 35 ++++++++++++++++++++------- > > 10 files changed, 47 insertions(+), 25 deletions(-) > > > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u > > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > > struct page *page; > > unsigned long pfn; > > - unsigned long end_pfn; > > + unsigned long start_pfn, end_pfn; > > unsigned long page_pfn; > > int zone_id; > > > > @@ -1087,9 +1087,9 @@ static unsigned long isolate_lru_pages(u > > */ > > zone_id = page_zone_id(page); > > page_pfn = page_to_pfn(page); > > - pfn = page_pfn & ~((1 << order) - 1); > > - end_pfn = pfn + (1 << order); > > - for (; pfn < end_pfn; pfn++) { > > + start_pfn = page_pfn & ~((1 << order) - 1); > > + end_pfn = start_pfn + (1 << order); > > + for (pfn = start_pfn; pfn < end_pfn; pfn++) { > > struct page *cursor_page; > > > > /* The target page 
is in the block, ignore it. */ > > @@ -1116,16 +1116,33 @@ static unsigned long isolate_lru_pages(u > > break; > > > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > > + unsigned int isolated_pages; > > list_move(&cursor_page->lru, dst); > > mem_cgroup_del_lru(cursor_page); > > - nr_taken += hpage_nr_pages(page); > > - nr_lumpy_taken++; > > + isolated_pages = hpage_nr_pages(page); > > + nr_taken += isolated_pages; > > + nr_lumpy_taken += isolated_pages; > > if (PageDirty(cursor_page)) > > - nr_lumpy_dirty++; > > + nr_lumpy_dirty += isolated_pages; > > scan++; > > + pfn += isolated_pages-1; > > Ah, here is the isolated_pages - 1 which is necessary. Should have read > the whole thread before responding to anything :). > > I still think this optimisation is rare and only applies if we are > encountering huge pages during the linear scan. How often are we doing > that really? > > > + VM_BUG_ON(!isolated_pages); > > This BUG_ON is overkill. hpage_nr_pages would have to return 0. > > > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); > > This would require order > MAX_ORDER_NR_PAGES to be passed into > isolate_lru_pages or for a huge page to be unaligned to a power of > two. The former is very unlikely and the latter is not supported by > any CPU. > > > } else { > > - /* the page is freed already. */ > > - if (!page_count(cursor_page)) > > + /* > > + * Check if the page is freed already. > > + * > > + * We can't use page_count() as that > > + * requires compound_head and we don't > > + * have a pin on the page here. If a > > + * page is tail, we may or may not > > + * have isolated the head, so assume > > + * it's not free, it'd be tricky to > > + * track the head status without a > > + * page pin. > > + */ > > + if (!PageTail(cursor_page) && > > + !__page_count(cursor_page)) > > continue; > > break; > > Ack to this part. 
> > I'm not keen on __page_count() as __ normally means the "unlocked" > version of a function although I realise that rule isn't universal Yes, it's not universal. I have thought of it as just a private or utility function (ie, not exportable), so I don't mind the name. > either. I can't think of a better name though. Me neither. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 23:01 ` Minchan Kim 2011-06-03 17:37 ` Andrea Arcangeli @ 2011-06-06 10:15 ` Mel Gorman 2011-06-06 10:26 ` Mel Gorman ` (2 more replies) 1 sibling, 3 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 10:15 UTC (permalink / raw) To: Minchan Kim Cc: Andrea Arcangeli, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > On Fri, Jun 3, 2011 at 7:32 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > > On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: > >> I mean we have more tail pages than head pages. So I think we are likely to > >> meet tail pages. Of course, compared to all pages(page cache, anon and > >> so on), compound pages would be very small percentage. > > > > Yes that's my point, that being a small percentage it's no big deal to > > break the loop early. > > Indeed. > > > > >> > isolated the head and it's useless to insist on more tail pages (at > >> > least for large page size like on x86). Plus we've compaction so > >> > >> I can't understand your point. Could you elaborate it? > > > > What I meant is that if we already isolated the head page of the THP, > > we don't need to try to free the tail pages and breaking the loop > > early, will still give us a chance to free a whole 2m because we > > isolated the head page (it'll involve some work and swapping but if it > > was a compoundtranspage we're ok to break the loop and we're not > > making the logic any worse). Provided the PMD_SIZE is quite large like > > 2/4m... > > Do you want this? 
(it's almost pseudo-code) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 7a4469b..9d7609f 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1017,7 +1017,7 @@ static unsigned long isolate_lru_pages(unsigned > long nr_to_scan, > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > struct page *page; > unsigned long pfn; > - unsigned long end_pfn; > + unsigned long start_pfn, end_pfn; > unsigned long page_pfn; > int zone_id; > > @@ -1057,9 +1057,9 @@ static unsigned long isolate_lru_pages(unsigned > long nr_to_scan, > */ > zone_id = page_zone_id(page); > page_pfn = page_to_pfn(page); > - pfn = page_pfn & ~((1 << order) - 1); > + start_pfn = pfn = page_pfn & ~((1 << order) - 1); > end_pfn = pfn + (1 << order); > - for (; pfn < end_pfn; pfn++) { > + while (pfn < end_pfn) { > struct page *cursor_page; > > /* The target page is in the block, ignore it. */ > @@ -1086,17 +1086,25 @@ static unsigned long > isolate_lru_pages(unsigned long nr_to_scan, > break; > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > + int isolated_pages; > list_move(&cursor_page->lru, dst); > mem_cgroup_del_lru(cursor_page); > - nr_taken += hpage_nr_pages(page); > + isolated_pages = hpage_nr_pages(page); > + nr_taken += isolated_pages; > + /* if we isolated pages enough, let's > break early */ > + if (nr_taken > end_pfn - start_pfn) > + break; > + pfn += isolated_pages; I think this condition is somewhat unlikely. We are scanning within aligned blocks in this linear scanner. Huge pages are always aligned so the only situation where we'll encounter a hugepage in the middle of this linear scan is when the requested order is larger than a huge page. This is exceptionally rare. Did I miss something? > nr_lumpy_taken++; > if (PageDirty(cursor_page)) > nr_lumpy_dirty++; > scan++; > } else { > /* the page is freed already. 
*/ > - if (!page_count(cursor_page)) > + if (!page_count(cursor_page)) { > + pfn++; > continue; > + } > break; > } > } > > > > > The only way this patch makes things worse is for slub order 3 in the > > process of being freed. But tail pages aren't generally free anyway so > > I doubt this really makes any difference plus the tail is getting > > cleared as soon as the page reaches the buddy so it's probably > > Okay. Considering getting clear PG_tail as soon as slub order 3 is > freed, it would be very rare case. > > > unnoticeable as this then makes a difference only during a race (plus > > the tail page can't be isolated, only head page can be part of lrus > > and only if they're THP). > > > >> > insisting and screwing lru ordering isn't worth it, better to be > >> > permissive and abort... in fact I wouldn't dislike to remove the > >> > entire lumpy logic when COMPACTION_BUILD is true, but that alters the > >> > trace too... > >> > >> AFAIK, it's final destination to go as compaction will not break lru > >> ordering if my patch(inorder-putback) is merged. > > > > Agreed. I like your patchset, sorry for not having reviewed it in > > detail yet but there were other issues popping up in the last few > > days. > > No problem. it's urgent than mine. :) > I'm going to take the opportunity to apologise for not reviewing that series yet. I've been kept too busy with other bugs to set side the few hours I need to review the series. I'm hoping to get to it this week if everything goes well. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:15 ` Mel Gorman @ 2011-06-06 10:26 ` Mel Gorman 2011-06-06 14:01 ` Minchan Kim 2011-06-06 14:26 ` Minchan Kim 2 siblings, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 10:26 UTC (permalink / raw) To: Minchan Kim Cc: Andrea Arcangeli, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 11:15:57AM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > On Fri, Jun 3, 2011 at 7:32 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > > > On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: > > >> I mean we have more tail pages than head pages. So I think we are likely to > > >> meet tail pages. Of course, compared to all pages(page cache, anon and > > >> so on), compound pages would be very small percentage. > > > > > > Yes that's my point, that being a small percentage it's no big deal to > > > break the loop early. > > > > Indeed. > > > > > > > >> > isolated the head and it's useless to insist on more tail pages (at > > >> > least for large page size like on x86). Plus we've compaction so > > >> > > >> I can't understand your point. Could you elaborate it? > > > > > > What I meant is that if we already isolated the head page of the THP, > > > we don't need to try to free the tail pages and breaking the loop > > > early, will still give us a chance to free a whole 2m because we > > > isolated the head page (it'll involve some work and swapping but if it > > > was a compoundtranspage we're ok to break the loop and we're not > > > making the logic any worse). Provided the PMD_SIZE is quite large like > > > 2/4m... > > > > Do you want this? 
(it's almost pseudo-code) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 7a4469b..9d7609f 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1017,7 +1017,7 @@ static unsigned long isolate_lru_pages(unsigned > > long nr_to_scan, > > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > > struct page *page; > > unsigned long pfn; > > - unsigned long end_pfn; > > + unsigned long start_pfn, end_pfn; > > unsigned long page_pfn; > > int zone_id; > > > > @@ -1057,9 +1057,9 @@ static unsigned long isolate_lru_pages(unsigned > > long nr_to_scan, > > */ > > zone_id = page_zone_id(page); > > page_pfn = page_to_pfn(page); > > - pfn = page_pfn & ~((1 << order) - 1); > > + start_pfn = pfn = page_pfn & ~((1 << order) - 1); > > end_pfn = pfn + (1 << order); > > - for (; pfn < end_pfn; pfn++) { > > + while (pfn < end_pfn) { > > struct page *cursor_page; > > > > /* The target page is in the block, ignore it. */ > > @@ -1086,17 +1086,25 @@ static unsigned long > > isolate_lru_pages(unsigned long nr_to_scan, > > break; > > > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > > + int isolated_pages; > > list_move(&cursor_page->lru, dst); > > mem_cgroup_del_lru(cursor_page); > > - nr_taken += hpage_nr_pages(page); > > + isolated_pages = hpage_nr_pages(page); > > + nr_taken += isolated_pages; > > + /* if we isolated pages enough, let's > > break early */ > > + if (nr_taken > end_pfn - start_pfn) > > + break; > > + pfn += isolated_pages; > > I think this condition is somewhat unlikely. We are scanning within > aligned blocks in this linear scanner. Huge pages are always aligned > so the only situation where we'll encounter a hugepage in the middle > of this linear scan is when the requested order is larger than a huge > page. This is exceptionally rare. > > Did I miss something? > I forgot to mention the "pfn += isolated_pages" but I'm also worried about it. 
It's a performance gain only if we are encountering huge pages during the linear scan, which I think is rare. But also, I think this is now skipping pages in the linear scan because we now have

for (; pfn < end_pfn; pfn++) {
	if (isolate page) {
		isolated_pages = hpage_nr_pages(page);
		pfn += isolated_pages;
	}
}

hpage_nr_pages is returning 1 for order-0 LRU pages so now the loop is effectively

for (; pfn < end_pfn; pfn += 2)

Did you mean pfn += isolated_pages - 1; ? -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:15 ` Mel Gorman 2011-06-06 10:26 ` Mel Gorman @ 2011-06-06 14:01 ` Minchan Kim 2011-06-06 14:26 ` Minchan Kim 2 siblings, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 14:01 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 11:15:57AM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > On Fri, Jun 3, 2011 at 7:32 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > > > On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: > > >> I mean we have more tail pages than head pages. So I think we are likely to > > >> meet tail pages. Of course, compared to all pages(page cache, anon and > > >> so on), compound pages would be very small percentage. > > > > > > Yes that's my point, that being a small percentage it's no big deal to > > > break the loop early. > > > > Indeed. > > > > > > > >> > isolated the head and it's useless to insist on more tail pages (at > > >> > least for large page size like on x86). Plus we've compaction so > > >> > > >> I can't understand your point. Could you elaborate it? > > > > > > What I meant is that if we already isolated the head page of the THP, > > > we don't need to try to free the tail pages and breaking the loop > > > early, will still give us a chance to free a whole 2m because we > > > isolated the head page (it'll involve some work and swapping but if it > > > was a compoundtranspage we're ok to break the loop and we're not > > > making the logic any worse). Provided the PMD_SIZE is quite large like > > > 2/4m... > > > > Do you want this? 
(it's almost pseudo-code) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 7a4469b..9d7609f 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1017,7 +1017,7 @@ static unsigned long isolate_lru_pages(unsigned > > long nr_to_scan, > > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > > struct page *page; > > unsigned long pfn; > > - unsigned long end_pfn; > > + unsigned long start_pfn, end_pfn; > > unsigned long page_pfn; > > int zone_id; > > > > @@ -1057,9 +1057,9 @@ static unsigned long isolate_lru_pages(unsigned > > long nr_to_scan, > > */ > > zone_id = page_zone_id(page); > > page_pfn = page_to_pfn(page); > > - pfn = page_pfn & ~((1 << order) - 1); > > + start_pfn = pfn = page_pfn & ~((1 << order) - 1); > > end_pfn = pfn + (1 << order); > > - for (; pfn < end_pfn; pfn++) { > > + while (pfn < end_pfn) { > > struct page *cursor_page; > > > > /* The target page is in the block, ignore it. */ > > @@ -1086,17 +1086,25 @@ static unsigned long > > isolate_lru_pages(unsigned long nr_to_scan, > > break; > > > > if (__isolate_lru_page(cursor_page, mode, file) == 0) { > > + int isolated_pages; > > list_move(&cursor_page->lru, dst); > > mem_cgroup_del_lru(cursor_page); > > - nr_taken += hpage_nr_pages(page); > > + isolated_pages = hpage_nr_pages(page); > > + nr_taken += isolated_pages; > > + /* if we isolated pages enough, let's > > break early */ > > + if (nr_taken > end_pfn - start_pfn) > > + break; > > + pfn += isolated_pages; > > I think this condition is somewhat unlikely. We are scanning within > aligned blocks in this linear scanner. Huge pages are always aligned > so the only situation where we'll encounter a hugepage in the middle > of this linear scan is when the requested order is larger than a huge > page. This is exceptionally rare. > > Did I miss something? Never. You're absolute right. I don't have systems which have lots of hpages. But I have heard some guys tunes MAX_ORDER(Whether it's a good or bad is off-topic). 
Anyway, it would be good on such systems, but I admit it would be rare. I don't feel strongly about this pseudo-patch. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:15 ` Mel Gorman 2011-06-06 10:26 ` Mel Gorman 2011-06-06 14:01 ` Minchan Kim @ 2011-06-06 14:26 ` Minchan Kim 2 siblings, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 14:26 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Mon, Jun 06, 2011 at 11:15:57AM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote: > > On Fri, Jun 3, 2011 at 7:32 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > > > On Fri, Jun 03, 2011 at 07:23:48AM +0900, Minchan Kim wrote: <snip> > > >> AFAIK, it's final destination to go as compaction will not break lru > > >> ordering if my patch(inorder-putback) is merged. > > > > > > Agreed. I like your patchset, sorry for not having reviewed it in > > > detail yet but there were other issues popping up in the last few > > > days. > > > > No problem. it's urgent than mine. :) > > > > I'm going to take the opportunity to apologise for not reviewing that > series yet. I've been kept too busy with other bugs to set side the > few hours I need to review the series. I'm hoping to get to it this > week if everything goes well. I am refactoring the migration code. Maybe I will resend it tomorrow. I hope you guys review it. :) Thanks, Mel. > > -- > Mel Gorman > SUSE Labs -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 18:23 ` Andrea Arcangeli 2011-06-02 20:21 ` Minchan Kim @ 2011-06-02 23:02 ` Minchan Kim 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-02 23:02 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm On Fri, Jun 3, 2011 at 3:23 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Tue, May 31, 2011 at 11:38:30PM +0900, Minchan Kim wrote: >> > Yes. You find a new BUG. >> > It seems to be related to this problem but it should be solved although >> >> typo : It doesn't seem to be. > > This should fix it, but I doubt it matters for this problem. > > === > Subject: mm: no page_count without a page pin > > From: Andrea Arcangeli <aarcange@redhat.com> > > It's unsafe to run page_count during the physical pfn scan. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Nitpick: I'd like to keep the "/* the page is freed already. */" comment. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 14:14 ` Andrea Arcangeli 2011-05-31 14:37 ` Minchan Kim @ 2011-06-01 0:57 ` Mel Gorman 2011-06-01 9:24 ` Mel Gorman 2011-06-01 17:58 ` Mel Gorman 1 sibling, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-01 0:57 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 04:14:02PM +0200, Andrea Arcangeli wrote: > On Tue, May 31, 2011 at 10:33:40PM +0900, Minchan Kim wrote: > > I checked them before sending patch but I got failed to find strange things. :( > > My review also doesn't show other bugs in migrate_pages callers like > that one. > > > Now I am checking the page's SwapBacked flag can be changed > > between before and after of migrate_pages so accounting of NR_ISOLATED_XX can > > make mistake. I am approaching the failure, too. Hmm. > > When I checked that, I noticed the ClearPageSwapBacked in swapcache if > radix insertion fails, but that happens before adding the page in the > LRU so it shouldn't have a chance to be isolated. > After hammering three machines for several hours, I managed to trigger this once on x86 !CONFIG_SMP CONFIG_PREEMPT HIGHMEM4G (so no PAE) and caught the following trace. May 31 23:45:37 arnold kernel: WARNING: at include/linux/vmstat.h:167 compact_zone+0xf8/0x53c() May 31 23:45:37 arnold kernel: Hardware name: May 31 23:45:37 arnold kernel: Modules linked in: 3c59x mii sr_mod forcedeth cdrom ext4 mbcache jbd2 crc16 sd_mod ata_generic pata_amd sata_nv libata scsi_mod May 31 23:45:37 arnold kernel: Pid: 16172, comm: usemem Not tainted 2.6.38.4-autobuild #17 May 31 23:45:37 arnold kernel: Call Trace: May 31 23:45:37 arnold kernel: [<c10277f5>] ? warn_slowpath_common+0x65/0x7a May 31 23:45:37 arnold kernel: [<c1098b12>] ? compact_zone+0xf8/0x53c May 31 23:45:37 arnold kernel: [<c1027819>] ? 
warn_slowpath_null+0xf/0x13 May 31 23:45:37 arnold kernel: [<c1098b12>] ? compact_zone+0xf8/0x53c May 31 23:45:37 arnold kernel: [<c1098fe3>] ? compact_zone_order+0x8d/0x95 May 31 23:45:37 arnold kernel: [<c1099068>] ? try_to_compact_pages+0x7d/0xc8 May 31 23:45:37 arnold kernel: [<c107ba56>] ? __alloc_pages_direct_compact+0x71/0x102 May 31 23:45:37 arnold kernel: [<c107be15>] ? __alloc_pages_nodemask+0x32e/0x57d May 31 23:45:37 arnold kernel: [<c10914a6>] ? anon_vma_prepare+0x13/0x109 May 31 23:45:37 arnold kernel: [<c109fb01>] ? do_huge_pmd_anonymous_page+0xc9/0x285 May 31 23:45:37 arnold kernel: [<c1018f6a>] ? do_page_fault+0x0/0x346 May 31 23:45:37 arnold kernel: [<c108cb5e>] ? handle_mm_fault+0x7b/0x13a May 31 23:45:37 arnold kernel: [<c1018f6a>] ? do_page_fault+0x0/0x346 May 31 23:45:37 arnold kernel: [<c1019298>] ? do_page_fault+0x32e/0x346 May 31 23:45:37 arnold kernel: [<c104a234>] ? trace_hardirqs_off+0xb/0xd May 31 23:45:37 arnold kernel: [<c100482c>] ? do_softirq+0x9f/0xb5 May 31 23:45:37 arnold kernel: [<c12a5dee>] ? restore_all_notrace+0x0/0x18 May 31 23:45:37 arnold kernel: [<c1018f6a>] ? do_page_fault+0x0/0x346 May 31 23:45:37 arnold kernel: [<c104e91e>] ? trace_hardirqs_on_caller+0xfd/0x11e May 31 23:45:37 arnold kernel: [<c1018f6a>] ? do_page_fault+0x0/0x346 May 31 23:45:37 arnold kernel: [<c12a61d9>] ? error_code+0x5d/0x64 May 31 23:45:37 arnold kernel: [<c1018f6a>] ? do_page_fault+0x0/0x346 This is triggering in compactions too_many_isolated() where the NR_ISOLATED_FILE counter has gone negative so the damage was already done. Most likely, the damage was caused when compaction called putback_lru_pages() on pages that failed the migration that were accounted as isolated anon during isolation and putback as isolated file magically. It's almost 2am so I'm wiped but the first thing in the morning I want to check is if http://lkml.org/lkml/2010/8/26/32 is relevant. 
Specifically, whether during transparent hugepage collapsing or splitting we are not protected by the anon_vma lock, allowing an imbalance to occur while calling release_pte_pages(). This seems a bit far-fetched though, as I'd think at least the anon counter would be corrupted by that. A related possibility is that if the wrong anon_vma is being locked then there is a race between collapse_huge_page and when migration drops to 0, allowing release_pte_pages() to miss the page entirely. Again, you'd think the anon counter would be the one corrupted. Another possibility is that because this is !PAE, the !SMP version of native_pmdp_get_and_clear is somehow insufficient, although I can't think how it might be - unless the lack of a barrier with preemption enabled is somehow a factor. Again, it's reaching because one would expect the anon counter to get messed up, not the file one. I can't formulate a theory as to how PageSwapBacked gets cleared during migration that would cause compaction's putback_lru_pages to decrement the wrong counter. Maybe sleep will figure it out :( -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 0:57 ` Mel Gorman @ 2011-06-01 9:24 ` Mel Gorman 2011-06-01 17:58 ` Mel Gorman 1 sibling, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-01 9:24 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Wed, Jun 01, 2011 at 01:57:47AM +0100, Mel Gorman wrote: > It's almost 2am so I'm wiped but the first thing in the morning > I want to check is if http://lkml.org/lkml/2010/8/26/32 is > relevant. It's not. The patch I really meant was https://lkml.org/lkml/2011/5/28/155 and it's irrelevant to 2.6.38.4 which is already doing the right thing of rechecking page->mapping under lock. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 0:57 ` Mel Gorman 2011-06-01 9:24 ` Mel Gorman @ 2011-06-01 17:58 ` Mel Gorman 2011-06-01 19:15 ` Andrea Arcangeli 1 sibling, 1 reply; 63+ messages in thread From: Mel Gorman @ 2011-06-01 17:58 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Wed, Jun 01, 2011 at 01:57:47AM +0100, Mel Gorman wrote: > On Tue, May 31, 2011 at 04:14:02PM +0200, Andrea Arcangeli wrote: > > On Tue, May 31, 2011 at 10:33:40PM +0900, Minchan Kim wrote: > > > I checked them before sending patch but I got failed to find strange things. :( > > > > My review also doesn't show other bugs in migrate_pages callers like > > that one. > > > > > Now I am checking the page's SwapBacked flag can be changed > > > between before and after of migrate_pages so accounting of NR_ISOLATED_XX can > > > make mistake. I am approaching the failure, too. Hmm. > > > > When I checked that, I noticed the ClearPageSwapBacked in swapcache if > > radix insertion fails, but that happens before adding the page in the > > LRU so it shouldn't have a chance to be isolated. > > > > After hammering three machines for several hours, I managed to trigger > this once on x86 !CONFIG_SMP CONFIG_PREEMPT HIGHMEM4G (so no PAE) > and caught the following trace. > Umm, HIGHMEM4G implies a two-level pagetable layout so where are things like _PAGE_BIT_SPLITTING being set when THP is enabled? -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 17:58 ` Mel Gorman @ 2011-06-01 19:15 ` Andrea Arcangeli 2011-06-01 21:40 ` Mel Gorman 0 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-01 19:15 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote: > Umm, HIGHMEM4G implies a two-level pagetable layout so where are > things like _PAGE_BIT_SPLITTING being set when THP is enabled? They should be set on the pgd, pud_offset/pgd_offset will just bypass. The splitting bit shouldn't be special about it, the present bit should work the same. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 19:15 ` Andrea Arcangeli @ 2011-06-01 21:40 ` Mel Gorman 2011-06-01 23:30 ` Andrea Arcangeli 0 siblings, 1 reply; 63+ messages in thread From: Mel Gorman @ 2011-06-01 21:40 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Wed, Jun 01, 2011 at 09:15:29PM +0200, Andrea Arcangeli wrote: > On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote: > > Umm, HIGHMEM4G implies a two-level pagetable layout so where are > > things like _PAGE_BIT_SPLITTING being set when THP is enabled? > > They should be set on the pgd, pud_offset/pgd_offset will just bypass. > The splitting bit shouldn't be special about it, the present bit > should work the same. This comment is misleading at best then. #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ At the PGD level, it can have PSE set obviously but it's not a PMD. I confess I haven't checked the manual to see if it's safe to use _PAGE_BIT_UNUSED1 like this so am taking your word for it. I found that the bug is far harder to reproduce with 3 pagetable levels than with 2, but that is just timing. So far it has proven impossible on x86-64, at least within 27 hours, so that has me looking at how pagetable management between x86 and x86-64 differs. Barriers are a big difference between 32-bit !SMP and x86-64 but I don't know yet which one is relevant or if this is even the right direction. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 21:40 ` Mel Gorman @ 2011-06-01 23:30 ` Andrea Arcangeli 2011-06-02 1:03 ` Mel Gorman 0 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-01 23:30 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable Hi Mel, On Wed, Jun 01, 2011 at 10:40:18PM +0100, Mel Gorman wrote: > On Wed, Jun 01, 2011 at 09:15:29PM +0200, Andrea Arcangeli wrote: > > On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote: > > > Umm, HIGHMEM4G implies a two-level pagetable layout so where are > > > things like _PAGE_BIT_SPLITTING being set when THP is enabled? > > > > They should be set on the pgd, pud_offset/pgd_offset will just bypass. > > The splitting bit shouldn't be special about it, the present bit > > should work the same. > > This comment is misleading at best then. > > #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ >From common code point of view it's set in the pmd, the comment can be extended to specify it's actually the pgd in case of 32bit noPAE but I didn't think it was too misleading as we think in common code terms all over the code, the fact it's a bypass is pretty clear across the whole archs. > At the PGD level, it can have PSE set obviously but it's not a > PMD. I confess I haven't checked the manual to see if it's safe to > use _PAGE_BIT_UNUSED1 like this so am taking your word for it. I To be sure I re-checked on 253668.pdf page 113/114 noPAE and page 122 PAE, on x86 32bit/64 all ptes/pmd/pgd (32bit/64bit PAE/noPAE) have bit 9-11 "Avail" to software. So I think we should be safe here. > found that the bug is far harder to reproduce with 3 pagetable levels > than with 2 but that is just timing. 
So far it has proven impossible > on x86-64 at least within 27 hours so that has me looking at how > pagetable management between x86 and x86-64 differ. Weird. However I could see it screwing the nr_inactive/active_* stats, but the nr_isolated should never go below zero, and especially not anon even if split_huge_page does the accounting wrong (and migrate/compaction won't mess with THP), or at least I'd expect things to fall apart in other ways and not with just a fairly innocuous and not-memory-corrupting nr_isolated_ counter going off just by one. The khugepaged nr_isolated_anon increment couldn't affect the file one, and we hold mmap_sem in write mode there to prevent the pte changing from under us, in addition to the PT and anon_vma lock. Anon_vma lock being wrong sounds unlikely too, and even if it was it should screw the nr_isolated_anon counter; it's impossible to screw the nr_isolated_file with khugepaged. Where did you put your bugcheck? It looked like you put it in the < 0 reader; can you add it to all _inc/dec/mod (even _inc just in case) so we may get a stack trace including the culprit? (not guaranteed but better chance) > Barriers are a big different between how 32-bit !SMP and X86-64 but > don't know yet which one is relevant or if this is even the right > direction. The difference is we need xchg on SMP to avoid losing the dirty bit. Otherwise if we do pmd_t pmd = *pmdp; *pmdp = 0; the dirty bit may have been set in between the two by another thread running in userland on a different CPU, while the pmd was still "present". As long as interrupts don't write to read-write userland memory with the pte dirty bit clear, we shouldn't need xchg on !SMP. On PAE we also need to write 0 into pmd_low before worrying about pmd_high so the present bit is cleared before clearing the high part of the 32bit PAE pte, and we rely on xchg's implicit lock to avoid a smp_wmb() in between the two writes.
I'm unsure if any of this could be relevant to our problem; also, there can't be more than one writer at once to the pmd, as nobody can modify it without the page_table_lock held. xchg there is just to be safe for the dirty bit (or we'd corrupt memory with threads running in userland and writing to memory on other cpus while we ptep_clear_flush). I've been wondering about the lack of "lock" on the bus in atomic.h too, but I can't see how it can possibly matter on !SMP: vmstat modifications should execute as only 1 asm insn, so preempt or irq can't interrupt them. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-01 23:30 ` Andrea Arcangeli @ 2011-06-02 1:03 ` Mel Gorman 2011-06-02 8:34 ` Minchan Kim 2011-06-02 13:29 ` Andrea Arcangeli 0 siblings, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-02 1:03 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 02, 2011 at 01:30:36AM +0200, Andrea Arcangeli wrote: > Hi Mel, > > On Wed, Jun 01, 2011 at 10:40:18PM +0100, Mel Gorman wrote: > > On Wed, Jun 01, 2011 at 09:15:29PM +0200, Andrea Arcangeli wrote: > > > On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote: > > > > Umm, HIGHMEM4G implies a two-level pagetable layout so where are > > > > things like _PAGE_BIT_SPLITTING being set when THP is enabled? > > > > > > They should be set on the pgd, pud_offset/pgd_offset will just bypass. > > > The splitting bit shouldn't be special about it, the present bit > > > should work the same. > > > > This comment is misleading at best then. > > > > #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ > > From common code point of view it's set in the pmd, the comment can be > extended to specify it's actually the pgd in case of 32bit noPAE but I > didn't think it was too misleading as we think in common code terms > all over the code, the fact it's a bypass is pretty clear across the > whole archs. > Fair point. > > At the PGD level, it can have PSE set obviously but it's not a > > PMD. I confess I haven't checked the manual to see if it's safe to > > use _PAGE_BIT_UNUSED1 like this so am taking your word for it. I > > To be sure I re-checked on 253668.pdf page 113/114 noPAE and page 122 > PAE, on x86 32bit/64 all ptes/pmd/pgd (32bit/64bit PAE/noPAE) have bit > 9-11 "Avail" to software. So I think we should be safe here. > Good stuff. 
I was reasonably sure this was the case but as this was already "impossible", it needed to be ruled out. > > found that the bug is far harder to reproduce with 3 pagetable levels > > than with 2 but that is just timing. So far it has proven impossible > > on x86-64 at least within 27 hours so that has me looking at how > > pagetable management between x86 and x86-64 differ. > > Weird. > > However I could see it screwing the nr_inactive/active_* stats, but > the nr_isolated should never go below zero, and especially not anon > even if split_huge_page does the accounting wrong (and > migrate/compaction won't mess with THP), or at least I'd expect things > to fall apart in other ways and not with just a fairly innocuous and > not-memory corrupting nr_isolated_ counter going off just by one. > Again, agreed. I found it hard to come up with a reason why file would get messed up particularly as PageSwapBacked does not get cleared in the ordinary case until the page is freed. If we were using pages after being freed due to bad refcounting, it would show up in all sorts of bad ways. > The khugepaged nr_isolated_anon increment couldn't affect the file one > and we hold mmap_sem write mode there to prevent the pte to change > from under us, in addition to the PT and anon_vma lock. Anon_vma lock > being wrong sounds unlikely too, and even if it was it should screw > the nr_isolated_anon counter, impossible to screw the nr_isolated_file > with khugepaged. > After reviewing, I still could not find a problem with the locking that might explain this. I thought last night anon_vma might be bust in some way but today I couldn't find a problem. > Where did you put your bugcheck? It looked like you put it in the < 0 > reader, can you add it to all _inc/dec/mod (even _inc just in case) so > we may get a stack trace including the culprit? 
(not guaranteed but > better chance) > Did that, didn't really help other than showing the corruption happens early in the process lifetime while huge PMDs are being faulted. This made me think the problem might be on or near fork. > > Barriers are a big different between how 32-bit !SMP and X86-64 but > > don't know yet which one is relevant or if this is even the right > > direction. > > The difference is we need xchg on SMP to avoid losing the dirty > bit. Otherwise if we do pmd_t pmd = *pmdp; *pmdp = 0; the dirty bit > may have been set in between the two by another thread running in > userland in a different CPU, while the pmd was still "present". As > long as interrupts don't write to read-write userland memory with the > pte dirty bit clear, we shouldn't need xchg on !SMP. > Yep. > On PAE we also need to write 0 into pmd_low before worrying about > pmd_high so the present bit is cleared before clearing the high part > of the 32bit PAE pte, and we relay on xchg implicit lock to avoid a > smp_wmb() in between the two writes. > Yep. > I'm unsure if any of this could be relevant to our problem, also there I concluded after a while that it wasn't. Partially from reasoning about it and part by testing forcing the use of the SMP versions and finding the bug was still reproducible. > can't be more than one writer at once in the pmd, as nobody can modify > it without the page_table_lock held. xchg there is just to be safe for > the dirty bit (or we'd corrupt memory with threads running in userland > and writing to memory on other cpus while we ptep_clear_flush). > > I've been wondering about the lack of "lock" on the bus in atomic.h > too, but I can't see how it can possibly matter on !SMP, vmstat > modifications should execute only 1 asm insn so preempt or irq can't > interrupt it. To be honest, I haven't fully figured out yet why it makes such a difference on !SMP. 
I have a vague notion that it's because the page table page and the data are visible before the bit set by SetPageSwapBacked on the struct page is visible, but I haven't reasoned it out yet. If this was the case, it might allow an "anon" page to be treated as a file page by compaction for accounting purposes and push the counter negative, but then you'd expect the anon isolation count to be the one going positive, so it's something else. As I thought fork() might be an issue, I looked closer at what we do there. We are calling pmd_alloc at copy_pmd_range, which is a no-op when PMD is folded, and copy_huge_pmd() is calling pte_alloc_one(), which also has no barrier. I haven't checked this fully (it's very late again as I wasn't able to work on this during most of the day) but I wonder if it's then possible that the PMD setup is not visible before insertion into the page table, leading to weirdness? Why it would matter only on !SMP is unclear unless this is a preemption thing I'm not thinking of. In a similar vein, during collapse_huge_page(), we use a barrier to ensure the data copy is visible before the PMD insertion, but in __do_huge_pmd_anonymous_page(), we assume the "spinlocking to take the lru_lock inside page_add_new_anon_rmap() acts as a full memory barrier". Thing is, it's calling lru_cache_add_lru(), which adds the page to a pagevec and does not necessarily take the LRU lock, and !SMP leaves a big enough race before the pagevec gets drained to cause a problem. Of course, maybe it *is* happening on SMP but the negative counters are being reported as zero :) To see if this is along the right lines, I'm currently testing with this patch against 2.6.38.4. It hasn't blown up in 35 minutes, which is an improvement over getting into trouble after 5, so I'll leave it running overnight and see if I can convince myself of what is going on tomorrow.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2d29c9a..65fa251 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -631,12 +631,14 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, entry = mk_pmd(page, vma->vm_page_prot); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); entry = pmd_mkhuge(entry); + /* - * The spinlocking to take the lru_lock inside - * page_add_new_anon_rmap() acts as a full memory - * barrier to be sure clear_huge_page writes become - * visible after the set_pmd_at() write. + * Need a write barrier to ensure the writes from + * clear_huge_page become visible before the + * set_pmd_at */ + smp_wmb(); + page_add_new_anon_rmap(page, vma, haddr); set_pmd_at(mm, haddr, pmd, entry); prepare_pmd_huge_pte(pgtable, mm); @@ -753,6 +755,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, pmdp_set_wrprotect(src_mm, addr, src_pmd); pmd = pmd_mkold(pmd_wrprotect(pmd)); + + /* + * Write barrier to make sure the setup for the PMD is fully visible + * before the set_pmd_at + */ + smp_wmb(); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); prepare_pmd_huge_pte(pgtable, dst_mm); ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 1:03 ` Mel Gorman @ 2011-06-02 8:34 ` Minchan Kim 2011-06-02 13:29 ` Andrea Arcangeli 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-02 8:34 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 2, 2011 at 10:03 AM, Mel Gorman <mgorman@suse.de> wrote: > On Thu, Jun 02, 2011 at 01:30:36AM +0200, Andrea Arcangeli wrote: >> Hi Mel, >> >> On Wed, Jun 01, 2011 at 10:40:18PM +0100, Mel Gorman wrote: >> > On Wed, Jun 01, 2011 at 09:15:29PM +0200, Andrea Arcangeli wrote: >> > > On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote: >> > > > Umm, HIGHMEM4G implies a two-level pagetable layout so where are >> > > > things like _PAGE_BIT_SPLITTING being set when THP is enabled? >> > > >> > > They should be set on the pgd, pud_offset/pgd_offset will just bypass. >> > > The splitting bit shouldn't be special about it, the present bit >> > > should work the same. >> > >> > This comment is misleading at best then. >> > >> > #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ >> >> From common code point of view it's set in the pmd, the comment can be >> extended to specify it's actually the pgd in case of 32bit noPAE but I >> didn't think it was too misleading as we think in common code terms >> all over the code, the fact it's a bypass is pretty clear across the >> whole archs. >> > > Fair point. > >> > At the PGD level, it can have PSE set obviously but it's not a >> > PMD. I confess I haven't checked the manual to see if it's safe to >> > use _PAGE_BIT_UNUSED1 like this so am taking your word for it. I >> >> To be sure I re-checked on 253668.pdf page 113/114 noPAE and page 122 >> PAE, on x86 32bit/64 all ptes/pmd/pgd (32bit/64bit PAE/noPAE) have bit >> 9-11 "Avail" to software. So I think we should be safe here. 
>> > > Good stuff. I was reasonably sure this was the case but as this was > already "impossible", it needed to be ruled out. > >> > found that the bug is far harder to reproduce with 3 pagetable levels >> > than with 2 but that is just timing. So far it has proven impossible >> > on x86-64 at least within 27 hours so that has me looking at how >> > pagetable management between x86 and x86-64 differ. >> >> Weird. >> >> However I could see it screwing the nr_inactive/active_* stats, but >> the nr_isolated should never go below zero, and especially not anon >> even if split_huge_page does the accounting wrong (and >> migrate/compaction won't mess with THP), or at least I'd expect things >> to fall apart in other ways and not with just a fairly innocuous and >> not-memory corrupting nr_isolated_ counter going off just by one. >> > > Again, agreed. I found it hard to come up with a reason why file would > get messed up particularly as PageSwapBacked does not get cleared in the > ordinary case until the page is freed. If we were using pages after > being freed due to bad refcounting, it would show up in all sorts of bad > ways. > >> The khugepaged nr_isolated_anon increment couldn't affect the file one >> and we hold mmap_sem write mode there to prevent the pte to change >> from under us, in addition to the PT and anon_vma lock. Anon_vma lock >> being wrong sounds unlikely too, and even if it was it should screw >> the nr_isolated_anon counter, impossible to screw the nr_isolated_file >> with khugepaged. >> > > After reviewing, I still could not find a problem with the locking that > might explain this. I thought last night anon_vma might be bust in some > way but today I couldn't find a problem. > >> Where did you put your bugcheck? It looked like you put it in the < 0 >> reader, can you add it to all _inc/dec/mod (even _inc just in case) so >> we may get a stack trace including the culprit? 
(not guaranteed but >> better chance) >> > > Did that, didn't really help other than showing the corruption happens > early in the process lifetime while huge PMDs are being faulted. This > made me think the problem might be on or near fork. > >> > Barriers are a big different between how 32-bit !SMP and X86-64 but >> > don't know yet which one is relevant or if this is even the right >> > direction. >> >> The difference is we need xchg on SMP to avoid losing the dirty >> bit. Otherwise if we do pmd_t pmd = *pmdp; *pmdp = 0; the dirty bit >> may have been set in between the two by another thread running in >> userland in a different CPU, while the pmd was still "present". As >> long as interrupts don't write to read-write userland memory with the >> pte dirty bit clear, we shouldn't need xchg on !SMP. >> > > Yep. > >> On PAE we also need to write 0 into pmd_low before worrying about >> pmd_high so the present bit is cleared before clearing the high part >> of the 32bit PAE pte, and we relay on xchg implicit lock to avoid a >> smp_wmb() in between the two writes. >> > > Yep. > >> I'm unsure if any of this could be relevant to our problem, also there > > I concluded after a while that it wasn't. Partially from reasoning about > it and part by testing forcing the use of the SMP versions and finding > the bug was still reproducible. > >> can't be more than one writer at once in the pmd, as nobody can modify >> it without the page_table_lock held. xchg there is just to be safe for >> the dirty bit (or we'd corrupt memory with threads running in userland >> and writing to memory on other cpus while we ptep_clear_flush). >> >> I've been wondering about the lack of "lock" on the bus in atomic.h >> too, but I can't see how it can possibly matter on !SMP, vmstat >> modifications should execute only 1 asm insn so preempt or irq can't >> interrupt it. > > To be honest, I haven't fully figured out yet why it makes such > a difference on !SMP. 
> I have a vague notion that it's because > the page table page and the data is visible before the bit set by > SetPageSwapBacked on the struct page is visible but haven't reasoned > it out yet. If this was the case, it might allow an "anon" page to > be treated as a file by compaction for accounting purposes and push > the counter negative but you'd think then the anon isolation would > be positive so it's something else. > > As I thought fork() be an issue, I looked closer at what we do > there. We are calling pmd_alloc at copy_pmd_range which is a no-op > when PMD is folded and copy_huge_pmd() is calling pte_alloc_one() > which also has no barrier. I haven't checked this fully (it's very > late again as I wasn't able to work on this during most of the day) > but I wonder if it's then possible the PMD setup is not visible before > insertion into the page table leading to weirdness? Why it matters to > SMP is unclear unless this is a preemption thing I'm not thinking of. > > On a similar vein, during collapse_huge_page(), we use a barrier > to ensure the data copy is visible before the PMD insertion but in > __do_huge_pmd_anonymous_page(), we assume the "spinlocking to take the > lru_lock inside page_add_new_anon_rmap() acts as a full memory". Thing > is, it's calling lru_cache_add_lru() adding the page to a pagevec > which is not necessarily taking the LRU lock and !SMP is leaving a > big enough race before the pagevec gets drained to cause a problem. > Of course, maybe it *is* happening on SMP but the negative counters > are being reported as zero :) Yes. Although we have an atomic_inc in get_page, it doesn't imply a full memory barrier, so we need an explicit one. I think you're right, but I can't see how it is related to this problem (UP, preemption). -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 1:03 ` Mel Gorman 2011-06-02 8:34 ` Minchan Kim @ 2011-06-02 13:29 ` Andrea Arcangeli 2011-06-02 14:50 ` Mel Gorman 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-02 13:29 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 02, 2011 at 02:03:52AM +0100, Mel Gorman wrote: > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 2d29c9a..65fa251 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -631,12 +631,14 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > entry = mk_pmd(page, vma->vm_page_prot); > entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); > entry = pmd_mkhuge(entry); > + > /* > - * The spinlocking to take the lru_lock inside > - * page_add_new_anon_rmap() acts as a full memory > - * barrier to be sure clear_huge_page writes become > - * visible after the set_pmd_at() write. > + * Need a write barrier to ensure the writes from > + * clear_huge_page become visible before the > + * set_pmd_at > */ > + smp_wmb(); > + On x86 at least this is a no-op because of the spin_lock(&page_table_lock) after clear_huge_page. But I'm not against adding this in case other archs support THP later. But smp_wmb() is optimized away at build time by cpp so this can't possibly help if you're reproducing on !SMP.
> page_add_new_anon_rmap(page, vma, haddr); > set_pmd_at(mm, haddr, pmd, entry); > prepare_pmd_huge_pte(pgtable, mm); > @@ -753,6 +755,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, > > pmdp_set_wrprotect(src_mm, addr, src_pmd); > pmd = pmd_mkold(pmd_wrprotect(pmd)); > + > + /* > + * Write barrier to make sure the setup for the PMD is fully visible > + * before the set_pmd_at > + */ > + smp_wmb(); > + > set_pmd_at(dst_mm, addr, dst_pmd, pmd); > prepare_pmd_huge_pte(pgtable, dst_mm); This part seems superfluous to me; it's also a no-op for !SMP. Only wmb() would stay. The pmd is perfectly fine to stay in a register; not even a compiler barrier is needed, much less an SMP serialisation. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 13:29 ` Andrea Arcangeli @ 2011-06-02 14:50 ` Mel Gorman 2011-06-02 15:37 ` Andrea Arcangeli 0 siblings, 1 reply; 63+ messages in thread From: Mel Gorman @ 2011-06-02 14:50 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 02, 2011 at 03:29:54PM +0200, Andrea Arcangeli wrote: > On Thu, Jun 02, 2011 at 02:03:52AM +0100, Mel Gorman wrote: > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 2d29c9a..65fa251 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -631,12 +631,14 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > > entry = mk_pmd(page, vma->vm_page_prot); > > entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); > > entry = pmd_mkhuge(entry); > > + > > /* > > - * The spinlocking to take the lru_lock inside > > - * page_add_new_anon_rmap() acts as a full memory > > - * barrier to be sure clear_huge_page writes become > > - * visible after the set_pmd_at() write. > > + * Need a write barrier to ensure the writes from > > + * clear_huge_page become visible before the > > + * set_pmd_at > > */ > > + smp_wmb(); > > + > > On x86 at least this is noop because of the > spin_lock(&page_table_lock) after clear_huge_page. But I'm not against > adding this in case other archs supports THP later. > I thought spinlock acquisition was one-way, where loads/stores preceding the lock are allowed to leak into the protected region but not the other way around? So we have clear_huge_page() __SetPageUptodate(page); spin_lock(&mm->page_table_lock); ... set_pmd_at(mm, haddr, pmd, entry); This spinlock itself does not guarantee that writes from clear_huge_page are complete before that set_pmd_at(). Whether this is right or wrong, why is the same not true in collapse_huge_page()?
There we are __collapse_huge_page_copy(pte, new_page, vma, address, ptl); .... smp_wmb(); spin_lock(&mm->page_table_lock); ... set_pmd_at(mm, address, pmd, _pmd); with the comment stressing that this is necessary. > But smp_wmb() is optimized away at build time by cpp so this can't > possibly help if you're reproducing !SMP. > On X86 !SMP, this is still a barrier() which on gcc is #define barrier() __asm__ __volatile__("": : :"memory") so it's a compiler barrier. I'm not working on this at the moment but when I get to it, I'll compare the object files and see if there are relevant differences. Could be tomorrow before I get the chance again. > > page_add_new_anon_rmap(page, vma, haddr); > > set_pmd_at(mm, haddr, pmd, entry); > > prepare_pmd_huge_pte(pgtable, mm); > > @@ -753,6 +755,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, > > > > pmdp_set_wrprotect(src_mm, addr, src_pmd); > > pmd = pmd_mkold(pmd_wrprotect(pmd)); > > + > > + /* > > + * Write barrier to make sure the setup for the PMD is fully visible > > + * before the set_pmd_at > > + */ > > + smp_wmb(); > > + > > set_pmd_at(dst_mm, addr, dst_pmd, pmd); > > prepare_pmd_huge_pte(pgtable, dst_mm); > > This part seems superfluous to me, it's also noop for !SMP. Other than being a compiler barrier. > Only wmb() > would stay. the pmd is perfectly fine to stay in a register, not even > a compiler barrier is needed, even less a smp serialization. There is an explanation in here somewhere because as I write this, the test machine has survived 14 hours under continual stress without the isolated counters going negative, with over 128 million pages successfully migrated and a million pages failing to migrate across 80,000 calls to direct compaction. It's possible it's a coincidence but it's some coincidence! -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 14:50 ` Mel Gorman @ 2011-06-02 15:37 ` Andrea Arcangeli 2011-06-03 2:09 ` Mel Gorman 0 siblings, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-02 15:37 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 02, 2011 at 03:50:19PM +0100, Mel Gorman wrote: > I thought spin lock acquisition was one-way where loads/stores > preceeding the lock are allowed to leak into the protected region > but not the other way around? That's true for ia64, but not x86 AFAIK. > So we have > > clear_huge_page() > __SetPageUptodate(page); > spin_lock(&mm->page_table_lock); > ... > set_pmd_at(mm, haddr, pmd, entry); > > This spinlock itself does not guarantee that writes from > clear_huge_page are complete before that set_pmd_at(). It does on x86. > Whether this is right or wrong, why is the same not true in > collapse_huge_page()? There we are > > __collapse_huge_page_copy(pte, new_page, vma, address, ptl); > .... > smp_wmb(); > spin_lock(&mm->page_table_lock); > ... > set_pmd_at(mm, address, pmd, _pmd); > > with the comment stressing that this is necessary. So the first part of your patch is right, but it should be only a theoretical improvement. > > But smp_wmb() is optimized away at build time by cpp so this can't > > possibly help if you're reproducing !SMP. > > > > On X86 !SMP, this is still a barrier() which on gcc is > > #define barrier() __asm__ __volatile__("": : :"memory") > > so it's a compiler barrier. I'm not working on this at this at the > moment but when I get to it, I'll compare the object files and see > if there are relevant differences. Could be tomorrow before I get > the chance again.
clear_huge_page called by do_huge_pmd_anonymous_page is an external function (not static, so gcc can't make assumptions) and that is fully equivalent to a barrier() after the function returns, so an smp_wmb there would make no difference on x86, SMP or !SMP (unless X86_OOSTORE is set, which I think it is not, and that would only apply to SMP). > > > page_add_new_anon_rmap(page, vma, haddr); > > > set_pmd_at(mm, haddr, pmd, entry); > > > prepare_pmd_huge_pte(pgtable, mm); > > > @@ -753,6 +755,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, > > > > > > pmdp_set_wrprotect(src_mm, addr, src_pmd); > > > pmd = pmd_mkold(pmd_wrprotect(pmd)); > > > + > > > + /* > > > + * Write barrier to make sure the setup for the PMD is fully visible > > > + * before the set_pmd_at > > > + */ > > > + smp_wmb(); > > > + > > > set_pmd_at(dst_mm, addr, dst_pmd, pmd); > > > prepare_pmd_huge_pte(pgtable, dst_mm); > > > > This part seems superfluous to me, it's also noop for !SMP. > > Other than being a compiler barrier. Yes, but my point is that it is OK for this to be cached in registers; the pmd setup doesn't need to hit main memory to be safe, it's local. pmdp_set_wrprotect is done with a clear_bit, and the dependency of the code will require reading that after the clear_bit on !SMP. Not sure how a barrier() above can possibly matter. > > Only wmb() > > would stay. the pmd is perfectly fine to stay in a register, not even > > a compiler barrier is needed, even less a smp serialization. > > There is an explanation in here somewhere because as I write this, > the test machine has survived 14 hours under continual stress without > the isolated counters going negative with over 128 million pages > successfully migrated and a million pages failed to migrate due to > direct compaction being called 80,000 times. It's possible it's a > co-incidence but it's some co-incidence! No idea... ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-02 15:37 ` Andrea Arcangeli @ 2011-06-03 2:09 ` Mel Gorman 2011-06-03 14:49 ` Mel Gorman 0 siblings, 1 reply; 63+ messages in thread From: Mel Gorman @ 2011-06-03 2:09 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote: > > There is an explanation in here somewhere because as I write this, > > the test machine has survived 14 hours under continual stress without > > the isolated counters going negative with over 128 million pages > > successfully migrated and a million pages failed to migrate due to > > direct compaction being called 80,000 times. It's possible it's a > > co-incidence but it's some co-incidence! > > No idea... I wasn't able to work on this for most of the day but was looking at it more closely this evening and I think I might have another theory that could explain this problem. When THP is isolating pages, it accounts for the pages isolated against the zone of course. If it backs out, it finds the pages from the PTEs. On !SMP but PREEMPT, we may not have adequate protection against a new page from a different zone being inserted into the PTE, causing us to decrement against the wrong zone. While the global counter is fine, the per-zone counters look corrupted. You'd still think it was the anon counter that got screwed rather than the file one if it really was THP, unfortunately, so it's not the full picture. I'm going to start a test monitoring both zoneinfo and vmstat to see if vmstat looks fine while the per-zone counters that are negative are offset by a positive count on the other zones, such that when added together they become 0. Hopefully it'll actually trigger overnight :/ -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 2:09 ` Mel Gorman @ 2011-06-03 14:49 ` Mel Gorman 2011-06-03 15:45 ` Andrea Arcangeli 2011-06-04 6:58 ` Minchan Kim 0 siblings, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-03 14:49 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, Jun 03, 2011 at 03:09:20AM +0100, Mel Gorman wrote: > On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote: > > > There is an explanation in here somewhere because as I write this, > > > the test machine has survived 14 hours under continual stress without > > > the isolated counters going negative with over 128 million pages > > > successfully migrated and a million pages failed to migrate due to > > > direct compaction being called 80,000 times. It's possible it's a > > > co-incidence but it's some co-incidence! > > > > No idea... > > I wasn't able to work on this most of the day but was looking at this > closer this evening again and I think I might have thought of another > theory that could cause this problem. > > When THP is isolating pages, it accounts for the pages isolated against > the zone of course. If it backs out, it finds the pages from the PTEs. > On !SMP but PREEMPT, we may not have adequate protection against a new > page from a different zone being inserted into the PTE causing us to > decrement against the wrong zone. While the global counter is fine, > the per-zone counters look corrupted. You'd still think it was the > anon counter tht got screwed rather than the file one if it really was > THP unfortunately so it's not the full picture. I'm going to start > a test monitoring both zoneinfo and vmstat to see if vmstat looks > fine while the per-zone counters that are negative are offset by a > positive count on the other zones that when added together become 0. 
> Hopefully it'll actually trigger overnight :/ > Right idea of the wrong zone being accounted for but wrong place. I think the following patch should fix the problem; ==== CUT HERE === mm: compaction: Ensure that the compaction free scanner does not move to the next zone Compaction works with two scanners, a migration and a free scanner. When the scanners cross over, migration within the zone is complete. The location of the scanner is recorded on each cycle to avoid excessive scanning. When a zone is small and mostly reserved, it's very easy for the migration scanner to be close to the end of the zone. Then the following situation can occur:
o migration scanner isolates some pages near the end of the zone
o free scanner starts at the end of the zone but finds that the migration scanner is already there
o free scanner gets reinitialised for the next cycle as cc->migrate_pfn + pageblock_nr_pages, moving the free scanner into the next zone
o migration scanner moves into the next zone but continues accounting against the old zone
When this happens, NR_ISOLATED accounting goes haywire because some of the accounting happens against the wrong zone. One zone's counter remains positive while the other goes negative, even though the overall global count is accurate. This was reported on X86-32 with !SMP because !SMP allows the negative counters to be visible. The fact that it is difficult to reproduce on X86-64 is probably just a coincidence as the bug should theoretically be possible there. Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/compaction.c | 13 ++++++++++++- 1 files changed, 12 insertions(+), 1 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index a4337bc..ec1ed3b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -144,9 +144,20 @@ static void isolate_freepages(struct zone *zone, int nr_freepages = cc->nr_freepages; struct list_head *freelist = &cc->freepages; + /* + * Initialise the free scanner. The starting point is where we last + * scanned from (or the end of the zone if starting). The low point + * is the end of the pageblock the migration scanner is using. + */ pfn = cc->free_pfn; low_pfn = cc->migrate_pfn + pageblock_nr_pages; - high_pfn = low_pfn; + + /* + * Take care that if the migration scanner is at the end of the zone + * that the free scanner does not accidentally move to the next zone + * in the next isolation cycle. + */ + high_pfn = min(low_pfn, pfn); /* * Isolate free pages until enough are available to migrate the ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 14:49 ` Mel Gorman @ 2011-06-03 15:45 ` Andrea Arcangeli 2011-06-04 7:25 ` Minchan Kim ` (2 more replies) 2011-06-04 6:58 ` Minchan Kim 1 sibling, 3 replies; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-03 15:45 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > Right idea of the wrong zone being accounted for but wrong place. I > think the following patch should fix the problem; Looks good thanks. I also found this bug during my debugging that made NR_SHMEM underflow. === Subject: migrate: don't account swapcache as shmem From: Andrea Arcangeli <aarcange@redhat.com> swapcache will reach the below code path in migrate_page_move_mapping, and swapcache is accounted as NR_FILE_PAGES but it's not accounted as NR_SHMEM. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- diff --git a/mm/migrate.c b/mm/migrate.c index e4a5c91..2597a27 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -288,7 +288,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, */ __dec_zone_page_state(page, NR_FILE_PAGES); __inc_zone_page_state(newpage, NR_FILE_PAGES); - if (PageSwapBacked(page)) { + if (mapping != &swapper_space && PageSwapBacked(page)) { __dec_zone_page_state(page, NR_SHMEM); __inc_zone_page_state(newpage, NR_SHMEM); } ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 15:45 ` Andrea Arcangeli @ 2011-06-04 7:25 ` Minchan Kim 2011-06-06 10:39 ` Mel Gorman 2011-06-06 22:32 ` Andrew Morton 2 siblings, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-04 7:25 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, Jun 03, 2011 at 05:45:54PM +0200, Andrea Arcangeli wrote: > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > Right idea of the wrong zone being accounted for but wrong place. I > > think the following patch should fix the problem; > > Looks good thanks. > > I also found this bug during my debugging that made NR_SHMEM underflow. > > === > Subject: migrate: don't account swapcache as shmem > > From: Andrea Arcangeli <aarcange@redhat.com> > > swapcache will reach the below code path in migrate_page_move_mapping, > and swapcache is accounted as NR_FILE_PAGES but it's not accounted as > NR_SHMEM. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Nice catch! -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 15:45 ` Andrea Arcangeli 2011-06-04 7:25 ` Minchan Kim @ 2011-06-06 10:39 ` Mel Gorman 2011-06-06 12:38 ` Andrea Arcangeli 2011-06-06 14:19 ` Minchan Kim 2011-06-06 22:32 ` Andrew Morton 2 siblings, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 10:39 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, Jun 03, 2011 at 05:45:54PM +0200, Andrea Arcangeli wrote: > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > Right idea of the wrong zone being accounted for but wrong place. I > > think the following patch should fix the problem; > > Looks good thanks. > > I also found this bug during my debugging that made NR_SHMEM underflow. > > === > Subject: migrate: don't account swapcache as shmem > > From: Andrea Arcangeli <aarcange@redhat.com> > > swapcache will reach the below code path in migrate_page_move_mapping, > and swapcache is accounted as NR_FILE_PAGES but it's not accounted as > NR_SHMEM. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Well spotted. Acked-by: Mel Gorman <mgorman@suse.de> Minor nit. swapper_space is rarely referred to outside of the swap code. Might it be more readable to use /* * swapcache is accounted as NR_FILE_PAGES but it is not * accounted as NR_SHMEM * if (PageSwapBacked(page) && !PageSwapCache(page)) ? 
> --- > > diff --git a/mm/migrate.c b/mm/migrate.c > index e4a5c91..2597a27 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -288,7 +288,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, > */ > __dec_zone_page_state(page, NR_FILE_PAGES); > __inc_zone_page_state(newpage, NR_FILE_PAGES); > - if (PageSwapBacked(page)) { > + if (mapping != &swapper_space && PageSwapBacked(page)) { > __dec_zone_page_state(page, NR_SHMEM); > __inc_zone_page_state(newpage, NR_SHMEM); > } > > -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:39 ` Mel Gorman @ 2011-06-06 12:38 ` Andrea Arcangeli 2011-06-06 14:55 ` Mel Gorman 2011-06-06 14:19 ` Minchan Kim 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-06 12:38 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 11:39:24AM +0100, Mel Gorman wrote: > Well spotted. > > Acked-by: Mel Gorman <mgorman@suse.de> > > Minor nit. swapper_space is rarely referred to outside of the swap > code. Might it be more readable to use > > /* > * swapcache is accounted as NR_FILE_PAGES but it is not > * accounted as NR_SHMEM > * > if (PageSwapBacked(page) && !PageSwapCache(page)) I thought the comparison on swapper_space would be faster as it was immediate vs register in CPU, instead of forcing a memory access. Otherwise I would have used the above. Now the test_bit is written in C and lockless so it's not likely to be very different considering the cacheline is hot in the CPU but it's still referencing memory instead register vs immediate comparison. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 12:38 ` Andrea Arcangeli @ 2011-06-06 14:55 ` Mel Gorman 0 siblings, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 14:55 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 02:38:51PM +0200, Andrea Arcangeli wrote: > On Mon, Jun 06, 2011 at 11:39:24AM +0100, Mel Gorman wrote: > > Well spotted. > > > > Acked-by: Mel Gorman <mgorman@suse.de> > > > > Minor nit. swapper_space is rarely referred to outside of the swap > > code. Might it be more readable to use > > > > /* > > * swapcache is accounted as NR_FILE_PAGES but it is not > > * accounted as NR_SHMEM > > * > > if (PageSwapBacked(page) && !PageSwapCache(page)) > > I thought the comparison on swapper_space would be faster as it was > immediate vs register in CPU, instead of forcing a memory > access. Otherwise I would have used the above. Now the test_bit is > written in C and lockless so it's not likely to be very different > considering the cacheline is hot in the CPU but it's still referencing > memory instead register vs immediate comparison. Ok, I had not considered that. That is a micro-optimisation but it's there. I thought my version is more readable and migration is not really a fast path but yours is still better. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:39 ` Mel Gorman 2011-06-06 12:38 ` Andrea Arcangeli @ 2011-06-06 14:19 ` Minchan Kim 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 14:19 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 11:39:24AM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 05:45:54PM +0200, Andrea Arcangeli wrote: > > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > > Right idea of the wrong zone being accounted for but wrong place. I > > > think the following patch should fix the problem; > > > > Looks good thanks. > > > > I also found this bug during my debugging that made NR_SHMEM underflow. > > > > === > > Subject: migrate: don't account swapcache as shmem > > > > From: Andrea Arcangeli <aarcange@redhat.com> > > > > swapcache will reach the below code path in migrate_page_move_mapping, > > and swapcache is accounted as NR_FILE_PAGES but it's not accounted as > > NR_SHMEM. > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > Well spotted. > > Acked-by: Mel Gorman <mgorman@suse.de> > > Minor nit. swapper_space is rarely referred to outside of the swap > code. Might it be more readable to use > > /* > * swapcache is accounted as NR_FILE_PAGES but it is not > * accounted as NR_SHMEM > * > if (PageSwapBacked(page) && !PageSwapCache(page)) I like this. but as it's "and" operation, CPU have to execute two condition comparison. but how about below? if (!PageSwapCache(page) && PageSwapBacked(page)) PageSwapCache implys PageSwapBacked so we can handle non-swapbacked pages as just 1 comparison. > > ? 
> > > --- > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > index e4a5c91..2597a27 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -288,7 +288,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, > > */ > > __dec_zone_page_state(page, NR_FILE_PAGES); > > __inc_zone_page_state(newpage, NR_FILE_PAGES); > > - if (PageSwapBacked(page)) { > > + if (mapping != &swapper_space && PageSwapBacked(page)) { > > __dec_zone_page_state(page, NR_SHMEM); > > __inc_zone_page_state(newpage, NR_SHMEM); > > } > > > > > > -- > Mel Gorman > SUSE Labs -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 15:45 ` Andrea Arcangeli 2011-06-04 7:25 ` Minchan Kim 2011-06-06 10:39 ` Mel Gorman @ 2011-06-06 22:32 ` Andrew Morton 2 siblings, 0 replies; 63+ messages in thread From: Andrew Morton @ 2011-06-06 22:32 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, Minchan Kim, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, 3 Jun 2011 17:45:54 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote: > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > Right idea of the wrong zone being accounted for but wrong place. I > > think the following patch should fix the problem; > > Looks good thanks. > > I also found this bug during my debugging that made NR_SHMEM underflow. > > === > Subject: migrate: don't account swapcache as shmem > > From: Andrea Arcangeli <aarcange@redhat.com> > > swapcache will reach the below code path in migrate_page_move_mapping, > and swapcache is accounted as NR_FILE_PAGES but it's not accounted as > NR_SHMEM. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > > diff --git a/mm/migrate.c b/mm/migrate.c > index e4a5c91..2597a27 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -288,7 +288,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, > */ > __dec_zone_page_state(page, NR_FILE_PAGES); > __inc_zone_page_state(newpage, NR_FILE_PAGES); > - if (PageSwapBacked(page)) { > + if (mapping != &swapper_space && PageSwapBacked(page)) { > __dec_zone_page_state(page, NR_SHMEM); > __inc_zone_page_state(newpage, NR_SHMEM); > } fyi, this was the only patch I applied from this whole thread. Once the dust has settled, could people please resend whatever they have, including any acked-by's and reviewed-by's? Thanks. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-03 14:49 ` Mel Gorman 2011-06-03 15:45 ` Andrea Arcangeli @ 2011-06-04 6:58 ` Minchan Kim 2011-06-06 10:43 ` Mel Gorman 1 sibling, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-06-04 6:58 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > On Fri, Jun 03, 2011 at 03:09:20AM +0100, Mel Gorman wrote: > > On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote: > > > > There is an explanation in here somewhere because as I write this, > > > > the test machine has survived 14 hours under continual stress without > > > > the isolated counters going negative with over 128 million pages > > > > successfully migrated and a million pages failed to migrate due to > > > > direct compaction being called 80,000 times. It's possible it's a > > > > co-incidence but it's some co-incidence! > > > > > > No idea... > > > > I wasn't able to work on this most of the day but was looking at this > > closer this evening again and I think I might have thought of another > > theory that could cause this problem. > > > > When THP is isolating pages, it accounts for the pages isolated against > > the zone of course. If it backs out, it finds the pages from the PTEs. > > On !SMP but PREEMPT, we may not have adequate protection against a new > > page from a different zone being inserted into the PTE causing us to > > decrement against the wrong zone. While the global counter is fine, > > the per-zone counters look corrupted. You'd still think it was the > > anon counter tht got screwed rather than the file one if it really was > > THP unfortunately so it's not the full picture. 
I'm going to start > > a test monitoring both zoneinfo and vmstat to see if vmstat looks > > fine while the per-zone counters that are negative are offset by a > > positive count on the other zones that when added together become 0. > > Hopefully it'll actually trigger overnight :/ > > > > Right idea of the wrong zone being accounted for but wrong place. I > think the following patch should fix the problem; > > ==== CUT HERE === > mm: compaction: Ensure that the compaction free scanner does not move to the next zone > > Compaction works with two scanners, a migration and a free > scanner. When the scanners crossover, migration within the zone is > complete. The location of the scanner is recorded on each cycle to > avoid excesive scanning. > > When a zone is small and mostly reserved, it's very easy for the > migration scanner to be close to the end of the zone. Then the following > situation can occurs > > o migration scanner isolates some pages near the end of the zone > o free scanner starts at the end of the zone but finds that the > migration scanner is already there > o free scanner gets reinitialised for the next cycle as > cc->migrate_pfn + pageblock_nr_pages > moving the free scanner into the next zone > o migration scanner moves into the next zone but continues accounting > against the old zone > > When this happens, NR_ISOLATED accounting goes haywire because some > of the accounting happens against the wrong zone. One zones counter > remains positive while the other goes negative even though the overall > global count is accurate. This was reported on X86-32 with !SMP because > !SMP allows the negative counters to be visible. The fact that it is > difficult to reproduce on X86-64 is probably just a co-incidence as I guess it's related to zone sizes. X86-64 has small DMA and large DMA32 zones for fallback of NORMAL while x86 has just a small DMA(16M) zone. I think DMA zone in x86 is easily full of non-LRU or non-movable pages. 
So isolate_migratepages continues scanning for migratable pages and then reaches near the end of the zone. > the bug should theoritically be possible there. > Finally, you found it. Congratulations! > Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> While debugging this problem, we found a few bugs and points to enhance, and submitted patches. It was a very good chance to fix the Linux VM. Thanks, Mel. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-04 6:58 ` Minchan Kim @ 2011-06-06 10:43 ` Mel Gorman 2011-06-06 12:40 ` Andrea Arcangeli 2011-06-06 13:23 ` Minchan Kim 0 siblings, 2 replies; 63+ messages in thread From: Mel Gorman @ 2011-06-06 10:43 UTC (permalink / raw) To: Minchan Kim Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Sat, Jun 04, 2011 at 03:58:53PM +0900, Minchan Kim wrote: > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > On Fri, Jun 03, 2011 at 03:09:20AM +0100, Mel Gorman wrote: > > > On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote: > > > > > There is an explanation in here somewhere because as I write this, > > > > > the test machine has survived 14 hours under continual stress without > > > > > the isolated counters going negative with over 128 million pages > > > > > successfully migrated and a million pages failed to migrate due to > > > > > direct compaction being called 80,000 times. It's possible it's a > > > > > co-incidence but it's some co-incidence! > > > > > > > > No idea... > > > > > > I wasn't able to work on this most of the day but was looking at this > > > closer this evening again and I think I might have thought of another > > > theory that could cause this problem. > > > > > > When THP is isolating pages, it accounts for the pages isolated against > > > the zone of course. If it backs out, it finds the pages from the PTEs. > > > On !SMP but PREEMPT, we may not have adequate protection against a new > > > page from a different zone being inserted into the PTE causing us to > > > decrement against the wrong zone. While the global counter is fine, > > > the per-zone counters look corrupted. You'd still think it was the > > > anon counter tht got screwed rather than the file one if it really was > > > THP unfortunately so it's not the full picture. 
I'm going to start > > > a test monitoring both zoneinfo and vmstat to see if vmstat looks > > > fine while the per-zone counters that are negative are offset by a > > > positive count on the other zones that when added together become 0. > > > Hopefully it'll actually trigger overnight :/ > > > > > > > Right idea of the wrong zone being accounted for but wrong place. I > > think the following patch should fix the problem; > > > > ==== CUT HERE === > > mm: compaction: Ensure that the compaction free scanner does not move to the next zone > > > > Compaction works with two scanners, a migration and a free > > scanner. When the scanners crossover, migration within the zone is > > complete. The location of the scanner is recorded on each cycle to > > avoid excesive scanning. > > > > When a zone is small and mostly reserved, it's very easy for the > > migration scanner to be close to the end of the zone. Then the following > > situation can occurs > > > > o migration scanner isolates some pages near the end of the zone > > o free scanner starts at the end of the zone but finds that the > > migration scanner is already there > > o free scanner gets reinitialised for the next cycle as > > cc->migrate_pfn + pageblock_nr_pages > > moving the free scanner into the next zone > > o migration scanner moves into the next zone but continues accounting > > against the old zone > > > > When this happens, NR_ISOLATED accounting goes haywire because some > > of the accounting happens against the wrong zone. One zones counter > > remains positive while the other goes negative even though the overall > > global count is accurate. This was reported on X86-32 with !SMP because > > !SMP allows the negative counters to be visible. The fact that it is > > difficult to reproduce on X86-64 is probably just a co-incidence as > > I guess it's related to zone sizes. > X86-64 has small DMA and large DMA32 zones for fallback of NORMAL while > x86 has just a small DMA(16M) zone. 
> Yep, this is a possibility as well as the use of lowmem reserves. > I think DMA zone in x86 is easily full of non-LRU or non-movable pages. Maybe not full, but it has more PageReserved pages than anywhere else and few MIGRATE_MOVABLE blocks. MIGRATE_MOVABLE gets skipped during async compaction we could easily reach the end of the DMA zone quickly. > So isolate_migratepagse continues to scan for finding pages which are migratable > and then it reaches near end of zone. > > > the bug should theoritically be possible there. > > Finally, you found it. Congratulations on! > > Signed-off-by: Mel Gorman <mgorman@suse.de> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> > > When we are debugging this problem, we found a few of bugs and enhance points > and submitted patches. It was a very good chance to fix Linux VM. > Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:43 ` Mel Gorman @ 2011-06-06 12:40 ` Andrea Arcangeli 2011-06-06 13:27 ` Minchan Kim 2011-06-06 13:23 ` Minchan Kim 1 sibling, 1 reply; 63+ messages in thread From: Andrea Arcangeli @ 2011-06-06 12:40 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 11:43:45AM +0100, Mel Gorman wrote: > Maybe not full, but it has more PageReserved pages than anywhere else > and few MIGRATE_MOVABLE blocks. MIGRATE_MOVABLE gets skipped during > async compaction we could easily reach the end of the DMA zone quickly. Debug data has nr_isolated_file in dma zone 1 and nr_isolated_file in normal zone being -1. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 12:40 ` Andrea Arcangeli @ 2011-06-06 13:27 ` Minchan Kim 0 siblings, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 13:27 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 02:40:25PM +0200, Andrea Arcangeli wrote: > On Mon, Jun 06, 2011 at 11:43:45AM +0100, Mel Gorman wrote: > > Maybe not full, but it has more PageReserved pages than anywhere else > > and few MIGRATE_MOVABLE blocks. MIGRATE_MOVABLE gets skipped during > > async compaction we could easily reach the end of the DMA zone quickly. > > Debug data has nr_isolated_file in dma zone 1 and nr_isolated_file in > normal zone being -1. It exactly matches Mel's case. Thanks for proving it, Andrea. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-06-06 10:43 ` Mel Gorman 2011-06-06 12:40 ` Andrea Arcangeli @ 2011-06-06 13:23 ` Minchan Kim 1 sibling, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-06-06 13:23 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, Jun 06, 2011 at 11:43:45AM +0100, Mel Gorman wrote: > On Sat, Jun 04, 2011 at 03:58:53PM +0900, Minchan Kim wrote: > > On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote: > > > On Fri, Jun 03, 2011 at 03:09:20AM +0100, Mel Gorman wrote: > > > > On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote: > > > > > > There is an explanation in here somewhere because as I write this, > > > > > > the test machine has survived 14 hours under continual stress without > > > > > > the isolated counters going negative with over 128 million pages > > > > > > successfully migrated and a million pages failed to migrate due to > > > > > > direct compaction being called 80,000 times. It's possible it's a > > > > > > co-incidence but it's some co-incidence! > > > > > > > > > > No idea... > > > > > > > > I wasn't able to work on this most of the day but was looking at this > > > > closer this evening again and I think I might have thought of another > > > > theory that could cause this problem. > > > > > > > > When THP is isolating pages, it accounts for the pages isolated against > > > > the zone of course. If it backs out, it finds the pages from the PTEs. > > > > On !SMP but PREEMPT, we may not have adequate protection against a new > > > > page from a different zone being inserted into the PTE causing us to > > > > decrement against the wrong zone. While the global counter is fine, > > > > the per-zone counters look corrupted. 
You'd still think it was the > > > > anon counter tht got screwed rather than the file one if it really was > > > > THP unfortunately so it's not the full picture. I'm going to start > > > > a test monitoring both zoneinfo and vmstat to see if vmstat looks > > > > fine while the per-zone counters that are negative are offset by a > > > > positive count on the other zones that when added together become 0. > > > > Hopefully it'll actually trigger overnight :/ > > > > > > > > > > Right idea of the wrong zone being accounted for but wrong place. I > > > think the following patch should fix the problem; > > > > > > ==== CUT HERE === > > > mm: compaction: Ensure that the compaction free scanner does not move to the next zone > > > > > > Compaction works with two scanners, a migration and a free > > > scanner. When the scanners crossover, migration within the zone is > > > complete. The location of the scanner is recorded on each cycle to > > > avoid excesive scanning. > > > > > > When a zone is small and mostly reserved, it's very easy for the > > > migration scanner to be close to the end of the zone. Then the following > > > situation can occurs > > > > > > o migration scanner isolates some pages near the end of the zone > > > o free scanner starts at the end of the zone but finds that the > > > migration scanner is already there > > > o free scanner gets reinitialised for the next cycle as > > > cc->migrate_pfn + pageblock_nr_pages > > > moving the free scanner into the next zone > > > o migration scanner moves into the next zone but continues accounting > > > against the old zone > > > > > > When this happens, NR_ISOLATED accounting goes haywire because some > > > of the accounting happens against the wrong zone. One zones counter > > > remains positive while the other goes negative even though the overall > > > global count is accurate. This was reported on X86-32 with !SMP because > > > !SMP allows the negative counters to be visible. 
The fact that it is > > > difficult to reproduce on X86-64 is probably just a co-incidence as > > > > I guess it's related to zone sizes. > > X86-64 has small DMA and large DMA32 zones for fallback of NORMAL while > > x86 has just a small DMA(16M) zone. > > > > Yep, this is a possibility as well as the use of lowmem reserves. > > > I think DMA zone in x86 is easily full of non-LRU or non-movable pages. > > Maybe not full, but it has more PageReserved pages than anywhere else Yeb. It's very possible. > and few MIGRATE_MOVABLE blocks. MIGRATE_MOVABLE gets skipped during non-MIGRATE_MOVABLE gets skipped during To be clear for someone in future, let's fix typo. > async compaction we could easily reach the end of the DMA zone quickly. -- Kind regards Minchan Kim ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 17:53 ` Andrea Arcangeli 2011-05-31 12:16 ` Minchan Kim @ 2011-05-31 14:34 ` Mel Gorman 1 sibling, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-05-31 14:34 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, linux-kernel, linux-mm, stable On Mon, May 30, 2011 at 07:53:34PM +0200, Andrea Arcangeli wrote: > On Mon, May 30, 2011 at 05:55:46PM +0100, Mel Gorman wrote: > > Even with drift issues, -1 there should be "impossible". Assuming this > > is a zoneinfo file, that figure is based on global_page_state() which > > looks like > > The two cases reproducing this long hang in D state, had from SMP=n > PREEMPT=y. Clearly not common config these days. Also it didn't seem > apparent that any task was running in a code path that kept pages > isolated. > > > unsigned long, and callers are using unsigned long, is there any > > possibility the "if (x < 0)" is being optimised out? If you aware > > It was eliminated by cpp. > Yep, !CONFIG_SMP is important. > > of users reporting this problem (like the users in thread "iotop: > > khugepaged at 99.99% (2.6.38.3)"), do you know if they had a particular > > compiler in common? > > I had no reason to worry about the compiler yet but that's always good > idea to keep in mind. I'm no longer considering it if CONFIG_SMP is not set. > The thread were the bug is reported is the > "iotop" one you mentioned, and there's a tarball attached to one of > the last emails of the thread with the debug data I grepped. It was > /proc/zoneinfo file yes. That's the file I asked when I noticed > something had to be wrong with too_many_isolated and I expected either > nr_isolated or nr_inactive going wrong, it turned out it was > nr_isolated (apparently, I don't have full picture on the problem > yet). I added you in CC to a few emails but you weren't in all > replies. 
> I didn't pay as close attention as I should either. I was out for a few days when that thread happened and it had gone quiet by the time I came back. I didn't check if it ever got resolved. > The debug data you can find on lkml in this email: Message-Id: > <201105232005.56840.johannes.hirte@fem.tu-ilmenau.de>. > > The other relevant sysrq+t here http://pastebin.com/raw.php?i=VG28YRbi > > better save the latter (I did) as I'm worried it has a timeout on it. > > Your patch was for reports with CONFIG_SMP=y? No. Now that I look at the config, CONFIG_SMP was not set. > I'd prefer to clear out > this error before improving the too_many_isolated, Agreed. So far, it has been a impossible to reproduce but it's happened to enough people that it must exist. I have a similar config now and have applied a debugging patch to warn when the count gets screwed up. It hasn't triggered yet so it must be due to some difficult-to-hit-race. > in fact while > reviewing this code I was not impressed by too_many_isolated. For > vmscan.c if there's an huge nr_active* list and a tiny nr_inactive > (like after a truncate of filebacked pages or munmap of anon memory) > there's no reason to stall, it's better to go ahead and let it refile > more active pages. The rotating of lists should already have happened but you could be right if a process was stalled in too_many_isolated for a long period of time that the lists would become imbalanced. It'd be saved by kswapd running at the same time and rotating the list but it might need to be revisisted at some point. It's unrelated to the current problem though. > The too_many_isolated in compaction.c looks a whole > lot better than the vmscan.c one as that takes into account the active > pages too... But I refrained to make any change in this area as I > don't think the bug is in too_many_isolated itself. > Not a bug that would lock up the machine at least. 
> I noticed the count[] array is unsigned int, but it looks ok > (especially for 32bit ;) because the isolation is limited. > Agreed, this is not a wrapping problem. > Both bugs were reported on 32bit x86 UP builds with PREEMPT=y. The > stat accounting seem to use atomics on UP so irqs on off or > PREEMPT=y/n shouldn't matter if the increment is 1 insn long (plus no > irq code should ever mess with nr_isolated)... If it wasn't atomic and > irqs or preempt aren't disabled it could be preempt. To avoid > confusion: it's not proven that PREEMPT is related, it may be an > accident both .config had it on. I'm also unsure why it moves from > -1,0,1 I wouldn't expect a single page to be isolated like -1 pages to > be isolated, it just looks weird... > Hmm, looking at the zoneinfo, we can probably rule out some race related to page flags. If we someone isolated a page as anon and put it back as file, we'd decrement nr_isolated_file inappropriate to -1 but nr_isolated_anon would be stuck on 1 to match it. I'm looking at __collapse_huge_page_isolate and it always assumes it isolated anonymous pages. mmap_sem should be held as well as the pagetable lock and we hold the page lock while isolating from the LRU which seems fine but I wonder could there be another condition that allows a !PageSwapBacked page to sneak in there? -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [stable] [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 13:13 [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Mel Gorman 2011-05-30 14:31 ` Andrea Arcangeli @ 2011-05-30 14:45 ` Greg KH 2011-05-30 16:14 ` Minchan Kim ` (2 subsequent siblings) 4 siblings, 0 replies; 63+ messages in thread From: Greg KH @ 2011-05-30 14:45 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, linux-kernel, linux-mm, KOSAKI Motohiro, Ury Stankevich, stable On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote: > Asynchronous compaction is used when promoting to huge pages. This is > all very nice but if there are a number of processes in compacting > memory, a large number of pages can be isolated. An "asynchronous" > process can stall for long periods of time as a result with a user > reporting that firefox can stall for 10s of seconds. This patch aborts > asynchronous compaction if too many pages are isolated as it's better to > fail a hugepage promotion than stall a process. > > If accepted, this should also be considered for 2.6.39-stable. It should > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > writeback] would be applied to 2.6.38 before consideration. > > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/compaction.c | 32 ++++++++++++++++++++++++++------ > 1 files changed, 26 insertions(+), 6 deletions(-) <formletter> This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. </formletter> ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 13:13 [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Mel Gorman 2011-05-30 14:31 ` Andrea Arcangeli 2011-05-30 14:45 ` [stable] " Greg KH @ 2011-05-30 16:14 ` Minchan Kim 2011-05-31 8:32 ` Mel Gorman 2011-05-31 4:48 ` KAMEZAWA Hiroyuki 2011-05-31 7:14 ` KOSAKI Motohiro 4 siblings, 1 reply; 63+ messages in thread From: Minchan Kim @ 2011-05-30 16:14 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Ury Stankevich, KOSAKI Motohiro, Andrea Arcangeli, linux-kernel, linux-mm, stable On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote: > Asynchronous compaction is used when promoting to huge pages. This is > all very nice but if there are a number of processes in compacting > memory, a large number of pages can be isolated. An "asynchronous" > process can stall for long periods of time as a result with a user > reporting that firefox can stall for 10s of seconds. This patch aborts > asynchronous compaction if too many pages are isolated as it's better to > fail a hugepage promotion than stall a process. > > If accepted, this should also be considered for 2.6.39-stable. It should > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > writeback] would be applied to 2.6.38 before consideration. > > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> I have a nitpick below. Otherwise, looks good to me. 
> ---
>  mm/compaction.c |   32 ++++++++++++++++++++++++++------
>  1 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..331a2ee 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -240,11 +240,20 @@ static bool too_many_isolated(struct zone *zone)
>  	return isolated > (inactive + active) / 2;
>  }
>
> +/* possible outcome of isolate_migratepages */
> +typedef enum {
> +	ISOLATE_ABORT,		/* Abort compaction now */
> +	ISOLATE_NONE,		/* No pages isolated, continue scanning */
> +	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> +} isolate_migrate_t;
> +
>  /*
>   * Isolate all pages that can be migrated from the block pointed to by
>   * the migrate scanner within compact_control.
> + *
> + * Returns false if compaction should abort at this point due to congestion.

false? I think it would be better to use the explicit word, ISOLATE_ABORT.

-- 
Kind regards
Minchan Kim
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 16:14 ` Minchan Kim @ 2011-05-31 8:32 ` Mel Gorman 0 siblings, 0 replies; 63+ messages in thread From: Mel Gorman @ 2011-05-31 8:32 UTC (permalink / raw) To: Minchan Kim Cc: akpm, Ury Stankevich, KOSAKI Motohiro, Andrea Arcangeli, linux-kernel, linux-mm, stable On Tue, May 31, 2011 at 01:14:15AM +0900, Minchan Kim wrote: > On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote: > > Asynchronous compaction is used when promoting to huge pages. This is > > all very nice but if there are a number of processes in compacting > > memory, a large number of pages can be isolated. An "asynchronous" > > process can stall for long periods of time as a result with a user > > reporting that firefox can stall for 10s of seconds. This patch aborts > > asynchronous compaction if too many pages are isolated as it's better to > > fail a hugepage promotion than stall a process. > > > > If accepted, this should also be considered for 2.6.39-stable. It should > > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > > writeback] would be applied to 2.6.38 before consideration. > > > > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com> > > Signed-off-by: Mel Gorman <mgorman@suse.de> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> > Thanks > I have a nitpick below. > Otherwise, looks good to me. 
>
> > ---
> >  mm/compaction.c |   32 ++++++++++++++++++++++++++------
> >  1 files changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 021a296..331a2ee 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -240,11 +240,20 @@ static bool too_many_isolated(struct zone *zone)
> >  	return isolated > (inactive + active) / 2;
> >  }
> >
> > +/* possible outcome of isolate_migratepages */
> > +typedef enum {
> > +	ISOLATE_ABORT,		/* Abort compaction now */
> > +	ISOLATE_NONE,		/* No pages isolated, continue scanning */
> > +	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> > +} isolate_migrate_t;
> > +
> >  /*
> >   * Isolate all pages that can be migrated from the block pointed to by
> >   * the migrate scanner within compact_control.
> > + *
> > + * Returns false if compaction should abort at this point due to congestion.
>
> false? I think it would be better to use explicit word, ISOLATE_ABORT.

Oops, thanks for pointing that out. I'll post a V2 once it has been figured out why NR_ISOLATE_* is getting screwed up on !CONFIG_SMP.

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 13:13 [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Mel Gorman ` (2 preceding siblings ...) 2011-05-30 16:14 ` Minchan Kim @ 2011-05-31 4:48 ` KAMEZAWA Hiroyuki 2011-05-31 5:38 ` Minchan Kim 2011-05-31 7:14 ` KOSAKI Motohiro 4 siblings, 1 reply; 63+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-05-31 4:48 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Ury Stankevich, KOSAKI Motohiro, Andrea Arcangeli, linux-kernel, linux-mm, stable On Mon, 30 May 2011 14:13:00 +0100 Mel Gorman <mel@csn.ul.ie> wrote: > Asynchronous compaction is used when promoting to huge pages. This is > all very nice but if there are a number of processes in compacting > memory, a large number of pages can be isolated. An "asynchronous" > process can stall for long periods of time as a result with a user > reporting that firefox can stall for 10s of seconds. This patch aborts > asynchronous compaction if too many pages are isolated as it's better to > fail a hugepage promotion than stall a process. > > If accepted, this should also be considered for 2.6.39-stable. It should > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > writeback] would be applied to 2.6.38 before consideration. > > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> BTW, I'm surprised to see both of vmscan.c and compaction.c has too_many_isolated().. in different logic ;) BTW, compaction ignores UNEVICTABLE LRU ? 
Thanks,
-Kame

> ---
>  mm/compaction.c |   32 ++++++++++++++++++++++++++------
>  1 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..331a2ee 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -240,11 +240,20 @@ static bool too_many_isolated(struct zone *zone)
>  	return isolated > (inactive + active) / 2;
>  }
>
> +/* possible outcome of isolate_migratepages */
> +typedef enum {
> +	ISOLATE_ABORT,		/* Abort compaction now */
> +	ISOLATE_NONE,		/* No pages isolated, continue scanning */
> +	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> +} isolate_migrate_t;
> +
>  /*
>   * Isolate all pages that can be migrated from the block pointed to by
>   * the migrate scanner within compact_control.
> + *
> + * Returns false if compaction should abort at this point due to congestion.
>   */
> -static unsigned long isolate_migratepages(struct zone *zone,
> +static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  					struct compact_control *cc)
>  {
>  	unsigned long low_pfn, end_pfn;
> @@ -261,7 +270,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
>  	/* Do not cross the free scanner or scan within a memory hole */
>  	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
>  		cc->migrate_pfn = end_pfn;
> -		return 0;
> +		return ISOLATE_NONE;
>  	}
>
>  	/*
> @@ -270,10 +279,14 @@ static unsigned long isolate_migratepages(struct zone *zone,
>  	 * delay for some time until fewer pages are isolated
>  	 */
>  	while (unlikely(too_many_isolated(zone))) {
> +		/* async migration should just abort */
> +		if (!cc->sync)
> +			return ISOLATE_ABORT;
> +
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
>
>  		if (fatal_signal_pending(current))
> -			return 0;
> +			return ISOLATE_ABORT;
>  	}
>
>  	/* Time to isolate some pages for migration */
> @@ -358,7 +371,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
>
>  	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
>
> -	return cc->nr_migratepages;
> +	return ISOLATE_SUCCESS;
>  }
>
>  /*
> @@ -522,9 +535,15 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  		unsigned long nr_migrate, nr_remaining;
>  		int err;
>
> -		if (!isolate_migratepages(zone, cc))
> +		switch (isolate_migratepages(zone, cc)) {
> +		case ISOLATE_ABORT:
> +			goto out;
> +		case ISOLATE_NONE:
>  			continue;
> -
> +		case ISOLATE_SUCCESS:
> +			;
> +		}
> +
>  		nr_migrate = cc->nr_migratepages;
>  		err = migrate_pages(&cc->migratepages, compaction_alloc,
>  				(unsigned long)cc, false,
> @@ -547,6 +566,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>
>  	}
>
> +out:
>  	/* Release free pages and check accounting */
>  	cc->nr_freepages -= release_freepages(&cc->freepages);
>  	VM_BUG_ON(cc->nr_freepages != 0);
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-31 4:48 ` KAMEZAWA Hiroyuki @ 2011-05-31 5:38 ` Minchan Kim 0 siblings, 0 replies; 63+ messages in thread From: Minchan Kim @ 2011-05-31 5:38 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Mel Gorman, akpm, Ury Stankevich, KOSAKI Motohiro, Andrea Arcangeli, linux-kernel, linux-mm, stable

Hi Kame,

On Tue, May 31, 2011 at 01:48:35PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 30 May 2011 14:13:00 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > Asynchronous compaction is used when promoting to huge pages. This is
> > all very nice but if there are a number of processes compacting
> > memory, a large number of pages can be isolated. An "asynchronous"
> > process can stall for long periods of time as a result, with a user
> > reporting that firefox can stall for 10s of seconds. This patch aborts
> > asynchronous compaction if too many pages are isolated, as it's better
> > to fail a hugepage promotion than stall a process.
> >
> > If accepted, this should also be considered for 2.6.39-stable. It should
> > also be considered for 2.6.38-stable but ideally [11bc82d6: mm:
> > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no
> > writeback] would be applied to 2.6.38 before consideration.
> >
> > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> BTW, I'm surprised to see that both vmscan.c and compaction.c have
> too_many_isolated().. with different logic ;)
>
> BTW, compaction ignores the UNEVICTABLE LRU?

Good point. Yes, compaction currently doesn't work with the unevictable LRU, but I think we have no reason not to work well with unevictable pages. If we don't support the unevictable LRU, it would be a problem for workloads with lots of mlocked pages. It would be a good enhancement point for compaction.

>
> Thanks,
> -Kame

-- 
Kind regards
Minchan Kim
* Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous 2011-05-30 13:13 [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Mel Gorman ` (3 preceding siblings ...) 2011-05-31 4:48 ` KAMEZAWA Hiroyuki @ 2011-05-31 7:14 ` KOSAKI Motohiro 4 siblings, 0 replies; 63+ messages in thread From: KOSAKI Motohiro @ 2011-05-31 7:14 UTC (permalink / raw) To: mel; +Cc: akpm, urykhy, aarcange, linux-kernel, linux-mm, stable (2011/05/30 22:13), Mel Gorman wrote: > Asynchronous compaction is used when promoting to huge pages. This is > all very nice but if there are a number of processes in compacting > memory, a large number of pages can be isolated. An "asynchronous" > process can stall for long periods of time as a result with a user > reporting that firefox can stall for 10s of seconds. This patch aborts > asynchronous compaction if too many pages are isolated as it's better to > fail a hugepage promotion than stall a process. > > If accepted, this should also be considered for 2.6.39-stable. It should > also be considered for 2.6.38-stable but ideally [11bc82d6: mm: > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no > writeback] would be applied to 2.6.38 before consideration. > > Reported-and-Tested-by: Ury Stankevich <urykhy@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 63+ messages in thread