* [patch] vmscan: fix zone shrinking exit when scan work is done
From: Johannes Weiner @ 2011-02-09 15:46 UTC
To: Andrew Morton
Cc: Andrea Arcangeli, Mel Gorman, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

Hi,

I think this should fix the problem of processes getting stuck in
reclaim that has been reported several times.  Kent actually
single-stepped through this code and noted that it was never exiting
shrink_zone(), which really narrowed it down a lot, considering the
tons of nested loops from the allocator down to the list shrinking.

	Hannes

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: vmscan: fix zone shrinking exit when scan work is done

'3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
lumpy reclaim' introduced an indefinite loop in shrink_zone().

It meant to break out of this loop when no pages had been reclaimed
and not a single page was even scanned.  The way it would detect the
latter is by taking a snapshot of sc->nr_scanned at the beginning of
the function and comparing it against the new sc->nr_scanned after the
scan loop.  But it would re-iterate without updating that snapshot,
looping forever if sc->nr_scanned changed at least once since
shrink_zone() was invoked.

This is not the sole condition that would exit that loop, but it
requires other processes to change the zone state, as the reclaimer
that is stuck obviously can not anymore.

This is only happening for higher-order allocations, where reclaim is
run back to back with compaction.

Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..17497d0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1882,12 +1882,12 @@ static void shrink_zone(int priority, struct zone *zone,
 	unsigned long nr[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
 	enum lru_list l;
-	unsigned long nr_reclaimed;
+	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
-	unsigned long nr_scanned = sc->nr_scanned;
 
 restart:
 	nr_reclaimed = 0;
+	nr_scanned = sc->nr_scanned;
 	get_scan_count(zone, sc, nr, priority);
 
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
-- 
1.7.4
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
From: Kent Overstreet @ 2011-02-09 15:54 UTC
To: Johannes Weiner
Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, Rik van Riel,
    Michal Hocko, linux-mm, linux-kernel

On 02/09/2011 07:46 AM, Johannes Weiner wrote:
> Hi,
>
> I think this should fix the problem of processes getting stuck in
> reclaim that has been reported several times.  Kent actually
> single-stepped through this code and noted that it was never exiting
> shrink_zone(), which really narrowed it down a lot, considering the
> tons of nested loops from the allocator down to the list shrinking.
>
> 	Hannes

I was able to trigger this in just a few minutes stress testing bcache,
and now it's been going for half an hour working beautifully.  Thanks!

> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: vmscan: fix zone shrinking exit when scan work is done
>
> '3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
> lumpy reclaim' introduced an indefinite loop in shrink_zone().
>
> It meant to break out of this loop when no pages had been reclaimed
> and not a single page was even scanned.  The way it would detect the
> latter is by taking a snapshot of sc->nr_scanned at the beginning of
> the function and comparing it against the new sc->nr_scanned after the
> scan loop.  But it would re-iterate without updating that snapshot,
> looping forever if sc->nr_scanned changed at least once since
> shrink_zone() was invoked.
>
> This is not the sole condition that would exit that loop, but it
> requires other processes to change the zone state, as the reclaimer
> that is stuck obviously can not anymore.
>
> This is only happening for higher-order allocations, where reclaim is
> run back to back with compaction.
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Tested-by: Kent Overstreet <kent.overstreet@gmail.com>

> ---
> mm/vmscan.c |    4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..17497d0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1882,12 +1882,12 @@ static void shrink_zone(int priority, struct zone *zone,
>  	unsigned long nr[NR_LRU_LISTS];
>  	unsigned long nr_to_scan;
>  	enum lru_list l;
> -	unsigned long nr_reclaimed;
> +	unsigned long nr_reclaimed, nr_scanned;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> -	unsigned long nr_scanned = sc->nr_scanned;
>
>  restart:
>  	nr_reclaimed = 0;
> +	nr_scanned = sc->nr_scanned;
>  	get_scan_count(zone, sc, nr, priority);
>
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
From: Mel Gorman @ 2011-02-09 16:46 UTC
To: Johannes Weiner
Cc: Andrew Morton, Andrea Arcangeli, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> Hi,
>
> I think this should fix the problem of processes getting stuck in
> reclaim that has been reported several times.

I don't think it's the only source, but I'm basing this on seeing
constant looping in balance_pgdat() and calling congestion_wait() a few
weeks ago that I haven't rechecked since.  However, this looks like a
real fix for a real problem.

> Kent actually
> single-stepped through this code and noted that it was never exiting
> shrink_zone(), which really narrowed it down a lot, considering the
> tons of nested loops from the allocator down to the list shrinking.
>
> 	Hannes
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: vmscan: fix zone shrinking exit when scan work is done
>
> '3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
> lumpy reclaim' introduced an indefinite loop in shrink_zone().
>
> It meant to break out of this loop when no pages had been reclaimed
> and not a single page was even scanned.  The way it would detect the
> latter is by taking a snapshot of sc->nr_scanned at the beginning of
> the function and comparing it against the new sc->nr_scanned after the
> scan loop.  But it would re-iterate without updating that snapshot,
> looping forever if sc->nr_scanned changed at least once since
> shrink_zone() was invoked.
>
> This is not the sole condition that would exit that loop, but it
> requires other processes to change the zone state, as the reclaimer
> that is stuck obviously can not anymore.
>
> This is only happening for higher-order allocations, where reclaim is
> run back to back with compaction.
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Well spotted.

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
From: Andrea Arcangeli @ 2011-02-09 18:28 UTC
To: Mel Gorman
Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 09, 2011 at 04:46:56PM +0000, Mel Gorman wrote:
> On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> > Hi,
> >
> > I think this should fix the problem of processes getting stuck in
> > reclaim that has been reported several times.
>
> I don't think it's the only source, but I'm basing this on seeing
> constant looping in balance_pgdat() and calling congestion_wait() a few
> weeks ago that I haven't rechecked since.  However, this looks like a
> real fix for a real problem.

Agreed.  Just yesterday I spent some time on the lumpy compaction
changes after wondering about Michal's khugepaged 100% report, and I
expected some fix was needed in this area (as I couldn't find any bug
in khugepaged yet, so the lumpy compaction looked like the next
candidate for bugs).

I've also been wondering about the !nr_scanned check in
should_continue_reclaim() too, but I didn't look too much into the
caller (I was tempted to remove it altogether).  I don't see how
checking nr_scanned can be safe even after we fix the caller to avoid
passing non-zero values if "goto restart".

nr_scanned is incremented even for !page_evictable pages... so it's
not really useful to insist just because we scanned something, in my
view.  It looks bogus...  So my proposal would be below.

====
Subject: mm: stop checking nr_scanned in should_continue_reclaim

From: Andrea Arcangeli <aarcange@redhat.com>

nr_scanned is incremented even for !page_evictable pages... so it's
not really useful to insist just because we scanned something.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..9741884 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1831,7 +1831,6 @@ out:
  */
 static inline bool should_continue_reclaim(struct zone *zone,
 					unsigned long nr_reclaimed,
-					unsigned long nr_scanned,
 					struct scan_control *sc)
 {
 	unsigned long pages_for_compaction;
@@ -1841,15 +1840,8 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
 		return false;
 
-	/*
-	 * If we failed to reclaim and have scanned the full list, stop.
-	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
-	 *       faster but obviously would be less likely to succeed
-	 *       allocation. If this is desirable, use GFP_REPEAT to decide
-	 *       if both reclaimed and scanned should be checked or just
-	 *       reclaimed
-	 */
-	if (!nr_reclaimed && !nr_scanned)
+	/* If we failed to reclaim stop. */
+	if (!nr_reclaimed)
 		return false;
 
 	/*
@@ -1884,7 +1876,6 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
-	unsigned long nr_scanned = sc->nr_scanned;
 
 restart:
 	nr_reclaimed = 0;
@@ -1923,8 +1914,7 @@ restart:
 	shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
 	/* reclaim/compaction might need reclaim to continue */
-	if (should_continue_reclaim(zone, nr_reclaimed,
-				sc->nr_scanned - nr_scanned, sc))
+	if (should_continue_reclaim(zone, nr_reclaimed, sc))
 		goto restart;
 
 	throttle_vm_writeout(sc->gfp_mask);
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
From: Andrew Morton @ 2011-02-09 20:05 UTC
To: Andrea Arcangeli
Cc: Mel Gorman, Johannes Weiner, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, 9 Feb 2011 19:28:46 +0100 Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, Feb 09, 2011 at 04:46:56PM +0000, Mel Gorman wrote:
> > On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> > > Hi,
> > >
> > > I think this should fix the problem of processes getting stuck in
> > > reclaim that has been reported several times.
> >
> > I don't think it's the only source, but I'm basing this on seeing
> > constant looping in balance_pgdat() and calling congestion_wait() a few
> > weeks ago that I haven't rechecked since.  However, this looks like a
> > real fix for a real problem.
>
> Agreed.  Just yesterday I spent some time on the lumpy compaction
> changes after wondering about Michal's khugepaged 100% report, and I
> expected some fix was needed in this area (as I couldn't find any bug
> in khugepaged yet, so the lumpy compaction looked like the next
> candidate for bugs).
>
> I've also been wondering about the !nr_scanned check in
> should_continue_reclaim() too, but I didn't look too much into the
> caller (I was tempted to remove it altogether).  I don't see how
> checking nr_scanned can be safe even after we fix the caller to avoid
> passing non-zero values if "goto restart".
>
> nr_scanned is incremented even for !page_evictable pages... so it's
> not really useful to insist just because we scanned something, in my
> view.  It looks bogus...  So my proposal would be below.
>
> ====
> Subject: mm: stop checking nr_scanned in should_continue_reclaim
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> nr_scanned is incremented even for !page_evictable pages... so it's
> not really useful to insist just because we scanned something.

So if reclaim has scanned 100% !page_evictable pages,
should_continue_reclaim() can return true and we keep on scanning?
That sounds like it's both good and bad :(

Is this actually a problem?  What sort of behaviour could it cause,
and under what circumstances?

Johannes's patch is an obvious bugfix and I'll run with it for now,
but please let's have a further think about the impact of the
!page_evictable pages.
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
From: Mel Gorman @ 2011-02-10 10:21 UTC
To: Andrea Arcangeli
Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 09, 2011 at 07:28:46PM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 09, 2011 at 04:46:56PM +0000, Mel Gorman wrote:
> > On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> > > Hi,
> > >
> > > I think this should fix the problem of processes getting stuck in
> > > reclaim that has been reported several times.
> >
> > I don't think it's the only source, but I'm basing this on seeing
> > constant looping in balance_pgdat() and calling congestion_wait() a few
> > weeks ago that I haven't rechecked since.  However, this looks like a
> > real fix for a real problem.
>
> Agreed.  Just yesterday I spent some time on the lumpy compaction
> changes after wondering about Michal's khugepaged 100% report, and I
> expected some fix was needed in this area (as I couldn't find any bug
> in khugepaged yet, so the lumpy compaction looked like the next
> candidate for bugs).

Michal did report that disabling defrag did not help, but the stack
trace also showed that it was stuck in shrink_zone(), which is what
Johannes' patch targets.  It's not unreasonable to test whether
Johannes' patch solves Michal's problem.

Michal, I know that your workload is a bit random and may not be
reproducible, but do you think it'd be possible to determine whether
Johannes' patch helps?

> I've also been wondering about the !nr_scanned check in
> should_continue_reclaim() too, but I didn't look too much into the
> caller (I was tempted to remove it altogether).  I don't see how
> checking nr_scanned can be safe even after we fix the caller to avoid
> passing non-zero values if "goto restart".
>
> nr_scanned is incremented even for !page_evictable pages... so it's
> not really useful to insist just because we scanned something, in my
> view.  It looks bogus...  So my proposal would be below.

We should not end up in a situation where the LRU lists contain only
unevictable pages and that situation persists, causing excessive (or
infinite) looping.  As unevictable pages are encountered on the LRU
list, they should be moved to the unevictable lists by
putback_lru_page().  Are you aware of a situation where this becomes
broken?

I recognise that SWAP_CLUSTER_MAX pages could all be unevictable and
that they all get moved.  In this case, nr_scanned is positive and we
continue to scan, but this is expected and desirable:
reclaim/compaction needs more pages to be freed before it starts
compaction.  If it stops scanning early, it would just fail the
allocation later.  This is what the "NOTE" is about.

I prefer Johannes' fix for the observed problem.

> ====
> Subject: mm: stop checking nr_scanned in should_continue_reclaim
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> nr_scanned is incremented even for !page_evictable pages... so it's
> not really useful to insist just because we scanned something.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..9741884 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1831,7 +1831,6 @@ out:
>   */
>  static inline bool should_continue_reclaim(struct zone *zone,
>  					unsigned long nr_reclaimed,
> -					unsigned long nr_scanned,
>  					struct scan_control *sc)
>  {
>  	unsigned long pages_for_compaction;
> @@ -1841,15 +1840,8 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
>  		return false;
>
> -	/*
> -	 * If we failed to reclaim and have scanned the full list, stop.
> -	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -	 *       faster but obviously would be less likely to succeed
> -	 *       allocation. If this is desirable, use GFP_REPEAT to decide
> -	 *       if both reclaimed and scanned should be checked or just
> -	 *       reclaimed
> -	 */
> -	if (!nr_reclaimed && !nr_scanned)
> +	/* If we failed to reclaim stop. */
> +	if (!nr_reclaimed)
>  		return false;
>
>  	/*
> @@ -1884,7 +1876,6 @@ static void shrink_zone(int priority, struct zone *zone,
>  	enum lru_list l;
>  	unsigned long nr_reclaimed;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> -	unsigned long nr_scanned = sc->nr_scanned;
>
>  restart:
>  	nr_reclaimed = 0;
> @@ -1923,8 +1914,7 @@ restart:
>  	shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
>  	/* reclaim/compaction might need reclaim to continue */
> -	if (should_continue_reclaim(zone, nr_reclaimed,
> -				sc->nr_scanned - nr_scanned, sc))
> +	if (should_continue_reclaim(zone, nr_reclaimed, sc))
>  		goto restart;
>
>  	throttle_vm_writeout(sc->gfp_mask);

-- 
Mel Gorman
SUSE Labs
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done 2011-02-10 10:21 ` Mel Gorman @ 2011-02-10 10:41 ` Michal Hocko -1 siblings, 0 replies; 44+ messages in thread From: Michal Hocko @ 2011-02-10 10:41 UTC (permalink / raw) To: Mel Gorman Cc: Andrea Arcangeli, Johannes Weiner, Andrew Morton, Rik van Riel, Kent Overstreet, linux-mm, linux-kernel On Thu 10-02-11 10:21:10, Mel Gorman wrote: > On Wed, Feb 09, 2011 at 07:28:46PM +0100, Andrea Arcangeli wrote: > > On Wed, Feb 09, 2011 at 04:46:56PM +0000, Mel Gorman wrote: > > > On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote: > > > > Hi, > > > > > > > > I think this should fix the problem of processes getting stuck in > > > > reclaim that has been reported several times. > > > > > > I don't think it's the only source but I'm basing this on seeing > > > constant looping in balance_pgdat() and calling congestion_wait() a few > > > weeks ago that I haven't rechecked since. However, this looks like a > > > real fix for a real problem. > > > > Agreed. Just yesterday I spent some time on the lumpy compaction > > changes after wondering about Michal's khugepaged 100% report, and I > > expected some fix was needed in this area (as I couldn't find any bug > > in khugepaged yet, so the lumpy compaction looked the next candidate > > for bugs). > > > > Michal did report that disabling defrag did not help but the stack trace > also showed that it was stuck in shrink_zone() which is what Johannes' > patch targets. It's not unreasonable to test if Johannes' patch solves > Michal's problem. Michal, I know that your workload is a bit random and > may not be reproducible but do you think it'd be possible to determine > if Johannes' patch helps? Sure, I can test it. Nevertheless, I haven't seen the problem again. I have tried to make some memory pressure on the machine but no "luck". -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done 2011-02-10 10:21 ` Mel Gorman @ 2011-02-10 12:48 ` Andrea Arcangeli -1 siblings, 0 replies; 44+ messages in thread From: Andrea Arcangeli @ 2011-02-10 12:48 UTC (permalink / raw) To: Mel Gorman Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko, Kent Overstreet, linux-mm, linux-kernel On Thu, Feb 10, 2011 at 10:21:10AM +0000, Mel Gorman wrote: > We should not be ending up in a situation with the LRU list of only > page_evictable pages and that situation persisting causing excessive (or > infinite) looping. As unevictable pages are encountered on the LRU list, > they should be moved to the unevictable lists by putback_lru_page(). Are you > aware of a situation where this becomes broken? > > I recognise that SWAP_CLUSTER_MAX pages could all be unevictable and they > are all get moved. In this case, nr_scanned is positive and we continue > to scan but this is expected and desirable: Reclaim/compaction needs more > pages to be freed before it starts compaction. If it stops scanning early, > then it would just fail the allocation later. This is what the "NOTE" is about. > > I prefer Johannes' fix for the observed problem. should_continue_reclaim is only needed for compaction. It tries to free enough pages so that compaction can succeed in its defrag attempt. So breaking the loop faster isn't going to cause failures for order-0 pages. My worry is that we loop too much in shrink_zone just for compaction even when we don't make any progress. shrink_zone would never scan more than SWAP_CLUSTER_MAX pages, before this change. Now it can loop over the whole lru as long as we're scanning stuff. Ok to overboost shrink_zone if we're making progress to allow compaction at the next round, but if we don't visibly make progress, I'm concerned that it may be too aggressive to scan the whole list.
The performance benefit of having a hugepage isn't as huge as scanning all pages in the lru when before we would have broken the loop and declared failure after only SWAP_CLUSTER_MAX pages, and then we would have fallen back to an order-0 allocation. The fix may help of course, maybe it's enough for his case I don't know, but I don't see it making a whole lot of difference, except now it will stop when the lru is practically empty which clearly is an improvement. I think we shouldn't be so worried about succeeding compaction, the important thing is we don't waste time in compaction if there's not enough free memory but compaction_suitable used by both logics should be enough for that. I'd rather prefer that if hugetlbfs has special needs it uses a __GFP_ flag or similar that increases how strict compaction is about succeeding, up to scanning the whole lru in one go in order to make some free memory for compaction to succeed. Going ahead with the scan until compaction_suitable is true instead makes sense when there's absence of memory pressure and nr_reclaimed is never zero. Maybe we should try a few more times than just nr_reclaim, but going over the whole lru sounds a bit extreme. The issue isn't just for unevictable pages, that will be refiled during the scan but it will also happen in the presence of lots of referenced pages. For example if we don't apply my fix, the current code can take down all young bits in all ptes in one go in the whole system before returning from shrink_zone, that is too much in my view, and losing all that information in one go (not even to tell the cost associated with losing it) can hardly be offset by the improvement given by 1 more hugepage. But please let me know if I've misread something... Thanks, Andrea
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done 2011-02-10 12:48 ` Andrea Arcangeli @ 2011-02-10 13:33 ` Mel Gorman -1 siblings, 0 replies; 44+ messages in thread From: Mel Gorman @ 2011-02-10 13:33 UTC (permalink / raw) To: Andrea Arcangeli Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko, Kent Overstreet, linux-mm, linux-kernel On Thu, Feb 10, 2011 at 01:48:38PM +0100, Andrea Arcangeli wrote: > On Thu, Feb 10, 2011 at 10:21:10AM +0000, Mel Gorman wrote: > > We should not be ending up in a situation with the LRU list of only > > page_evictable pages and that situation persisting causing excessive (or > > infinite) looping. As unevictable pages are encountered on the LRU list, > > they should be moved to the unevictable lists by putback_lru_page(). Are you > > aware of a situation where this becomes broken? > > > > I recognise that SWAP_CLUSTER_MAX pages could all be unevictable and they > > are all get moved. In this case, nr_scanned is positive and we continue > > to scan but this is expected and desirable: Reclaim/compaction needs more > > pages to be freed before it starts compaction. If it stops scanning early, > > then it would just fail the allocation later. This is what the "NOTE" is about. > > > > I prefer Johannes' fix for the observed problem. > > should_continue_reclaim is only needed for compaction. It tries to > free enough pages so that compaction can succeed in its defrag > attempt. Correct. > So breaking the loop faster isn't going to cause failures for > 0 order pages. Also true, I commented on this in the "Note" your patch deletes and a suggestion on how an alternative would be to break early unless GFP_REPEAT. > My worry is that we loop too much in shrink_zone just > for compaction even when we don't do any progress. shrink_zone would > never scan more than SWAP_CLUSTER_MAX pages, before this change. Sortof. Lumpy reclaim would have scanned more than SWAP_CLUSTER_MAX so scanning was still pretty high. 
The other costs of lumpy reclaim would hide it of course. > Now > it can loop over the whole lru as long as we're scanning stuff. True, the alternative being failing the allocation. Returning sooner is of course an option, but it would be preferable to see a case where the logic after Johannes' patch is failing. > Ok to > overboost shrink_zone if we're making progress to allow compaction at > the next round, but if we don't visibly make progress, I'm concerned > that it may be too aggressive to scan the whole list. The performance > benefit of having an hugepage isn't as huge as scanning all pages in > the lru when before we would have broken the loop and declared failure > after only SWAP_CLUSTER_MAX pages, and then we would have fallen back > in a order 0 allocation. What about other cases such as order-1 allocations for stack or order-3 allocations for those network cards using jumbo frames without scatter/gather? Don't get me wrong, I see your point but I'm wondering if there really are cases where we routinely scan an entire LRU list of unevictable pages that are somehow not being migrated properly to the unevictable lists. If this is happening, we are also in trouble for reclaiming for order-0 pages, right? > The fix may help of course, maybe it's enough > for his case I don't know, but I don't see it making a whole lot of > difference, except now it will stop when the lru is practically empty > which clearly is an improvement. I think we shouldn't be so worried > about succeeding compaction, the important thing is we don't waste > time in compaction if there's not enough free memory but > compaction_suitable used by both logics should be enough for that. > > I'd rather prefer that if hugetlbfs has special needs it uses a __GFP_ It uses GFP_REPEAT. That is why I specifically mentioned it in the "NOTE" as an alternative to how we could break early while still being aggressive when required.
The only reason it's not that way now is because a) I didn't consider an LRU mostly full of unevictable pages to be the normal case and b) for allocations such as order-3 that are preferable not to fail. > flag or similar that increases how compaction is strict in succeeding, > up to scanning the whole lru in one go in order to make some free > memory for compaction to succeed. > > Going ahead with the scan until compaction_suitable is true instead > makes sense when there's absence of memory pressure and nr_reclaimed > is never zero. > > Maybe we should try a bit more times than just nr_reclaim but going > over the whole lru, sounds a bit extreme. > Where should we draw the line? We could come up with a ratio of the lists depending on priority but it'd be hard to measure the gain or loss without having a profile of a problem case to look at. > The issue isn't just for unevictable pages, that will be refiled > during the scan but it will also happen in presence of lots of > referenced pages. For example if we don't apply my fix, the current > code can take down all young bits in all ptes in one go in the whole > system before returning from shrink_zone, that is too much in my view, > and losing all that information in one go (not even to tell the cost > associated with losing it) can hardly be offseted by the improvement > given by 1 more hugepage. > > But please let me know if I've misread something... > I don't think you have misread anything but if we're going to weaken this logic, I'd at least like to see the GFP_REPEAT option tried - i.e. preserve being aggressive if set. I'm also not convinced we routinely get into a situation where the LRU consists of almost all unevictable pages and if we are in this situation, that is a serious problem on its own. It would also be preferable if we could get latency figures on alloc_pages for hugepage-sized allocations and a count of how many are succeeding or failing to measure the impact (if any).
-- Mel Gorman SUSE Labs
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done 2011-02-10 13:33 ` Mel Gorman @ 2011-02-10 14:14 ` Andrea Arcangeli -1 siblings, 0 replies; 44+ messages in thread From: Andrea Arcangeli @ 2011-02-10 14:14 UTC (permalink / raw) To: Mel Gorman Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko, Kent Overstreet, linux-mm, linux-kernel On Thu, Feb 10, 2011 at 01:33:24PM +0000, Mel Gorman wrote: > Also true, I commented on this in the "Note" your patch deletes and a > suggestion on how an alternative would be to break early unless GFP_REPEAT. Yep noticed that ;), doing that with __GFP_REPEAT sounds just fine to me. > Sortof. Lumpy reclaim would have scanned more than SWAP_CLUSTER_MAX so > scanning was still pretty high. The other costs of lumpy reclaim would hide > it of course. Ok but we know lumpy reclaim was not ok to start with. > What about other cases such as order-1 allocations for stack or order-3 > allocations for those network cards using jumbo frames without > scatter/gather? stack order 1 is one of the few cases that come to mind where failing an allocation becomes fatal. Maybe we should use __GFP_REPEAT there too. But we probably need a way to discriminate callers that can fall back gracefully. I'd be extremely surprised if the cost of looping all over the lru taking down all young bits could ever be offset by the jumbo frame. In fact the jumbo frame I'm afraid might be better off without using compaction at all because it's probably very latency sensitive. We need a 'lowlatency' version of compaction for these users where the improvement of having a compound page instead of a regular page isn't very significant. On a separate topic, I'm currently trying to use the new async compaction code upstream with jumbo frames. I'm also wondering if I'll have to set sync=0 by default unless __GFP_REPEAT is set. It seems adding compaction to jumbo frames is increasing latency for certain workloads in a measurable way.
Things were fine when compaction was only used by THP and not for all
order allocations (but I didn't test the async mode yet, plus the other
optimizations for compaction you did recently; I hope they're enough for
jumbo frames).

> Don't get me wrong, I see your point but I'm wondering if there really are
> cases where we routinely scan an entire LRU list of unevictable pages that
> are somehow not being migrated properly to the unevictable lists. If
> this is happening, we are also in trouble for reclaiming for order-0
> pages, right?

Well, unevictable pages are just an example, and like you said they last
only one round of the loop at most. But other caching bits like the
referenced bits and all young bits will get taken down during all later
loops too. We definitely don't want to swap just to allow compaction to
succeed! I think this argument explains it pretty well: if you take down
all young bits in a constant loop, the system may end up swapping.
That's definitely something we don't want. Things may be different if
this is a stack allocation without fallback, or if it's hugetlbfs, again
without kernel fallback (only userland fallback).

> It uses GFP_REPEAT. That is why I specifically mentioned it in the "NOTE"
> as an alternative to how we could break early while still being aggressive
> when required. The only reason it's not that way now is because a) I didn't
> consider an LRU mostly full of unevictable pages to be the normal case and b)
> for allocations such as order-3 that are preferable not to fail.

Ok.

> Where should we draw the line? We could come up with a ratio of the lists
> depending on priority but it'd be hard to measure the gain or loss
> without having a profile of a problem case to look at.

I would just stick to !nr_reclaimed to break the loop, and ignore
nr_scanned unless __GFP_REPEAT is set, in which case you're welcome to
scan everything. Then we have to decide whether to add __GFP_REPEAT to
the stack allocation...
> I don't think you have misread anything but if we're going to weaken
> this logic, I'd at least like to see the GFP_REPEAT option tried - i.e.

I see the point of __GFP_REPEAT; that sounds best. I should have just
followed your comment, but I felt scanning everything was too
heavyweight regardless. Still, I see you want as much accuracy as
possible in that case, even if that ends up in a swap storm.

> preserve being aggressive if set. I'm also not convinced we routinely get
> into a situation where the LRU consists of almost all unevictable pages
> and if we are in this situation, that is a serious problem on its own. It
> would also be preferable if we could get latency figures on alloc_pages for
> hugepage-sized allocations and a count of how many are succeeding or failing
> to measure the impact (if any).

I think unevictable pages were not the best example. I mentioned them
because the code is like below, but to me all the "goto something"
branches following the !page_evictable check were also relevant for this
shrink_zone loop. The real-life issue is avoiding a swap storm (or an
expensive loop flooding the whole system with IPIs to take down all
young bits in all ptes) to allocate a hugepage or jumbo frame that has a
graceful fallback that performs not hugely slower than the
hugepage/jumbo frame.

		sc->nr_scanned++;

		if (unlikely(!page_evictable(page, NULL)))
			goto cull_mlocked;

I think making the !nr_scanned check conditional on __GFP_REPEAT, as the
comment suggested, is for now the best way to go.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done
  2011-02-10 14:14 ` Andrea Arcangeli
@ 2011-02-10 14:58   ` Mel Gorman
  -1 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2011-02-10 14:58 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Thu, Feb 10, 2011 at 03:14:47PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 10, 2011 at 01:33:24PM +0000, Mel Gorman wrote:
> > Also true, I commented on this in the "Note" your patch deletes and a
> > suggestion on how an alternative would be to break early unless GFP_REPEAT.
>
> Yep, noticed that ;), doing that with __GFP_REPEAT sounds just fine to me.
>

Great.

> > Sort of. Lumpy reclaim would have scanned more than SWAP_CLUSTER_MAX so
> > scanning was still pretty high. The other costs of lumpy reclaim would hide
> > it of course.
>
> Ok, but we know lumpy reclaim was not ok to start with.
>

Sure.

> > What about other cases such as order-1 allocations for stack or order-3
> > allocations for those network cards using jumbo frames without
> > scatter/gather?
>
> Stack order 1 is one of the few cases that come to mind where failing
> an allocation becomes fatal. Maybe we should use __GFP_REPEAT there
> too.
>

Actually, there shouldn't be a need. Small allocations such as order-1
effectively loop indefinitely due to the check in should_alloc_retry().
This means that even if reclaim/compaction breaks earlier than it
should, it'll get tried again.

> But we probably need a way to discriminate callers that can gracefully
> fall back. I'd be extremely surprised if the cost of looping all over
> the LRU taking down all young bits could ever be offset by the jumbo
> frame. In fact, I'm afraid the jumbo frame might be better off without
> using compaction at all because it's probably very latency
> sensitive.

It depends entirely on whether the jumbo frame can be received with
order-0 pages.
If not, it means it's dropping packets, which has worse latency.

> We need a 'lowlatency' version of compaction for these
> users, where the improvement of having a compound page instead of a
> regular page isn't very significant.
>

It's not impossible to pass this information in once the cases where it
is required are identified.

> On a separate topic, I'm currently trying to use the new async
> compaction code upstream with jumbo frames. I'm also wondering if I'll
> have to set sync=0 by default unless __GFP_REPEAT is set. It seems
> adding compaction to jumbo frames is increasing latency to certain
> workloads in a measurable way.

This is interesting. Any profiles showing where the time is being spent?
In the event an order-3 allocation fails with the particular network
card, is it able to fall back to order-0 pages?

> Things were fine when compaction was
> only used by THP and not for all order allocations (but I didn't test
> the async mode yet, plus the other optimizations for compaction you did
> recently; I hope they're enough for jumbo frames).
>

Wish I had your test rig :/

> > Don't get me wrong, I see your point but I'm wondering if there really are
> > cases where we routinely scan an entire LRU list of unevictable pages that
> > are somehow not being migrated properly to the unevictable lists. If
> > this is happening, we are also in trouble for reclaiming for order-0
> > pages, right?
>
> Well, unevictable pages are just an example, and like you said they last
> only one round of the loop at most. But other caching bits like the
> referenced bits and all young bits will get taken down during all later
> loops too. We definitely don't want to swap just to allow compaction to
> succeed! I think this argument explains it pretty well: if you take down
> all young bits in a constant loop, the system may end up swapping.
> That's definitely something we don't want.
>

Avoiding the clearing of young bits is a much stronger argument.
> Things may be different if this is a stack allocation without
> fallback, or if it's hugetlbfs, again without kernel fallback (only
> userland fallback).
>
> > It uses GFP_REPEAT. That is why I specifically mentioned it in the "NOTE"
> > as an alternative to how we could break early while still being aggressive
> > when required. The only reason it's not that way now is because a) I didn't
> > consider an LRU mostly full of unevictable pages to be the normal case and b)
> > for allocations such as order-3 that are preferable not to fail.
>
> Ok.
>
> > Where should we draw the line? We could come up with a ratio of the lists
> > depending on priority but it'd be hard to measure the gain or loss
> > without having a profile of a problem case to look at.
>
> I would just stick to !nr_reclaimed to break the loop, and ignore
> nr_scanned unless __GFP_REPEAT is set, in which case you're welcome to
> scan everything.
>

A patch that should do this is below.

> Then we have to decide whether to add __GFP_REPEAT to the stack allocation...
>

It shouldn't be necessary as the allocator will continue looping.

> > I don't think you have misread anything but if we're going to weaken
> > this logic, I'd at least like to see the GFP_REPEAT option tried - i.e.
>
> I see the point of __GFP_REPEAT; that sounds best. I should have just
> followed your comment, but I felt scanning everything was too
> heavyweight regardless. Still, I see you want as much accuracy as
> possible in that case, even if that ends up in a swap storm.
>
> > preserve being aggressive if set. I'm also not convinced we routinely get
> > into a situation where the LRU consists of almost all unevictable pages
> > and if we are in this situation, that is a serious problem on its own. It
> > would also be preferable if we could get latency figures on alloc_pages for
> > hugepage-sized allocations and a count of how many are succeeding or failing
> > to measure the impact (if any).
>
> I think unevictable pages were not the best example. I mentioned them
> because the code is like below, but to me all the "goto something"
> branches following the !page_evictable check were also relevant for
> this shrink_zone loop. The real-life issue is avoiding a swap storm (or
> an expensive loop flooding the whole system with IPIs to take down all
> young bits in all ptes) to allocate a hugepage or jumbo frame that has
> a graceful fallback that performs not hugely slower than the
> hugepage/jumbo frame.
>
> 		sc->nr_scanned++;
>
> 		if (unlikely(!page_evictable(page, NULL)))
> 			goto cull_mlocked;
>
> I think making the !nr_scanned check conditional on __GFP_REPEAT, as
> the comment suggested, is for now the best way to go.
>

Ok, here is a patch that should do that. This does *not* replace
Johannes' patch, which I think should still be merged. However, I am
unable to test this at the moment. My laptop and test machines are 200km
away and inaccessible until next Tuesday at the earliest. The machine
I'm typing this mail from is unsuitable for testing with. Are you in a
position to test THP with it applied for me please?

==== CUT HERE ====
mm: vmscan: Stop reclaim/compaction when reclaim is failing for !__GFP_REPEAT allocations

should_continue_reclaim() for reclaim/compaction allows scanning to
continue even if pages are not being reclaimed until the full list is
scanned. In terms of allocation success, this makes sense, but
potentially it introduces unwanted latency for transparent hugepages and
network jumbo frames that would prefer to fail the allocation attempt
and fall back to order-0 pages. Worse, there is a potential that the
full LRU scan will clear all the young bits, distort page aging
information and potentially push pages into swap that would have
otherwise remained resident.

This patch will stop reclaim/compaction if no pages were reclaimed in
the last SWAP_CLUSTER_MAX pages that were considered.
For allocations such as hugetlbfs that use __GFP_REPEAT and have fewer
fallback options, the full LRU list may still be scanned.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/vmscan.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..591b907 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1841,16 +1841,28 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
 		return false;
 
-	/*
-	 * If we failed to reclaim and have scanned the full list, stop.
-	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
-	 *       faster but obviously would be less likely to succeed
-	 *       allocation. If this is desirable, use GFP_REPEAT to decide
-	 *       if both reclaimed and scanned should be checked or just
-	 *       reclaimed
-	 */
-	if (!nr_reclaimed && !nr_scanned)
-		return false;
+	/* Consider stopping depending on scan and reclaim activity */
+	if (sc->gfp_mask & __GFP_REPEAT) {
+		/*
+		 * For __GFP_REPEAT allocations, stop reclaiming if the
+		 * full LRU list has been scanned and we are still failing
+		 * to reclaim pages. This full LRU scan is potentially
+		 * expensive but a __GFP_REPEAT caller really wants to succeed
+		 */
+		if (!nr_reclaimed && !nr_scanned)
+			return false;
+	} else {
+		/*
+		 * For non-__GFP_REPEAT allocations which can presumably
+		 * fail without consequence, stop if we failed to reclaim
+		 * any pages from the last SWAP_CLUSTER_MAX number of
+		 * pages that were scanned. This will return to the
+		 * caller faster at the risk that reclaim/compaction
+		 * ends early and the resulting allocation attempt fails
+		 */
+		if (!nr_reclaimed)
+			return false;
+	}
 
 	/*
 	 * If we have not reclaimed enough pages for compaction and the

^ permalink raw reply related	[flat|nested] 44+ messages in thread
* [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-10 14:58 ` Mel Gorman
@ 2011-02-16  9:50   ` Mel Gorman
  -1 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2011-02-16 9:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Andrea Arcangeli, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

should_continue_reclaim() for reclaim/compaction allows scanning to
continue even if pages are not being reclaimed until the full list is
scanned. In terms of allocation success, this makes sense, but
potentially it introduces unwanted latency for high-order allocations
such as transparent hugepages and network jumbo frames that would prefer
to fail the allocation attempt and fall back to order-0 pages. Worse,
there is a potential that the full LRU scan will clear all the young
bits, distort page aging information and potentially push pages into
swap that would have otherwise remained resident.

This patch will stop reclaim/compaction if no pages were reclaimed in
the last SWAP_CLUSTER_MAX pages that were considered. For allocations
such as hugetlbfs that use __GFP_REPEAT and have fewer fallback options,
the full LRU list may still be scanned.

To test this, a tool was developed based on ftrace that tracked the
latency of high-order allocations while transparent hugepage support was
enabled and three benchmarks were run. The "fix-infinite" figures are
2.6.38-rc4 with Johannes's patch "vmscan: fix zone shrinking exit when
scan work is done" applied.
STREAM Highorder Allocation Latency Statistics
                 fix-infinite    break-early
1 :: Count             10298          10229
1 :: Min              0.4560         0.4640
1 :: Mean             1.0589         1.0183
1 :: Max             14.5990        11.7510
1 :: Stddev           0.5208         0.4719
2 :: Count                 2              1
2 :: Min              1.8610         3.7240
2 :: Mean             3.4325         3.7240
2 :: Max              5.0040         3.7240
2 :: Stddev           1.5715         0.0000
9 :: Count            111696         111694
9 :: Min              0.5230         0.4110
9 :: Mean            10.5831        10.5718
9 :: Max             38.4480        43.2900
9 :: Stddev           1.1147         1.1325

Mean time for order-1 allocations is reduced. order-2 looks increased,
but with so few allocations it's not particularly significant. THP mean
allocation latency is also reduced. That said, allocation time varies so
significantly that the reductions are within noise.

Max allocation time is reduced by a significant amount for low-order
allocations but increased for THP allocations, which presumably are now
breaking before reclaim has done enough work.

SysBench Highorder Allocation Latency Statistics
                 fix-infinite    break-early
1 :: Count             15745          15677
1 :: Min              0.4250         0.4550
1 :: Mean             1.1023         1.0810
1 :: Max             14.4590        10.8220
1 :: Stddev           0.5117         0.5100
2 :: Count                 1              1
2 :: Min              3.0040         2.1530
2 :: Mean             3.0040         2.1530
2 :: Max              3.0040         2.1530
2 :: Stddev           0.0000         0.0000
9 :: Count              2017           1931
9 :: Min              0.4980         0.7480
9 :: Mean            10.4717        10.3840
9 :: Max             24.9460        26.2500
9 :: Stddev           1.1726         1.1966

Again, mean time for order-1 allocations is reduced while order-2
allocations are too few to draw conclusions from. The mean time for THP
allocations is also slightly reduced, albeit the reductions are within
variances. Once again, our maximum allocation time is significantly
reduced for low-order allocations and slightly increased for THP
allocations.
Anon stream mmap reference Highorder Allocation Latency Statistics
               fix-infinite    break-early
1 :: Count           1376           1790
1 :: Min           0.4940         0.5010
1 :: Mean          1.0289         0.9732
1 :: Max           6.2670         4.2540
1 :: Stddev        0.4142         0.2785
2 :: Count              1              -
2 :: Min           1.9060              -
2 :: Mean          1.9060              -
2 :: Max           1.9060              -
2 :: Stddev        0.0000              -
9 :: Count          11266          11257
9 :: Min           0.4990         0.4940
9 :: Mean      27250.4669     24256.1919
9 :: Max    11439211.0000   6008885.0000
9 :: Stddev   226427.4624    186298.1430

This benchmark creates one thread per CPU which references an amount of
anonymous memory 1.5 times the size of physical RAM. This pounds swap
quite heavily and is intended to exercise THP a bit.

Mean allocation time for order-1 is reduced as before. It's also reduced
for THP allocations but the variations here are pretty massive due to
swap. As before, maximum allocation times are significantly reduced.

Overall, the patch reduces the mean and maximum allocation latencies for
the smaller high-order allocations. This was with Slab configured so it
would be expected to be more significant with Slub which uses these size
allocations more aggressively.

The mean allocation times for THP allocations are also slightly reduced.
The maximum latency was slightly increased as predicted by the comments
due to reclaim/compaction breaking early. However, workloads care more
about the latency of lower-order allocations than THP so it's an
acceptable trade-off. Please consider merging for 2.6.38.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/vmscan.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..591b907 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1841,16 +1841,28 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
 		return false;

-	/*
-	 * If we failed to reclaim and have scanned the full list, stop.
-	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
-	 *       faster but obviously would be less likely to succeed
-	 *       allocation. If this is desirable, use GFP_REPEAT to decide
-	 *       if both reclaimed and scanned should be checked or just
-	 *       reclaimed
-	 */
-	if (!nr_reclaimed && !nr_scanned)
-		return false;
+	/* Consider stopping depending on scan and reclaim activity */
+	if (sc->gfp_mask & __GFP_REPEAT) {
+		/*
+		 * For GFP_REPEAT allocations, stop reclaiming if the
+		 * full LRU list has been scanned and we are still failing
+		 * to reclaim pages. This full LRU scan is potentially
+		 * expensive but a GFP_REPEAT caller really wants to succeed
+		 */
+		if (!nr_reclaimed && !nr_scanned)
+			return false;
+	} else {
+		/*
+		 * For non-GFP_REPEAT allocations which can presumably
+		 * fail without consequence, stop if we failed to reclaim
+		 * any pages from the last SWAP_CLUSTER_MAX number of
+		 * pages that were scanned. This will return to the
+		 * caller faster at the risk reclaim/compaction and
+		 * the resulting allocation attempt fails
+		 */
+		if (!nr_reclaimed)
+			return false;
+	}

 	/*
	 * If we have not reclaimed enough pages for compaction and the

^ permalink raw reply related	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-16 10:13   ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2011-02-16 10:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Michal Hocko,
      Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 09:50:49AM +0000, Mel Gorman wrote:
> The mean allocation times for THP allocations are also slightly reduced.
> The maximum latency was slightly increased as predicted by the comments due
> to reclaim/compaction breaking early. However, workloads care more about the
> latency of lower-order allocations than THP so it's an acceptable trade-off.
> Please consider merging for 2.6.38.

Full agreement.

I'm currently dealing with latency issues (nothing major! but still not
nice to see a reproducible regression, even if a small one only visible
in the benchmark) with compaction and jumbo frames. This won't be enough
to close them completely though because I didn't backport the change to
vmscan.c and should_continue_reclaim (I backported all the other
compaction improvements though, so this practically is the only missing
bit). I also suspected the e1000 driver, which sets the NAPI latency to
bulk_latency when it uses jumbo frames, so I wonder if it could be that
with compaction we get more jumbo frames and the latency then gets
reduced by the driver as a side effect. Not sure yet.

I like the above because it's less likely to give us compaction issues
with jumbo frames when I add should_continue_reclaim on top. It seems
anonymous memory allocations are orders of magnitude longer-lived than
jumbo frames could ever be, so at this point I'm not even entirely
certain it's ok to enable compaction at all for jumbo frames. But I still
like the above regardless of my current issue (just because of the young
bits going nuked in one go the lumpy hammer way, even if it actually
increases latency a bit for THP allocations).

One issue with compaction for jumbo frames is the potentially very long
loop for the scan in isolate_migratepages. I added a counter to break the
loop after 1024 pages scanned. This is extreme but this is a debug patch
for now; I also did if (retval == bulk_latency) retval = low_latency in
the e1000* drivers to see if it makes a difference. If either of the two
helps, I will track down how much each change contributes to lowering the
network latency to pre-compaction levels. It may very well be only a
compaction issue, or only a driver issue, I don't know yet (the latter is
less likely because this very compaction loop spikes at the top of
oprofile output, but maybe that only affects throughput and the driver is
to blame for the latency reduction... this is what I'm going to find
pretty soon). Also this isolate_migratepages loop I think needs a
cond_resched() (I didn't add that yet ;). 1024 pages scanned is too few,
I just want to see how it behaves with an extremely permissive setting.
I'll let you know when I come to some more reliable conclusion.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16 10:13 ` Andrea Arcangeli
@ 2011-02-16 11:22   ` Mel Gorman
  -1 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2011-02-16 11:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Michal Hocko,
      Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 11:13:01AM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 16, 2011 at 09:50:49AM +0000, Mel Gorman wrote:
> > The mean allocation times for THP allocations are also slightly reduced.
> > The maximum latency was slightly increased as predicted by the comments due
> > to reclaim/compaction breaking early. However, workloads care more about the
> > latency of lower-order allocations than THP so it's an acceptable trade-off.
> > Please consider merging for 2.6.38.
>
> Full agreement. I'm currently dealing with latency issues (nothing
> major! but still not nice to see a reproducible regression, even if a
> small one only visible in the benchmark) with compaction and jumbo
> frames.

Out of curiosity, what are you measuring the latency of and how? I used
a combination of the function_graph ftrace analyser and the mm_page_alloc
tracepoint myself to avoid any additional patching and it was easier than
cobbling together something with kprobes. A perl script configures ftrace
and then parses the contents of trace_pipe - crude but does the job
without patching the kernel.

> This won't be enough to close them completely though because I
> didn't backport the change to vmscan.c and should_continue_reclaim (I
> backported all the other compaction improvements though, so this
> practically is the only missing bit).

How big are the discrepancies?

> I also suspected the e1000
> driver, which sets the NAPI latency to bulk_latency when it uses jumbo
> frames, so I wonder if it could be that with compaction we get more
> jumbo frames and the latency then gets reduced by the driver as a side
> effect. Not sure yet.
>

No idea.

> I like the above because it's less likely to give us compaction issues
> with jumbo frames when I add should_continue_reclaim on top. It seems
> anonymous memory allocations are orders of magnitude longer-lived
> than jumbo frames could ever be, so at this point I'm not even
> entirely certain it's ok to enable compaction at all for jumbo
> frames. But I still like the above regardless of my current issue
> (just because of the young bits going nuked in one go the lumpy hammer
> way, even if it actually increases latency a bit for THP allocations).
>

Can I have your ack on the patch then? Even if it doesn't resolve the
jumbo frame problems, it's in the right direction. Measuring how it
currently behaves and what direction should be taken may be something
still worth discussing at LSF/MM.

> One issue with compaction for jumbo frames is the potentially very
> long loop for the scan in isolate_migratepages.

Yes, the scanner is poor. The scanner for free pages is potentially just
as bad. I prototyped some designs that should have been faster but they
didn't make any significant difference so they got discarded.

> I added a counter to
> break the loop after 1024 pages scanned. This is extreme but this is a
> debug patch for now; I also did if (retval == bulk_latency) retval =
> low_latency in the e1000* drivers to see if it makes a difference. If
> either of the two helps, I will track down how much each change
> contributes to lowering the network latency to pre-compaction
> levels. It may very well be only a compaction issue, or only a driver
> issue, I don't know yet (the latter is less likely because this very
> compaction loop spikes at the top of oprofile output, but maybe that
> only affects throughput and the driver is to blame for the latency
> reduction... this is what I'm going to find pretty soon). Also this
> isolate_migratepages loop I think needs a cond_resched()

This surprises me. In my own tests at least, the compaction stuff was
way down in the profile and I wouldn't have expected scanning to take so
long as to require a cond_resched. I was depending on the cond_resched()
in migrate_pages() to yield the processor as necessary.

> (I didn't add
> that yet ;). 1024 pages scanned is too few, I just want to see how it
> behaves with an extremely permissive setting. I'll let you know when I
> come to some more reliable conclusion.
>

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16 11:22 ` Mel Gorman
@ 2011-02-16 14:44   ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2011-02-16 14:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Michal Hocko,
      Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 11:22:32AM +0000, Mel Gorman wrote:
> Out of curiosity, what are you measuring the latency of and how? I used
> a combination of the function_graph ftrace analyser and the mm_page_alloc
> tracepoint myself to avoid any additional patching and it was easier than
> cobbling together something with kprobes. A perl script configures ftrace
> and then parses the contents of trace_pipe - crude but does the job without
> patching the kernel.

It's some complex benchmark that is measuring the latency from userland;
I think latency is measured from clients (not the server running
compaction).

> How big are the discrepancies?

Latency in msec/op goes up from 1.1 to 5.4 starting from half the peak
load. But then latency stays flat with compaction; eventually the peak
load latency is similar. It just goes immediately from 1.1 to 5.4 in the
middle and it's slightly higher even for the light load runs.

> No idea.

I guess it's very hard to tell unless we try. I just nuked the
bulk_latency for the jumbo frames and forced the driver to always stay in
low_latency mode (in the NAPI ->poll method of the driver), just in case
it's not compaction to blame but a side effect of compaction providing
jumbo frames much more frequently to the driver.

> Can I have your ack on the patch then? Even if it doesn't resolve the

Sure, I acked it explicitly in a separate email ;).

> jumbo frame problems, it's in the right direction. Measuring how it
> currently behaves and what direction should be taken may be something
> still worth discussing at LSF/MM.

Agreed!
> > One issue with compaction for jumbo frames, is the potentially very
> > long loop, for the scan in isolate_migratepages.
>
> Yes, the scanner is poor. The scanner for free pages is potentially just
> as bad. I prototyped some designs that should have been faster but they
> didn't make any significant difference so they got discarded.

But the scanner for free pages has a nr_scanned countdown and breaks the
loop way sooner. Also most of the >order allocations must have a fallback,
so scanning everything for succeeding at order 0 is much more obviously
safe than scanning everything to provide an order 2 allocation, if the
order 0 allocation could be provided immediately without scanning
anything. It's not a trivial problem when we deal with short lived
allocations. Also the throughput is equal or a little higher (not
necessarily related to compaction though); the latency is the real
measurable regression.

> This surprises me. In my own tests at least, the compaction stuff was
> way down in the profile and I wouldn't have expected scanning to take so
> long as to require a cond_resched. I was depending on the cond_resched()
> in migrate_pages() to yield the processor as necessary.

If migrate_pages runs often, it likely won't need to scan too many pages
in the first place. I think cond_resched is a good idea in that loop
considering the current possible worst case.

This is the profiling. This is with basically the 2.6.37 compaction code,
so only enabled for THP sized allocations and not for order <=
PAGE_ALLOC_COSTLY_ORDER and not for kswapd.

Samples   % of Total  Cum. Samples  Cum. % of Total  module:function
-------------------------------------------------------------------------------
 177786        6.178        177786            6.178  sunrpc:svc_recv
 128779        4.475        306565           10.654  sunrpc:svc_xprt_enqueue
  80786        2.807        387351           13.462  vmlinux:__d_lookup
  62272        2.164        449623           15.626  ext4:ext4_htree_store_dirent
  55896        1.942        505519           17.569  jbd2:journal_clean_one_cp_list
  43868        1.524        549387           19.093  vmlinux:task_rq_lock
  43572        1.514        592959           20.608  vmlinux:kfree
  37620        1.307        630579           21.915  vmlinux:mwait_idle
  36169        1.257        666748           23.172  vmlinux:schedule
  34037        1.182        700785           24.355  e1000:e1000_clean
  31945        1.110        732730           25.465  vmlinux:find_busiest_group
  31491        1.094        764221           26.560  qla2xxx:qla24xx_intr_handler
  30681        1.066        794902           27.626  vmlinux:_atomic_dec_and_lock
   7425        0.258        xxxxxx           xxxxxx  vmlinux:get_page_from_freelist

This is with the 2.6.38 compaction code enabled for all !order in both
direct compaction and kswapd (it includes async compaction/migrate and
the preferred pageblock selection in !cc->sync mode). It basically only
doesn't include the should_continue_reclaim loop, as that could only
potentially increase the latency even further, so I skipped it for now
(I'll add it later with your __GFP_REPEAT new patch).

Samples   % of Total  Cum. Samples  Cum. % of Total  module:function
-------------------------------------------------------------------------------
1182928       17.358       1182928           17.358  vmlinux:get_page_from_freelist
 657802        9.652       1840730           27.011  vmlinux:free_pcppages_bulk
 579976        8.510       2420706           35.522  sunrpc:svc_xprt_enqueue
 508953        7.468       2929659           42.991  sunrpc:svc_recv
 490538        7.198       3420197           50.189  vmlinux:compaction_alloc
 188620        2.767       3608817           52.957  vmlinux:tg_shares_up
  97527        1.431       3706344           54.388  vmlinux:__d_lookup
  85670        1.257       3792014           55.646  jbd2:journal_clean_one_cp_list
  71738        1.052       3863752           56.698  vmlinux:mutex_spin_on_owner
  71037        1.042       3934789           57.741  vmlinux:kfree

Basically it was my patch that enabled compaction for all order sized
allocations and in kswapd as well that started this, but I think I only
exposed the problem, and if the jumbo frame would have order 4 instead of
order 1/2/3, it'd happen regardless of my patch. Later I'm also going to
check if it's the kswapd invocation that causes the problem (so trying
with only direct compaction) but I doubt it'll help.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT @ 2011-02-16 14:44 ` Andrea Arcangeli 0 siblings, 0 replies; 44+ messages in thread From: Andrea Arcangeli @ 2011-02-16 14:44 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Michal Hocko, Kent Overstreet, linux-mm, linux-kernel On Wed, Feb 16, 2011 at 11:22:32AM +0000, Mel Gorman wrote: > Out of curiousity, what are you measuring the latency of and how? I used > a combination of the function_graph ftrace analyser and the mm_page_alloc > tracepoint myself to avoid any additional patching and it was easier than > cobbling together something with kprobes. A perl script configures ftrace > and then parses the contents of trace_pipe - crude but does the job without > patching the kernel. It's some complex benchmark that is measuring the latency from userland, I think latency is measured from clients (not the server running compaction). > How big are the discrepancies? Latency in msec/op goes up from 1.1 to 5.4 starting from half the peak load. But then latency stays flat with compaction, eventually the peak load latency is similar. It just goes immediately from 1.1 to 5.4 in the middle and it's slightly higher even for the light load runs. > No idea. I guess it's very hard to tell unless we try. I just nuked the bulk_latency for the jumbo frames and forced the driver to always stay in low_latency mode (in NAPI ->poll method of the driver), just in case it's not compaction to blame but a side effect of compaction providing jumbo frames much more frequently to the driver. > Can I have your ack on the patch then? Even if it doesn't resolve the Sure, I acked it explicitly in separate email ;). > jumbo frame problems, it's in the right direction. Measuring how it > currently behaves and what direction should be taken may be something > still worth discussing at LSF/MM. Agreed! 
> > One issue with compaction for jumbo frames is the potentially very
> > long loop for the scan in isolate_migratepages.
>
> Yes, the scanner is poor. The scanner for free pages is potentially just
> as bad. I prototyped some designs that should have been faster but they
> didn't make any significant difference so they got discarded.

But the scanner for free pages has a nr_scanned countdown and breaks the
loop way sooner. Also, most of the higher-order allocations must have a
fallback, so scanning everything to make an order-0 allocation succeed is
much more obviously safe than scanning everything to provide an order-2
allocation, if the order-0 allocation could be provided immediately
without scanning anything. It's not a trivial problem when we deal with
short-lived allocations. Also the throughput is equal or a little higher
(not necessarily related to compaction though); the latency is the real
measurable regression.

> This surprises me. In my own tests at least, the compaction stuff was
> way down in the profile and I wouldn't have expected scanning to take so
> long as to require a cond_resched. I was depending on the cond_resched()
> in migrate_pages() to yield the processor as necessary.

If migrate_pages() runs often, it likely won't need to scan too many
pages in the first place. I think a cond_resched() is a good idea in that
loop considering the current possible worst case.

This is the profiling. This run is with basically the 2.6.37 compaction
code, so compaction is only enabled for THP-sized allocations, not for
order <= PAGE_ALLOC_COSTLY_ORDER and not in kswapd:

Samples  % of Total  Cum. Samples  Cum. % of Total  module:function
-------------------------------------------------------------------------------
 177786       6.178        177786           6.178   sunrpc:svc_recv
 128779       4.475        306565          10.654   sunrpc:svc_xprt_enqueue
  80786       2.807        387351          13.462   vmlinux:__d_lookup
  62272       2.164        449623          15.626   ext4:ext4_htree_store_dirent
  55896       1.942        505519          17.569   jbd2:journal_clean_one_cp_list
  43868       1.524        549387          19.093   vmlinux:task_rq_lock
  43572       1.514        592959          20.608   vmlinux:kfree
  37620       1.307        630579          21.915   vmlinux:mwait_idle
  36169       1.257        666748          23.172   vmlinux:schedule
  34037       1.182        700785          24.355   e1000:e1000_clean
  31945       1.110        732730          25.465   vmlinux:find_busiest_group
  31491       1.094        764221          26.560   qla2xxx:qla24xx_intr_handler
  30681       1.066        794902          27.626   vmlinux:_atomic_dec_and_lock
   7425       0.258        xxxxxx          xxxxxx   vmlinux:get_page_from_freelist

This is with the 2.6.38 compaction code enabled for all allocation
orders, in both direct compaction and kswapd (it includes async
compaction/migration and the preferred pageblock selection in !cc->sync
mode). It basically only doesn't include the should_continue_reclaim
loop, as that could only potentially increase the latency even further,
so I skipped it for now (I'll add it later with your new __GFP_REPEAT
patch):

Samples  % of Total  Cum. Samples  Cum. % of Total  module:function
-------------------------------------------------------------------------------
1182928      17.358       1182928          17.358   vmlinux:get_page_from_freelist
 657802       9.652       1840730          27.011   vmlinux:free_pcppages_bulk
 579976       8.510       2420706          35.522   sunrpc:svc_xprt_enqueue
 508953       7.468       2929659          42.991   sunrpc:svc_recv
 490538       7.198       3420197          50.189   vmlinux:compaction_alloc
 188620       2.767       3608817          52.957   vmlinux:tg_shares_up
  97527       1.431       3706344          54.388   vmlinux:__d_lookup
  85670       1.257       3792014          55.646   jbd2:journal_clean_one_cp_list
  71738       1.052       3863752          56.698   vmlinux:mutex_spin_on_owner
  71037       1.042       3934789          57.741   vmlinux:kfree

Basically it was my patch (enabling compaction for all allocation orders,
and in kswapd as well) that started this, but I think I only exposed the
problem: if the jumbo frames had order 4 instead of order 1/2/3, it would
happen regardless of my patch. Later I'm also going to check whether it's
the kswapd invocation that causes the problem (by trying with only direct
compaction), but I doubt it'll help.

^ permalink raw reply	[flat|nested] 44+ messages in thread
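Andrea's point about bounding the migrate scanner can be illustrated with a small userspace sketch. This is NOT the kernel's isolate_migratepages(); the names, batch size, and scan cap below are hypothetical. It only shows the shape of the behavior under discussion: cap how many page slots one scan may examine, and yield periodically instead of walking a whole zone in one unbroken loop.

```c
/*
 * Illustrative userspace sketch, not kernel code: all names and limits
 * here are hypothetical.  The idea is to bound how many page slots one
 * migrate scan may examine and to yield periodically via a
 * cond_resched() analogue, instead of scanning a whole zone in one go.
 */

#define SCAN_BATCH   32   /* yield (cond_resched() analogue) every N slots */
#define SCAN_LIMIT  256   /* give up after this many slots examined */

int need_resched_calls;   /* counts simulated yield points */

void cond_resched_sim(void)
{
	need_resched_calls++;   /* real code would yield the CPU here */
}

/*
 * pages[i] != 0 means slot i holds a movable page.  Returns how many
 * movable pages were found before hitting nr_wanted or SCAN_LIMIT.
 */
int scan_for_movable(const int *pages, int nr_pages, int nr_wanted)
{
	int found = 0, scanned = 0, i;

	for (i = 0; i < nr_pages; i++) {
		if (++scanned > SCAN_LIMIT)
			break;   /* bounded scan: fail fast, let order 0 fallback run */
		if (scanned % SCAN_BATCH == 0)
			cond_resched_sim();
		if (pages[i] && ++found >= nr_wanted)
			break;
	}
	return found;
}
```

With a zone full of unmovable slots the scan stops after SCAN_LIMIT slots and has yielded SCAN_LIMIT/SCAN_BATCH times, rather than looping to the end; this mirrors the nr_scanned countdown Andrea describes for the free-page scanner.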
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-16 12:03 ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2011-02-16 12:03 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 09:50:49AM +0000, Mel Gorman wrote:
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  mm/vmscan.c |   32 ++++++++++++++++++++++----------
>  1 files changed, 22 insertions(+), 10 deletions(-)

Acked-by: Andrea Arcangeli <aarcange@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-16 12:14 ` Rik van Riel
  -1 siblings, 0 replies; 44+ messages in thread
From: Rik van Riel @ 2011-02-16 12:14 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On 02/16/2011 04:50 AM, Mel Gorman wrote:
> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered. For allocations such as
> hugetlbfs that use GFP_REPEAT and have fewer fallback options, the full LRU
> list may still be scanned.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-16 12:38 ` Johannes Weiner
  -1 siblings, 0 replies; 44+ messages in thread
From: Johannes Weiner @ 2011-02-16 12:38 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Andrea Arcangeli, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 09:50:49AM +0000, Mel Gorman wrote:
> should_continue_reclaim() for reclaim/compaction allows scanning to continue
> even if pages are not being reclaimed until the full list is scanned. In
> terms of allocation success, this makes sense but potentially it introduces
> unwanted latency for high-order allocations such as transparent hugepages
> and network jumbo frames that would prefer to fail the allocation attempt
> and fall back to order-0 pages. Worse, there is a potential that the full
> LRU scan will clear all the young bits, distort page aging information and
> potentially push pages into swap that would have otherwise remained resident.
>
> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered. For allocations such as
> hugetlbfs that use GFP_REPEAT and have fewer fallback options, the full LRU
> list may still be scanned.
>
> To test this, a tool was developed based on ftrace that tracked the latency of
> high-order allocations while transparent hugepage support was enabled and three
> benchmarks were run. The "fix-infinite" figures are 2.6.38-rc4 with Johannes's
> patch "vmscan: fix zone shrinking exit when scan work is done" applied.
>
> STREAM Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count          10298          10229
> 1 :: Min           0.4560         0.4640
> 1 :: Mean          1.0589         1.0183
> 1 :: Max          14.5990        11.7510
> 1 :: Stddev        0.5208         0.4719
> 2 :: Count              2              1
> 2 :: Min           1.8610         3.7240
> 2 :: Mean          3.4325         3.7240
> 2 :: Max           5.0040         3.7240
> 2 :: Stddev        1.5715         0.0000
> 9 :: Count         111696         111694
> 9 :: Min           0.5230         0.4110
> 9 :: Mean         10.5831        10.5718
> 9 :: Max          38.4480        43.2900
> 9 :: Stddev        1.1147         1.1325
>
> Mean time for order-1 allocations is reduced. order-2 looks increased
> but with so few allocations, it's not particularly significant. THP mean
> allocation latency is also reduced. That said, allocation time varies so
> significantly that the reductions are within noise.
>
> Max allocation time is reduced by a significant amount for low-order
> allocations but increased for THP allocations, which presumably are now
> breaking before reclaim has done enough work.
>
> SysBench Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count          15745          15677
> 1 :: Min           0.4250         0.4550
> 1 :: Mean          1.1023         1.0810
> 1 :: Max          14.4590        10.8220
> 1 :: Stddev        0.5117         0.5100
> 2 :: Count              1              1
> 2 :: Min           3.0040         2.1530
> 2 :: Mean          3.0040         2.1530
> 2 :: Max           3.0040         2.1530
> 2 :: Stddev        0.0000         0.0000
> 9 :: Count           2017           1931
> 9 :: Min           0.4980         0.7480
> 9 :: Mean         10.4717        10.3840
> 9 :: Max          24.9460        26.2500
> 9 :: Stddev        1.1726         1.1966
>
> Again, mean time for order-1 allocations is reduced while order-2 allocations
> are too few to draw conclusions from. The mean time for THP allocations is
> also slightly reduced albeit the reductions are within variances.
>
> Once again, our maximum allocation time is significantly reduced for
> low-order allocations and slightly increased for THP allocations.
>
> Anon stream mmap reference Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count           1376           1790
> 1 :: Min           0.4940         0.5010
> 1 :: Mean          1.0289         0.9732
> 1 :: Max           6.2670         4.2540
> 1 :: Stddev        0.4142         0.2785
> 2 :: Count              1              -
> 2 :: Min           1.9060              -
> 2 :: Mean          1.9060              -
> 2 :: Max           1.9060              -
> 2 :: Stddev        0.0000              -
> 9 :: Count          11266          11257
> 9 :: Min           0.4990         0.4940
> 9 :: Mean      27250.4669     24256.1919
> 9 :: Max    11439211.0000   6008885.0000
> 9 :: Stddev   226427.4624    186298.1430
>
> This benchmark creates one thread per CPU which references an amount of
> anonymous memory 1.5 times the size of physical RAM. This pounds swap quite
> heavily and is intended to exercise THP a bit.
>
> Mean allocation time for order-1 is reduced as before. It's also reduced
> for THP allocations but the variations here are pretty massive due to swap.
> As before, maximum allocation times are significantly reduced.
>
> Overall, the patch reduces the mean and maximum allocation latencies for
> the smaller high-order allocations. This was with Slab configured so it
> would be expected to be more significant with Slub which uses these size
> allocations more aggressively.
>
> The mean allocation times for THP allocations are also slightly reduced.
> The maximum latency was slightly increased as predicted by the comments due
> to reclaim/compaction breaking early. However, workloads care more about the
> latency of lower-order allocations than THP so it's an acceptable trade-off.
> Please consider merging for 2.6.38.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-16 23:26 ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2011-02-16 23:26 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Rik van Riel,
    Michal Hocko, Kent Overstreet, linux-mm, linux-kernel

On Wed, Feb 16, 2011 at 6:50 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> should_continue_reclaim() for reclaim/compaction allows scanning to continue
> even if pages are not being reclaimed until the full list is scanned. In
> terms of allocation success, this makes sense but potentially it introduces
> unwanted latency for high-order allocations such as transparent hugepages
> and network jumbo frames that would prefer to fail the allocation attempt
> and fall back to order-0 pages. Worse, there is a potential that the full
> LRU scan will clear all the young bits, distort page aging information and
> potentially push pages into swap that would have otherwise remained resident.
>
> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered. For allocations such as
> hugetlbfs that use GFP_REPEAT and have fewer fallback options, the full LRU
> list may still be scanned.
>
> To test this, a tool was developed based on ftrace that tracked the latency of
> high-order allocations while transparent hugepage support was enabled and three
> benchmarks were run. The "fix-infinite" figures are 2.6.38-rc4 with Johannes's
> patch "vmscan: fix zone shrinking exit when scan work is done" applied.
>
> STREAM Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count          10298          10229
> 1 :: Min           0.4560         0.4640
> 1 :: Mean          1.0589         1.0183
> 1 :: Max          14.5990        11.7510
> 1 :: Stddev        0.5208         0.4719
> 2 :: Count              2              1
> 2 :: Min           1.8610         3.7240
> 2 :: Mean          3.4325         3.7240
> 2 :: Max           5.0040         3.7240
> 2 :: Stddev        1.5715         0.0000
> 9 :: Count         111696         111694
> 9 :: Min           0.5230         0.4110
> 9 :: Mean         10.5831        10.5718
> 9 :: Max          38.4480        43.2900
> 9 :: Stddev        1.1147         1.1325
>
> Mean time for order-1 allocations is reduced. order-2 looks increased
> but with so few allocations, it's not particularly significant. THP mean
> allocation latency is also reduced. That said, allocation time varies so
> significantly that the reductions are within noise.
>
> Max allocation time is reduced by a significant amount for low-order
> allocations but increased for THP allocations, which presumably are now
> breaking before reclaim has done enough work.
>
> SysBench Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count          15745          15677
> 1 :: Min           0.4250         0.4550
> 1 :: Mean          1.1023         1.0810
> 1 :: Max          14.4590        10.8220
> 1 :: Stddev        0.5117         0.5100
> 2 :: Count              1              1
> 2 :: Min           3.0040         2.1530
> 2 :: Mean          3.0040         2.1530
> 2 :: Max           3.0040         2.1530
> 2 :: Stddev        0.0000         0.0000
> 9 :: Count           2017           1931
> 9 :: Min           0.4980         0.7480
> 9 :: Mean         10.4717        10.3840
> 9 :: Max          24.9460        26.2500
> 9 :: Stddev        1.1726         1.1966
>
> Again, mean time for order-1 allocations is reduced while order-2 allocations
> are too few to draw conclusions from. The mean time for THP allocations is
> also slightly reduced albeit the reductions are within variances.
>
> Once again, our maximum allocation time is significantly reduced for
> low-order allocations and slightly increased for THP allocations.
>
> Anon stream mmap reference Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count           1376           1790
> 1 :: Min           0.4940         0.5010
> 1 :: Mean          1.0289         0.9732
> 1 :: Max           6.2670         4.2540
> 1 :: Stddev        0.4142         0.2785
> 2 :: Count              1              -
> 2 :: Min           1.9060              -
> 2 :: Mean          1.9060              -
> 2 :: Max           1.9060              -
> 2 :: Stddev        0.0000              -
> 9 :: Count          11266          11257
> 9 :: Min           0.4990         0.4940
> 9 :: Mean      27250.4669     24256.1919
> 9 :: Max    11439211.0000   6008885.0000
> 9 :: Stddev   226427.4624    186298.1430
>
> This benchmark creates one thread per CPU which references an amount of
> anonymous memory 1.5 times the size of physical RAM. This pounds swap quite
> heavily and is intended to exercise THP a bit.
>
> Mean allocation time for order-1 is reduced as before. It's also reduced
> for THP allocations but the variations here are pretty massive due to swap.
> As before, maximum allocation times are significantly reduced.
>
> Overall, the patch reduces the mean and maximum allocation latencies for
> the smaller high-order allocations. This was with Slab configured so it
> would be expected to be more significant with Slub which uses these size
> allocations more aggressively.
>
> The mean allocation times for THP allocations are also slightly reduced.
> The maximum latency was slightly increased as predicted by the comments due
> to reclaim/compaction breaking early. However, workloads care more about the
> latency of lower-order allocations than THP so it's an acceptable trade-off.
> Please consider merging for 2.6.38.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  mm/vmscan.c |   32 ++++++++++++++++++++++----------
>  1 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..591b907 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1841,16 +1841,28 @@ static inline bool should_continue_reclaim(struct zone *zone,
>         if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
>                 return false;
>
> -       /*
> -        * If we failed to reclaim and have scanned the full list, stop.
> -        * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -        * faster but obviously would be less likely to succeed
> -        * allocation. If this is desirable, use GFP_REPEAT to decide

Typo. __GFP_REPEAT

Otherwise, looks good to me.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread
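The heuristic under review can be condensed into a userspace sketch. This is a simplification, not the patched should_continue_reclaim() (the real function also checks the reclaim mode, pages under writeback, and compaction readiness), and the flag constant below is a hypothetical stand-in for the real __GFP_REPEAT bit.

```c
#include <stdbool.h>

#define SWAP_CLUSTER_MAX  32UL
#define GFP_REPEAT_SIM    0x1u   /* hypothetical stand-in for __GFP_REPEAT */

/*
 * Condensed sketch of the progress check Mel's patch adds: without
 * __GFP_REPEAT, stop reclaim/compaction once a full batch of pages was
 * considered and none were reclaimed; with __GFP_REPEAT (hugetlbfs and
 * friends, with fewer fallbacks), keep going until nothing is reclaimed
 * and nothing is left to scan.
 */
bool continue_reclaim_sketch(unsigned long nr_reclaimed,
			     unsigned long nr_scanned,
			     unsigned int gfp_mask)
{
	if (gfp_mask & GFP_REPEAT_SIM)
		/* Few fallbacks: only stop once nothing was reclaimed AND
		 * the full list has been considered. */
		return !(nr_reclaimed == 0 && nr_scanned == 0);

	/* Otherwise break early: a full batch considered with zero
	 * pages reclaimed means further scanning is mostly latency. */
	return !(nr_reclaimed == 0 && nr_scanned >= SWAP_CLUSTER_MAX);
}
```

The early-break branch is what trades a slightly higher THP allocation failure rate for the lower order-1/2/3 latencies shown in the tables above.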
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
  2011-02-16  9:50 ` Mel Gorman
@ 2011-02-17 22:22 ` Andrew Morton
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2011-02-17 22:22 UTC (permalink / raw)
To: Mel Gorman
Cc: Johannes Weiner, Andrea Arcangeli, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Wed, 16 Feb 2011 09:50:49 +0000 Mel Gorman <mel@csn.ul.ie> wrote:

> should_continue_reclaim() for reclaim/compaction allows scanning to continue
> even if pages are not being reclaimed until the full list is scanned. In
> terms of allocation success, this makes sense but potentially it introduces
> unwanted latency for high-order allocations such as transparent hugepages
> and network jumbo frames that would prefer to fail the allocation attempt
> and fall back to order-0 pages. Worse, there is a potential that the full
> LRU scan will clear all the young bits, distort page aging information and
> potentially push pages into swap that would have otherwise remained resident.

afaict the patch affects order-0 allocations as well. What are the
implications of this?

Also, what might be the downsides of this change, and did you test for
them?

> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered.

a) Why SWAP_CLUSTER_MAX? Is (SWAP_CLUSTER_MAX+7) better or worse?

b) The sentence doesn't seem even vaguely accurate. shrink_zone() will
   scan vastly more than SWAP_CLUSTER_MAX pages before calling
   should_continue_reclaim(). Confused.

c) The patch doesn't "stop reclaim/compaction" fully. It stops it
   against one zone. reclaim will then advance on to any other eligible
   zones.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT

From: Mel Gorman @ 2011-02-18 12:22 UTC
To: Andrew Morton
Cc: Johannes Weiner, Andrea Arcangeli, Rik van Riel, Michal Hocko,
    Kent Overstreet, linux-mm, linux-kernel

On Thu, Feb 17, 2011 at 02:22:09PM -0800, Andrew Morton wrote:
> > should_continue_reclaim() for reclaim/compaction allows scanning to continue
> > even if pages are not being reclaimed until the full list is scanned.
> > [...]
>
> afaict the patch affects order-0 allocations as well.  What are the
> implications of this?

order-0 allocations should not be affected because RECLAIM_MODE_COMPACTION
is not set for them, so the following check returns before the gfp_mask is
examined:

	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
		return false;

> Also, what might be the downsides of this change, and did you test for
> them?

The main downside that I predict is that the worst-case latencies for
successful transparent hugepage allocations will be increased as there
will be more looping in do_try_to_free_pages() at higher priorities. I
would also not be surprised if there were fewer successful allocations.
Latencies did seem to be worse for order-9 allocations in testing, but
that was offset by lower latencies for lower orders and seemed an
acceptable trade-off.

Other major consequences did not spring to mind.

> > This patch will stop reclaim/compaction if no pages were reclaimed in the
> > last SWAP_CLUSTER_MAX pages that were considered.
>
> a) Why SWAP_CLUSTER_MAX?  Is (SWAP_CLUSTER_MAX+7) better or worse?

SWAP_CLUSTER_MAX is the standard "unit of reclaim" and that's what I had
in mind when writing the comment, but it's wrong and misleading. More on
this below.

> b) The sentence doesn't seem even vaguely accurate.  shrink_zone()
>    will scan vastly more than SWAP_CLUSTER_MAX pages before calling
>    should_continue_reclaim().  Confused.
>
> c) The patch doesn't "stop reclaim/compaction" fully.  It stops it
>    against one zone.  reclaim will then advance on to any other
>    eligible zones.

You're right on both counts and this comment is inaccurate. It should
have read:

    This patch will stop reclaim/compaction for the current zone in
    shrink_zone() if there were no pages reclaimed in the last batch of
    scanning at the current priority. For allocations such as hugetlbfs
    that use __GFP_REPEAT and have fewer fallback options, the full LRU
    list may still be scanned.

The comment in the code itself then becomes:

	+	/*
	+	 * For non-__GFP_REPEAT allocations which can presumably
	+	 * fail without consequence, stop if we failed to reclaim
	+	 * any pages from the last batch of pages that were scanned.
	+	 * This will return to the caller faster at the risk that
	+	 * reclaim/compaction and the resulting allocation attempt
	+	 * fails
	+	 */

-- 
Mel Gorman
SUSE Labs
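[Editor's note: the two exit conditions Mel describes — the RECLAIM_MODE_COMPACTION early return and the no-progress check for non-__GFP_REPEAT allocations — can be sketched as a standalone predicate. This is a hedged userspace illustration, not the kernel code: the flag values and the name `should_continue_reclaim_sketch` are invented for the example; the real definitions live in mm/vmscan.c and include/linux/gfp.h.]

```c
#include <stdbool.h>

/* Flag values invented for this sketch only. */
#define RECLAIM_MODE_COMPACTION (1u << 4)
#define __GFP_REPEAT            (1u << 10)

/*
 * Sketch of the early exits discussed above.  Returns true if
 * shrink_zone() should keep looping for reclaim/compaction.
 */
bool should_continue_reclaim_sketch(unsigned int reclaim_mode,
				    unsigned int gfp_mask,
				    unsigned long nr_reclaimed)
{
	/* Order-0 reclaim never sets RECLAIM_MODE_COMPACTION, so it
	 * is unaffected by the patch: stop immediately. */
	if (!(reclaim_mode & RECLAIM_MODE_COMPACTION))
		return false;

	/* For non-__GFP_REPEAT allocations, stop as soon as a batch
	 * of scanning reclaims nothing. */
	if (!(gfp_mask & __GFP_REPEAT) && nr_reclaimed == 0)
		return false;

	return true;
}
```

With this shape, an order-0 caller always stops here, a THP-style !__GFP_REPEAT caller stops on the first no-progress batch, and a hugetlbfs-style __GFP_REPEAT caller may keep scanning the full list — matching the corrected changelog above.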
* Re: [patch] vmscan: fix zone shrinking exit when scan work is done

From: Minchan Kim @ 2011-02-10  4:04 UTC
To: Johannes Weiner
Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, Rik van Riel,
    Michal Hocko, Kent Overstreet, linux-mm, linux-kernel

On Thu, Feb 10, 2011 at 12:46 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> I think this should fix the problem of processes getting stuck in
> reclaim that has been reported several times. Kent actually
> single-stepped through this code and noted that it was never exiting
> shrink_zone(), which really narrowed it down a lot, considering the
> tons of nested loops from the allocator down to the list shrinking.
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: vmscan: fix zone shrinking exit when scan work is done
>
> '3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
> lumpy reclaim' introduced an indefinite loop in shrink_zone().
>
> It meant to break out of this loop when no pages had been reclaimed
> and not a single page was even scanned. The way it would detect the
> latter is by taking a snapshot of sc->nr_scanned at the beginning of
> the function and comparing it against the new sc->nr_scanned after the
> scan loop. But it would re-iterate without updating that snapshot,
> looping forever if sc->nr_scanned changed at least once since
> shrink_zone() was invoked.
>
> This is not the sole condition that would exit that loop, but it
> requires other processes to change the zone state, as the reclaimer
> that is stuck obviously can not anymore.
>
> This is only happening for higher-order allocations, where reclaim is
> run back to back with compaction.
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim
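[Editor's note: the stale-snapshot bug described in the quoted changelog can be demonstrated in isolation. The sketch below is hypothetical userspace C — the struct and helper names are invented for illustration. It models the restart loop's exit test and shows that re-reading the snapshot on each pass, as the patch does, lets the loop terminate once scanning stops making progress, while a snapshot taken only once never matches again.]

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct scan_control;
 * only the field the fix touches is modelled. */
struct scan_control { unsigned long nr_scanned; };

/* Exit test at the bottom of the restart loop: leave shrink_zone()
 * when nothing was reclaimed and nothing was scanned this pass. */
bool scan_work_done(unsigned long nr_reclaimed, unsigned long snapshot,
		    const struct scan_control *sc)
{
	return nr_reclaimed == 0 && sc->nr_scanned == snapshot;
}

/* Run the loop: the first pass scans 100 pages, later passes scan
 * and reclaim nothing.  Returns the number of passes before exit,
 * capped at 11 to stand in for "loops forever". */
int shrink_zone_passes(bool resnapshot_each_pass)
{
	struct scan_control sc = { .nr_scanned = 0 };
	unsigned long snapshot = sc.nr_scanned;	/* buggy: taken once */
	int passes = 0;

	for (;;) {
		if (resnapshot_each_pass)
			snapshot = sc.nr_scanned;	/* the fix */
		if (passes == 0)
			sc.nr_scanned += 100;	/* only pass 1 scans pages */
		passes++;
		if (scan_work_done(0, snapshot, &sc))
			break;			/* exits with the fix */
		if (passes > 10)
			break;			/* buggy variant spins */
	}
	return passes;
}
```

With re-snapshotting, the second pass sees snapshot == sc.nr_scanned and exits (2 passes); with the stale snapshot, sc.nr_scanned never equals the initial value again once it advanced, so the loop only stops at the artificial cap.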
end of thread, other threads: [~2011-02-18 12:22 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-09 15:46 [patch] vmscan: fix zone shrinking exit when scan work is done Johannes Weiner
2011-02-09 15:54 ` Kent Overstreet
2011-02-09 16:46 ` Mel Gorman
2011-02-09 18:28 ` Andrea Arcangeli
2011-02-09 20:05 ` Andrew Morton
2011-02-10 10:21 ` Mel Gorman
2011-02-10 10:41 ` Michal Hocko
2011-02-10 12:48 ` Andrea Arcangeli
2011-02-10 13:33 ` Mel Gorman
2011-02-10 14:14 ` Andrea Arcangeli
2011-02-10 14:58 ` Mel Gorman
2011-02-16  9:50 ` [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT Mel Gorman
2011-02-16 10:13 ` Andrea Arcangeli
2011-02-16 11:22 ` Mel Gorman
2011-02-16 14:44 ` Andrea Arcangeli
2011-02-16 12:03 ` Andrea Arcangeli
2011-02-16 12:14 ` Rik van Riel
2011-02-16 12:38 ` Johannes Weiner
2011-02-16 23:26 ` Minchan Kim
2011-02-17 22:22 ` Andrew Morton
2011-02-18 12:22 ` Mel Gorman
2011-02-10  4:04 ` [patch] vmscan: fix zone shrinking exit when scan work is done Minchan Kim