* [PATCH 13/28] mm, page_alloc: Remove redundant check for empty zonelist
@ 2016-04-15 9:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
A check is made for an empty zonelist early in the page allocator fast path
but it's unnecessary. When get_page_from_freelist() is called, it'll return
NULL immediately. Removing the first check is slower for machines with
memoryless nodes but that is a corner case that can live with the overhead.
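As a rough standalone sketch (invented types and names, not the kernel code), the zonelist walk in get_page_from_freelist() stops at the NULL zone terminator anyway, so an empty zonelist never enters the loop and the allocation returns NULL without the extra up-front check:

#include <stddef.h>

struct zone { const char *name; };
struct zoneref { struct zone *zone; };

/* Stand-in for the zonelist walk: iterate until the NULL terminator. */
static struct zone *first_usable_zone(struct zoneref *zrefs)
{
	for (struct zoneref *z = zrefs; z->zone; z++)
		return z->zone;		/* a real walk would try to allocate here */
	return NULL;			/* empty zonelist: the loop body never runs */
}

int main(void)
{
	/* __GFP_THISNODE on a memoryless node leaves only the terminator. */
	struct zoneref empty[] = { { NULL } };

	return first_usable_zone(empty) == NULL ? 0 : 1;
}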
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df03ccc7f07c..21aaef6ddd7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3374,14 +3374,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
if (should_fail_alloc_page(gfp_mask, order))
return NULL;
- /*
- * Check the zones suitable for the gfp_mask contain at least one
- * valid zone. It's possible to have an empty zonelist as a result
- * of __GFP_THISNODE and a memoryless node
- */
- if (unlikely(!zonelist->_zonerefs->zone))
- return NULL;
-
if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
alloc_flags |= ALLOC_CMA;
@@ -3394,8 +3386,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
/* The preferred zone is used for statistics later */
preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
ac.nodemask, &ac.preferred_zone);
- if (!ac.preferred_zone)
- goto out;
ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
/* First allocation attempt */
@@ -3418,7 +3408,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
-out:
/*
* When updating a task's mems_allowed, it is possible to race with
* parallel threads in such a way that an allocation can fail while
--
2.6.4
* [PATCH 14/28] mm, page_alloc: Simplify last cpupid reset
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The current reset unnecessarily clears flags and makes pointless calculations.
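For illustration only, a userspace sketch of the bit arithmetic with made-up shift values, assuming the usual definition LAST_CPUPID_MASK == (1 << LAST_CPUPID_SHIFT) - 1: clearing the field and then OR-ing in an all-ones value is the same as OR-ing in the shifted mask directly.

#include <assert.h>

#define LAST_CPUPID_SHIFT	8		/* illustrative value only */
#define LAST_CPUPID_MASK	((1UL << LAST_CPUPID_SHIFT) - 1)
#define LAST_CPUPID_PGSHIFT	16		/* illustrative value only */

static unsigned long reset_old(unsigned long flags)
{
	unsigned long cpupid = (1UL << LAST_CPUPID_SHIFT) - 1;

	flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
	flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
	return flags;
}

static unsigned long reset_new(unsigned long flags)
{
	/* The value written is all ones in the field, so no need to clear first. */
	return flags | (LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
}

int main(void)
{
	for (unsigned long flags = 0; flags < (1UL << 26); flags += 4099)
		assert(reset_old(flags) == reset_new(flags));
	return 0;
}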
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
include/linux/mm.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffcff53e3b2b..60656db00abd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -837,10 +837,7 @@ extern int page_cpupid_xchg_last(struct page *page, int cpupid);
static inline void page_cpupid_reset_last(struct page *page)
{
- int cpupid = (1 << LAST_CPUPID_SHIFT) - 1;
-
- page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
- page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
+ page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT;
}
#endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */
#else /* !CONFIG_NUMA_BALANCING */
--
2.6.4
* Re: [PATCH 14/28] mm, page_alloc: Simplify last cpupid reset
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 13:30 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 13:30 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The current reset unnecessarily clears flags and makes pointless calculations.
Ugh, indeed.
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/mm.h | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ffcff53e3b2b..60656db00abd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -837,10 +837,7 @@ extern int page_cpupid_xchg_last(struct page *page, int cpupid);
>
> static inline void page_cpupid_reset_last(struct page *page)
> {
> - int cpupid = (1 << LAST_CPUPID_SHIFT) - 1;
> -
> - page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
> - page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
> + page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT;
> }
> #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */
> #else /* !CONFIG_NUMA_BALANCING */
>
* [PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
from a context that cannot sleep. Triggering this is almost certainly
a bug but it's also overhead in the fast path. Move the check to the slow
path. It'll be harder to trigger as it'll only be checked when watermarks
are depleted but it'll also only be checked in a path that can sleep.
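Conceptually (a userspace model, not the kernel implementation; the flag value is invented), might_sleep_if() only complains when a reclaim-capable mask is used from a context that must not sleep, which is the bug class the check exists to catch:

#include <assert.h>
#include <stdbool.h>

#define __GFP_DIRECT_RECLAIM	0x400000u	/* illustrative value only */

static bool in_atomic_context;	/* stand-in for "this context must not sleep" */

static void might_sleep_if_model(bool cond)
{
	if (cond)
		assert(!in_atomic_context);	/* the kernel would print a splat here */
}

static void alloc_model(unsigned int gfp_mask)
{
	/* After this patch the check runs in the slowpath only. */
	might_sleep_if_model(gfp_mask & __GFP_DIRECT_RECLAIM);
}

int main(void)
{
	alloc_model(__GFP_DIRECT_RECLAIM);	/* process context: fine */

	in_atomic_context = true;
	alloc_model(0);				/* atomic allocation: fine */
	return 0;
}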
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 21aaef6ddd7a..9ef2f4ab9ca5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3176,6 +3176,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
return NULL;
}
+ might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
+
/*
* We also sanity check to catch abuse of atomic reserves being used by
* callers that are not in atomic context.
@@ -3369,8 +3371,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
lockdep_trace_alloc(gfp_mask);
- might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
-
if (should_fail_alloc_page(gfp_mask, order))
return NULL;
--
2.6.4
* Re: [PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 13:41 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 13:41 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
> from a context that cannot sleep. Triggering this is almost certainly
> a bug but it's also overhead in the fast path.
For CONFIG_DEBUG_ATOMIC_SLEEP, enabling is asking for the overhead. But for
CONFIG_PREEMPT_VOLUNTARY which turns it into _cond_resched(), I guess it's not.
> Move the check to the slow
> path. It'll be harder to trigger as it'll only be checked when watermarks
> are depleted but it'll also only be checked in a path that can sleep.
Hmm what about zone_reclaim_mode=1, should the check be also duplicated to that
part of get_page_from_freelist()?
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 21aaef6ddd7a..9ef2f4ab9ca5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3176,6 +3176,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> return NULL;
> }
>
> + might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
> +
> /*
> * We also sanity check to catch abuse of atomic reserves being used by
> * callers that are not in atomic context.
> @@ -3369,8 +3371,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>
> lockdep_trace_alloc(gfp_mask);
>
> - might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
> -
> if (should_fail_alloc_page(gfp_mask, order))
> return NULL;
>
>
* Re: [PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
2016-04-26 13:41 ` Vlastimil Babka
@ 2016-04-26 14:50 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-26 14:50 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, Apr 26, 2016 at 03:41:22PM +0200, Vlastimil Babka wrote:
> On 04/15/2016 11:07 AM, Mel Gorman wrote:
> >There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
> >from a context that cannot sleep. Triggering this is almost certainly
> >a bug but it's also overhead in the fast path.
>
> For CONFIG_DEBUG_ATOMIC_SLEEP, enabling is asking for the overhead. But for
> CONFIG_PREEMPT_VOLUNTARY which turns it into _cond_resched(), I guess it's
> not.
>
Either way, it struck me as odd. It does depend on the config and it's
marginal so if there is a problem then I can drop it.
> >Move the check to the slow
> >path. It'll be harder to trigger as it'll only be checked when watermarks
> >are depleted but it'll also only be checked in a path that can sleep.
>
> Hmm what about zone_reclaim_mode=1, should the check be also duplicated to
> that part of get_page_from_freelist()?
>
zone_reclaim has a !gfpflags_allow_blocking() check, does not call
cond_resched() before that check so it does not fall into an accidental
sleep path. I'm not seeing why the check is necessary there.
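For reference, a standalone sketch of the relationship being discussed (illustrative flag value, not the kernel headers): gfpflags_allow_blocking() is true for exactly the masks that carry __GFP_DIRECT_RECLAIM, i.e. the same condition might_sleep_if() tests.

#include <assert.h>
#include <stdbool.h>

#define __GFP_DIRECT_RECLAIM	0x400000u	/* illustrative value only */

static bool gfpflags_allow_blocking_model(unsigned int gfp_flags)
{
	return gfp_flags & __GFP_DIRECT_RECLAIM;
}

int main(void)
{
	assert(gfpflags_allow_blocking_model(__GFP_DIRECT_RECLAIM));
	assert(!gfpflags_allow_blocking_model(0));
	return 0;
}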
--
Mel Gorman
SUSE Labs
* Re: [PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
2016-04-26 14:50 ` Mel Gorman
@ 2016-04-26 15:16 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 15:16 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On 04/26/2016 04:50 PM, Mel Gorman wrote:
> On Tue, Apr 26, 2016 at 03:41:22PM +0200, Vlastimil Babka wrote:
>> On 04/15/2016 11:07 AM, Mel Gorman wrote:
>> >There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
>> >from a context that cannot sleep. Triggering this is almost certainly
>> >a bug but it's also overhead in the fast path.
>>
>> For CONFIG_DEBUG_ATOMIC_SLEEP, enabling is asking for the overhead. But for
>> CONFIG_PREEMPT_VOLUNTARY which turns it into _cond_resched(), I guess it's
>> not.
>>
>
> Either way, it struck me as odd. It does depend on the config and it's
> marginal so if there is a problem then I can drop it.
What I tried to say is that it makes sense, but it's perhaps non-obvious :)
>> >Move the check to the slow
>> >path. It'll be harder to trigger as it'll only be checked when watermarks
>> >are depleted but it'll also only be checked in a path that can sleep.
>>
>> Hmm what about zone_reclaim_mode=1, should the check be also duplicated to
>> that part of get_page_from_freelist()?
>>
>
> zone_reclaim has a !gfpflags_allow_blocking() check, does not call
> cond_resched() before that check so it does not fall into an accidental
> sleep path. I'm not seeing why the check is necessary there.
Hmm I thought the primary purpose of this might_sleep_if() is to catch those
(via the DEBUG_ATOMIC_SLEEP) that do pass __GFP_DIRECT_RECLAIM (which means
gfpflags_allow_blocking() will be true and zone_reclaim will proceed), but do so
from the wrong context. Am I getting that wrong?
* Re: [PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
2016-04-26 15:16 ` Vlastimil Babka
@ 2016-04-26 16:29 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-26 16:29 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, Apr 26, 2016 at 05:16:21PM +0200, Vlastimil Babka wrote:
> On 04/26/2016 04:50 PM, Mel Gorman wrote:
> >On Tue, Apr 26, 2016 at 03:41:22PM +0200, Vlastimil Babka wrote:
> >>On 04/15/2016 11:07 AM, Mel Gorman wrote:
> >>>There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
> >>>from a context that cannot sleep. Triggering this is almost certainly
> >>>a bug but it's also overhead in the fast path.
> >>
> >>For CONFIG_DEBUG_ATOMIC_SLEEP, enabling is asking for the overhead. But for
> >>CONFIG_PREEMPT_VOLUNTARY which turns it into _cond_resched(), I guess it's
> >>not.
> >>
> >
> >Either way, it struck me as odd. It does depend on the config and it's
> >marginal so if there is a problem then I can drop it.
>
> What I tried to say is that it makes sense, but it's perhaps non-obvious :)
>
> >>>Move the check to the slow
> >>>path. It'll be harder to trigger as it'll only be checked when watermarks
> >>>are depleted but it'll also only be checked in a path that can sleep.
> >>
> >>Hmm what about zone_reclaim_mode=1, should the check be also duplicated to
> >>that part of get_page_from_freelist()?
> >>
> >
> >zone_reclaim has a !gfpflags_allow_blocking() check, does not call
> >cond_resched() before that check so it does not fall into an accidental
> >sleep path. I'm not seeing why the check is necessary there.
>
> Hmm I thought the primary purpose of this might_sleep_if() is to catch those
> (via the DEBUG_ATOMIC_SLEEP) that do pass __GFP_DIRECT_RECLAIM (which means
> gfpflags_allow_blocking() will be true and zone_reclaim will proceed),
It proceeds but fails immediately so what I'm failing to see is why
moving the check increases risk. I wanted to remove the check from the
path where the problem it's catching cannot happen. It does mean the
debugging check is made less frequently but it's still useful. If you
feel the safety is preferred then I'll drop the patch.
--
Mel Gorman
SUSE Labs
* [PATCH 16/28] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
__GFP_HARDWALL only has meaning in the context of cpusets but the fast path
always applies the flag on the first attempt. Move the manipulations into
the cpuset paths where they will be masked by a static branch in the common
case.
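A minimal userspace sketch of the intended flow, with an invented flag value and a plain bool standing in for the cpusets static key: the mask manipulation is only paid when cpusets are actually in use.

#include <stdbool.h>
#include <stdio.h>

#define __GFP_HARDWALL	0x20000u	/* illustrative value only */

static bool cpusets_enabled_model;	/* stand-in for the static-key test */

static unsigned int build_alloc_mask(unsigned int gfp_mask)
{
	unsigned int alloc_mask = gfp_mask;

	if (cpusets_enabled_model)		/* patched-out jump when disabled */
		alloc_mask |= __GFP_HARDWALL;	/* only meaningful with cpusets */

	return alloc_mask;
}

int main(void)
{
	printf("cpusets off: %#x\n", build_alloc_mask(0x1000u));
	cpusets_enabled_model = true;
	printf("cpusets on:  %#x\n", build_alloc_mask(0x1000u));
	return 0;
}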
With the other micro-optimisations in this series combined, the impact on
a page allocator microbenchmark is
4.6.0-rc2 4.6.0-rc2
decstat-v1r20 micro-v1r20
Min alloc-odr0-1 381.00 ( 0.00%) 377.00 ( 1.05%)
Min alloc-odr0-2 275.00 ( 0.00%) 273.00 ( 0.73%)
Min alloc-odr0-4 229.00 ( 0.00%) 226.00 ( 1.31%)
Min alloc-odr0-8 199.00 ( 0.00%) 196.00 ( 1.51%)
Min alloc-odr0-16 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-32 179.00 ( 0.00%) 175.00 ( 2.23%)
Min alloc-odr0-64 174.00 ( 0.00%) 172.00 ( 1.15%)
Min alloc-odr0-128 172.00 ( 0.00%) 170.00 ( 1.16%)
Min alloc-odr0-256 181.00 ( 0.00%) 183.00 ( -1.10%)
Min alloc-odr0-512 193.00 ( 0.00%) 191.00 ( 1.04%)
Min alloc-odr0-1024 201.00 ( 0.00%) 199.00 ( 1.00%)
Min alloc-odr0-2048 206.00 ( 0.00%) 204.00 ( 0.97%)
Min alloc-odr0-4096 212.00 ( 0.00%) 210.00 ( 0.94%)
Min alloc-odr0-8192 215.00 ( 0.00%) 213.00 ( 0.93%)
Min alloc-odr0-16384 216.00 ( 0.00%) 214.00 ( 0.93%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ef2f4ab9ca5..4a364e318873 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3353,7 +3353,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
struct page *page;
unsigned int cpuset_mems_cookie;
unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
- gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
+ gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = {
.high_zoneidx = gfp_zone(gfp_mask),
.zonelist = zonelist,
@@ -3362,6 +3362,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
};
if (cpusets_enabled()) {
+ alloc_mask |= __GFP_HARDWALL;
alloc_flags |= ALLOC_CPUSET;
if (!ac.nodemask)
ac.nodemask = &cpuset_current_mems_allowed;
@@ -3389,7 +3390,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
/* First allocation attempt */
- alloc_mask = gfp_mask|__GFP_HARDWALL;
page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
if (unlikely(!page)) {
/*
@@ -3414,8 +3414,10 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
* the mask is being updated. If a page allocation is about to fail,
* check if the cpuset changed during allocation and if so, retry.
*/
- if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
+ if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) {
+ alloc_mask = gfp_mask;
goto retry_cpuset;
+ }
return page;
}
--
2.6.4
* Re: [PATCH 16/28] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 14:13 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 14:13 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> __GFP_HARDWALL only has meaning in the context of cpusets but the fast path
> always applies the flag on the first attempt. Move the manipulations into
> the cpuset paths where they will be masked by a static branch in the common
> case.
>
> With the other micro-optimisations in this series combined, the impact on
> a page allocator microbenchmark is
>
> 4.6.0-rc2 4.6.0-rc2
> decstat-v1r20 micro-v1r20
> Min alloc-odr0-1 381.00 ( 0.00%) 377.00 ( 1.05%)
> Min alloc-odr0-2 275.00 ( 0.00%) 273.00 ( 0.73%)
> Min alloc-odr0-4 229.00 ( 0.00%) 226.00 ( 1.31%)
> Min alloc-odr0-8 199.00 ( 0.00%) 196.00 ( 1.51%)
> Min alloc-odr0-16 186.00 ( 0.00%) 183.00 ( 1.61%)
> Min alloc-odr0-32 179.00 ( 0.00%) 175.00 ( 2.23%)
> Min alloc-odr0-64 174.00 ( 0.00%) 172.00 ( 1.15%)
> Min alloc-odr0-128 172.00 ( 0.00%) 170.00 ( 1.16%)
> Min alloc-odr0-256 181.00 ( 0.00%) 183.00 ( -1.10%)
> Min alloc-odr0-512 193.00 ( 0.00%) 191.00 ( 1.04%)
> Min alloc-odr0-1024 201.00 ( 0.00%) 199.00 ( 1.00%)
> Min alloc-odr0-2048 206.00 ( 0.00%) 204.00 ( 0.97%)
> Min alloc-odr0-4096 212.00 ( 0.00%) 210.00 ( 0.94%)
> Min alloc-odr0-8192 215.00 ( 0.00%) 213.00 ( 0.93%)
> Min alloc-odr0-16384 216.00 ( 0.00%) 214.00 ( 0.93%)
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [PATCH 17/28] mm, page_alloc: Check once if a zone has isolated pageblocks
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
When bulk freeing pages from the per-cpu lists the zone is checked
for isolated pageblocks on every release. This patch checks it once
per drain. Technically this is race-prone but so is the existing
code.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a364e318873..835a1c434832 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -831,6 +831,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
int batch_free = 0;
int to_free = count;
unsigned long nr_scanned;
+ bool isolated_pageblocks = has_isolate_pageblock(zone);
spin_lock(&zone->lock);
nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
@@ -870,7 +871,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* MIGRATE_ISOLATE page should not go to pcplists */
VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
/* Pageblock could have been isolated meanwhile */
- if (unlikely(has_isolate_pageblock(zone)))
+ if (unlikely(isolated_pageblocks))
mt = get_pageblock_migratetype(page);
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
--
2.6.4
* Re: [PATCH 17/28] mm, page_alloc: Check once if a zone has isolated pageblocks
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 14:27 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 14:27 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: Jesper Dangaard Brouer, Linux-MM, LKML, Joonsoo Kim
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> When bulk freeing pages from the per-cpu lists the zone is checked
> for isolated pageblocks on every release. This patch checks it once
> per drain. Technically this is race-prone but so is the existing
> code.
No, existing code is protected by zone->lock. Both checking and manipulating the
variable zone->nr_isolate_pageblock should happen under the lock, as correct
accounting depends on it.
Luckily, the patch could be simply fixed by removing last changelog sentence and:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 49aabfb39ff1..7de04bdd8c67 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -831,9 +831,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
int batch_free = 0;
int to_free = count;
unsigned long nr_scanned;
- bool isolated_pageblocks = has_isolate_pageblock(zone);
+ bool isolated_pageblocks;
spin_lock(&zone->lock);
+ isolated_pageblocks = has_isolate_pageblock(zone);
nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
if (nr_scanned)
__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
> mm/page_alloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4a364e318873..835a1c434832 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -831,6 +831,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> int batch_free = 0;
> int to_free = count;
> unsigned long nr_scanned;
> + bool isolated_pageblocks = has_isolate_pageblock(zone);
>
> spin_lock(&zone->lock);
> nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
> @@ -870,7 +871,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> /* MIGRATE_ISOLATE page should not go to pcplists */
> VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> /* Pageblock could have been isolated meanwhile */
> - if (unlikely(has_isolate_pageblock(zone)))
> + if (unlikely(isolated_pageblocks))
> mt = get_pageblock_migratetype(page);
>
> __free_one_page(page, page_to_pfn(page), zone, 0, mt);
>
* [PATCH 18/28] mm, page_alloc: Shorten the page allocator fast path
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The page allocator fast path checks page multiple times unnecessarily.
This patch avoids all the slowpath checks if the first allocation attempt
succeeds.
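The reshuffled control flow, as a stubbed userspace sketch (helper names invented): a successful first attempt jumps straight to the shared exit, and none of the slowpath-only branches are evaluated.

#include <stdio.h>
#include <stdlib.h>

static int *fast_attempt(void) { return malloc(sizeof(int)); }
static int *slow_attempt(void) { return malloc(sizeof(int)); }
static void instrument(int *page) { printf("page %p\n", (void *)page); }

static int *alloc_model(void)
{
	int *page;

	page = fast_attempt();
	if (page)
		goto out;	/* common case: skip every slowpath check */

	/* mask adjustments, retries, etc. would happen here */
	page = slow_attempt();
out:
	instrument(page);	/* tracing hooks shared by both paths */
	return page;
}

int main(void)
{
	free(alloc_model());
	return 0;
}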
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 835a1c434832..7a5f6ff4ea06 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3392,22 +3392,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
/* First allocation attempt */
page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
- if (unlikely(!page)) {
- /*
- * Runtime PM, block IO and its error handling path
- * can deadlock because I/O on the device might not
- * complete.
- */
- alloc_mask = memalloc_noio_flags(gfp_mask);
- ac.spread_dirty_pages = false;
-
- page = __alloc_pages_slowpath(alloc_mask, order, &ac);
- }
+ if (likely(page))
+ goto out;
- if (kmemcheck_enabled && page)
- kmemcheck_pagealloc_alloc(page, order, gfp_mask);
+ /*
+ * Runtime PM, block IO and its error handling path can deadlock
+ * because I/O on the device might not complete.
+ */
+ alloc_mask = memalloc_noio_flags(gfp_mask);
+ ac.spread_dirty_pages = false;
- trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
+ page = __alloc_pages_slowpath(alloc_mask, order, &ac);
/*
* When updating a task's mems_allowed, it is possible to race with
@@ -3420,6 +3415,12 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
goto retry_cpuset;
}
+out:
+ if (kmemcheck_enabled && page)
+ kmemcheck_pagealloc_alloc(page, order, gfp_mask);
+
+ trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
+
return page;
}
EXPORT_SYMBOL(__alloc_pages_nodemask);
--
2.6.4
* Re: [PATCH 18/28] mm, page_alloc: Shorten the page allocator fast path
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 15:23 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 15:23 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The page allocator fast path checks page multiple times unnecessarily.
> This patch avoids all the slowpath checks if the first allocation attempt
> succeeds.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [PATCH 19/28] mm, page_alloc: Reduce cost of fair zone allocation policy retry
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The fair zone allocation policy is not without cost but it can be reduced
slightly. This patch removes an unnecessary local variable, checks the
likely conditions of the fair zone policy first, uses a bool instead of
a flags check and falls through when a remote node is encountered instead
of doing a full restart. The benefit is marginal but it's there
4.6.0-rc2 4.6.0-rc2
decstat-v1r20 optfair-v1r20
Min alloc-odr0-1 377.00 ( 0.00%) 380.00 ( -0.80%)
Min alloc-odr0-2 273.00 ( 0.00%) 273.00 ( 0.00%)
Min alloc-odr0-4 226.00 ( 0.00%) 227.00 ( -0.44%)
Min alloc-odr0-8 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-16 183.00 ( 0.00%) 183.00 ( 0.00%)
Min alloc-odr0-32 175.00 ( 0.00%) 173.00 ( 1.14%)
Min alloc-odr0-64 172.00 ( 0.00%) 169.00 ( 1.74%)
Min alloc-odr0-128 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-256 183.00 ( 0.00%) 180.00 ( 1.64%)
Min alloc-odr0-512 191.00 ( 0.00%) 190.00 ( 0.52%)
Min alloc-odr0-1024 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-2048 204.00 ( 0.00%) 204.00 ( 0.00%)
Min alloc-odr0-4096 210.00 ( 0.00%) 209.00 ( 0.48%)
Min alloc-odr0-8192 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-16384 214.00 ( 0.00%) 214.00 ( 0.00%)
The benefit is marginal at best, but one of the most important gains, avoiding a second search when falling back to another node, is not triggered by this particular test, so the benefit for some corner cases is understated.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a5f6ff4ea06..98b443c97be6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2676,12 +2676,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
{
struct zoneref *z;
struct zone *zone;
- bool fair_skipped;
- bool zonelist_rescan;
+ bool fair_skipped = false;
+ bool apply_fair = (alloc_flags & ALLOC_FAIR);
zonelist_scan:
- zonelist_rescan = false;
-
/*
* Scan zonelist, looking for a zone with enough free.
* See also __cpuset_node_allowed() comment in kernel/cpuset.c.
@@ -2701,13 +2699,16 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
* page was allocated in should have no effect on the
* time the page has in memory before being reclaimed.
*/
- if (alloc_flags & ALLOC_FAIR) {
- if (!zone_local(ac->preferred_zone, zone))
- break;
+ if (apply_fair) {
if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
fair_skipped = true;
continue;
}
+ if (!zone_local(ac->preferred_zone, zone)) {
+ if (fair_skipped)
+ goto reset_fair;
+ apply_fair = false;
+ }
}
/*
* When allocating a page cache page for writing, we
@@ -2796,18 +2797,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
* include remote zones now, before entering the slowpath and waking
* kswapd: prefer spilling to a remote zone over swapping locally.
*/
- if (alloc_flags & ALLOC_FAIR) {
- alloc_flags &= ~ALLOC_FAIR;
- if (fair_skipped) {
- zonelist_rescan = true;
- reset_alloc_batches(ac->preferred_zone);
- }
- if (nr_online_nodes > 1)
- zonelist_rescan = true;
- }
-
- if (zonelist_rescan)
+ if (fair_skipped) {
+reset_fair:
+ apply_fair = false;
+ fair_skipped = false;
+ reset_alloc_batches(ac->preferred_zone);
goto zonelist_scan;
+ }
return NULL;
}
--
2.6.4
* Re: [PATCH 19/28] mm, page_alloc: Reduce cost of fair zone allocation policy retry
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 17:24 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 17:24 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The fair zone allocation policy is not without cost but it can be reduced
> slightly. This patch removes an unnecessary local variable, checks the
> likely conditions of the fair zone policy first, uses a bool instead of
> a flags check and falls through when a remote node is encountered instead
> of doing a full restart. The benefit is marginal but it's there
>
> 4.6.0-rc2 4.6.0-rc2
> decstat-v1r20 optfair-v1r20
> Min alloc-odr0-1 377.00 ( 0.00%) 380.00 ( -0.80%)
> Min alloc-odr0-2 273.00 ( 0.00%) 273.00 ( 0.00%)
> Min alloc-odr0-4 226.00 ( 0.00%) 227.00 ( -0.44%)
> Min alloc-odr0-8 196.00 ( 0.00%) 196.00 ( 0.00%)
> Min alloc-odr0-16 183.00 ( 0.00%) 183.00 ( 0.00%)
> Min alloc-odr0-32 175.00 ( 0.00%) 173.00 ( 1.14%)
> Min alloc-odr0-64 172.00 ( 0.00%) 169.00 ( 1.74%)
> Min alloc-odr0-128 170.00 ( 0.00%) 169.00 ( 0.59%)
> Min alloc-odr0-256 183.00 ( 0.00%) 180.00 ( 1.64%)
> Min alloc-odr0-512 191.00 ( 0.00%) 190.00 ( 0.52%)
> Min alloc-odr0-1024 199.00 ( 0.00%) 198.00 ( 0.50%)
> Min alloc-odr0-2048 204.00 ( 0.00%) 204.00 ( 0.00%)
> Min alloc-odr0-4096 210.00 ( 0.00%) 209.00 ( 0.48%)
> Min alloc-odr0-8192 213.00 ( 0.00%) 213.00 ( 0.00%)
> Min alloc-odr0-16384 214.00 ( 0.00%) 214.00 ( 0.00%)
>
> The benefit is marginal at best but one of the most important benefits,
> avoiding a second search when falling back to another node is not triggered
> by this particular test so the benefit for some corner cases is understated.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 20/28] mm, page_alloc: Shortcut watermark checks for order-0 pages
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
Watermarks have to be checked on every allocation, taking into account the
number of pages being allocated and whether reserves can be accessed. The
reserves only matter if memory is limited, and the free_pages adjustment only
applies to high-order pages. This patch adds a shortcut for order-0 pages that
avoids the numerous calculations when there is plenty of free memory, yielding
the following performance difference in a page allocator microbenchmark:
4.6.0-rc2 4.6.0-rc2
optfair-v1r20 fastmark-v1r20
Min alloc-odr0-1 380.00 ( 0.00%) 364.00 ( 4.21%)
Min alloc-odr0-2 273.00 ( 0.00%) 262.00 ( 4.03%)
Min alloc-odr0-4 227.00 ( 0.00%) 214.00 ( 5.73%)
Min alloc-odr0-8 196.00 ( 0.00%) 186.00 ( 5.10%)
Min alloc-odr0-16 183.00 ( 0.00%) 173.00 ( 5.46%)
Min alloc-odr0-32 173.00 ( 0.00%) 165.00 ( 4.62%)
Min alloc-odr0-64 169.00 ( 0.00%) 161.00 ( 4.73%)
Min alloc-odr0-128 169.00 ( 0.00%) 159.00 ( 5.92%)
Min alloc-odr0-256 180.00 ( 0.00%) 168.00 ( 6.67%)
Min alloc-odr0-512 190.00 ( 0.00%) 180.00 ( 5.26%)
Min alloc-odr0-1024 198.00 ( 0.00%) 190.00 ( 4.04%)
Min alloc-odr0-2048 204.00 ( 0.00%) 196.00 ( 3.92%)
Min alloc-odr0-4096 209.00 ( 0.00%) 202.00 ( 3.35%)
Min alloc-odr0-8192 213.00 ( 0.00%) 206.00 ( 3.29%)
Min alloc-odr0-16384 214.00 ( 0.00%) 206.00 ( 3.74%)
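For reference, the essence of the shortcut, condensed from the diff below (not
a complete function): when memory is plentiful, a single order-0 comparison
against the watermark plus the lowmem reserve is enough and the detailed
__zone_watermark_ok() calculation is skipped:

	/* Fast check for order-0 only, fall back to the full check otherwise */
	if (!order && (free_pages - cma_pages) >
			mark + z->lowmem_reserve[classzone_idx])
		return true;

	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
				   free_pages);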
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98b443c97be6..8923d74b1707 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2619,6 +2619,32 @@ bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
zone_page_state(z, NR_FREE_PAGES));
}
+static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
+ unsigned long mark, int classzone_idx, unsigned int alloc_flags)
+{
+ long free_pages = zone_page_state(z, NR_FREE_PAGES);
+ long cma_pages = 0;
+
+#ifdef CONFIG_CMA
+ /* If allocation can't use CMA areas don't use free CMA pages */
+ if (!(alloc_flags & ALLOC_CMA))
+ cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
+#endif
+
+ /*
+ * Fast check for order-0 only. If this fails then the reserves
+ * need to be calculated. There is a corner case where the check
+ * passes but only the high-order atomic reserves are free. If
+ * the caller is !atomic then it'll uselessly search the free
+ * list. That corner case is then slower but it is harmless.
+ */
+ if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+ return true;
+
+ return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+ free_pages);
+}
+
bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
unsigned long mark, int classzone_idx)
{
@@ -2740,7 +2766,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
continue;
mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
- if (!zone_watermark_ok(zone, order, mark,
+ if (!zone_watermark_fast(zone, order, mark,
ac->classzone_idx, alloc_flags)) {
int ret;
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 20/28] mm, page_alloc: Shortcut watermark checks for order-0 pages
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 17:32 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 17:32 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> Watermarks have to be checked on every allocation including the number of
> pages being allocated and whether reserves can be accessed. The reserves
> only matter if memory is limited and the free_pages adjustment only applies
> to high-order pages. This patch adds a shortcut for order-0 pages that avoids
> numerous calculations if there is plenty of free memory yielding the following
> performance difference in a page allocator microbenchmark;
>
> 4.6.0-rc2 4.6.0-rc2
> optfair-v1r20 fastmark-v1r20
> Min alloc-odr0-1 380.00 ( 0.00%) 364.00 ( 4.21%)
> Min alloc-odr0-2 273.00 ( 0.00%) 262.00 ( 4.03%)
> Min alloc-odr0-4 227.00 ( 0.00%) 214.00 ( 5.73%)
> Min alloc-odr0-8 196.00 ( 0.00%) 186.00 ( 5.10%)
> Min alloc-odr0-16 183.00 ( 0.00%) 173.00 ( 5.46%)
> Min alloc-odr0-32 173.00 ( 0.00%) 165.00 ( 4.62%)
> Min alloc-odr0-64 169.00 ( 0.00%) 161.00 ( 4.73%)
> Min alloc-odr0-128 169.00 ( 0.00%) 159.00 ( 5.92%)
> Min alloc-odr0-256 180.00 ( 0.00%) 168.00 ( 6.67%)
> Min alloc-odr0-512 190.00 ( 0.00%) 180.00 ( 5.26%)
> Min alloc-odr0-1024 198.00 ( 0.00%) 190.00 ( 4.04%)
> Min alloc-odr0-2048 204.00 ( 0.00%) 196.00 ( 3.92%)
> Min alloc-odr0-4096 209.00 ( 0.00%) 202.00 ( 3.35%)
> Min alloc-odr0-8192 213.00 ( 0.00%) 206.00 ( 3.29%)
> Min alloc-odr0-16384 214.00 ( 0.00%) 206.00 ( 3.74%)
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 21/28] mm, page_alloc: Avoid looking up the first zone in a zonelist twice
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The allocator fast path looks up the first usable zone in a zonelist and
then get_page_from_freelist does the same job in the zonelist iterator.
This patch preserves the necessary information so the lookup is only done
once per allocation.
4.6.0-rc2 4.6.0-rc2
fastmark-v1r20 initonce-v1r20
Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%)
Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%)
Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%)
Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%)
Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%)
Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%)
Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%)
Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%)
Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%)
Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%)
The benefit is negligible and the results are within the noise, but each
cycle counts.
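Conceptually, the change boils down to the following, simplified from the
diff below: the zoneref found in the fast path is cached in the alloc_context
and the freelist scan resumes from it instead of repeating the lookup:

	/* __alloc_pages_nodemask(): look the preferred zone up once */
	ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
						ac.nodemask);
	ac.classzone_idx = zonelist_zone_idx(ac.preferred_zoneref);

	/* get_page_from_freelist(): resume the scan from the cached zoneref */
	struct zoneref *z = ac->preferred_zoneref;

	for_next_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
					ac->nodemask) {
		/* watermark checks and buffered_rmqueue() as before */
	}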
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
fs/buffer.c | 10 +++++-----
include/linux/mmzone.h | 18 +++++++++++-------
mm/internal.h | 2 +-
mm/mempolicy.c | 19 ++++++++++---------
mm/page_alloc.c | 32 +++++++++++++++-----------------
5 files changed, 42 insertions(+), 39 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index af0d9a82a8ed..754813a6962b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -255,17 +255,17 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
*/
static void free_more_memory(void)
{
- struct zone *zone;
+ struct zoneref *z;
int nid;
wakeup_flusher_threads(1024, WB_REASON_FREE_MORE_MEM);
yield();
for_each_online_node(nid) {
- (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
- gfp_zone(GFP_NOFS), NULL,
- &zone);
- if (zone)
+
+ z = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
+ gfp_zone(GFP_NOFS), NULL);
+ if (z->zone)
try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
GFP_NOFS, NULL);
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f49bb9add372..bf153ed097d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -962,13 +962,10 @@ static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
*/
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
enum zone_type highest_zoneidx,
- nodemask_t *nodes,
- struct zone **zone)
+ nodemask_t *nodes)
{
- struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs,
+ return next_zones_zonelist(zonelist->_zonerefs,
highest_zoneidx, nodes);
- *zone = zonelist_zone(z);
- return z;
}
/**
@@ -983,10 +980,17 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
* within a given nodemask
*/
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
- for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \
+ for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \
zone; \
z = next_zones_zonelist(++z, highidx, nodemask), \
- zone = zonelist_zone(z)) \
+ zone = zonelist_zone(z))
+
+#define for_next_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
+ for (zone = z->zone; \
+ zone; \
+ z = next_zones_zonelist(++z, highidx, nodemask), \
+ zone = zonelist_zone(z))
+
/**
* for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
diff --git a/mm/internal.h b/mm/internal.h
index f6d0a5875ec4..4c2396cd514c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -102,7 +102,7 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
struct alloc_context {
struct zonelist *zonelist;
nodemask_t *nodemask;
- struct zone *preferred_zone;
+ struct zoneref *preferred_zoneref;
int classzone_idx;
int migratetype;
enum zone_type high_zoneidx;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36cc01bc950a..66d73efba370 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1744,18 +1744,18 @@ unsigned int mempolicy_slab_node(void)
return interleave_nodes(policy);
case MPOL_BIND: {
+ struct zoneref *z;
+
/*
* Follow bind policy behavior and start allocation at the
* first node.
*/
struct zonelist *zonelist;
- struct zone *zone;
enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
zonelist = &NODE_DATA(node)->node_zonelists[0];
- (void)first_zones_zonelist(zonelist, highest_zoneidx,
- &policy->v.nodes,
- &zone);
- return zone ? zone->node : node;
+ z = first_zones_zonelist(zonelist, highest_zoneidx,
+ &policy->v.nodes);
+ return z->zone ? z->zone->node : node;
}
default:
@@ -2284,7 +2284,7 @@ static void sp_free(struct sp_node *n)
int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol;
- struct zone *zone;
+ struct zoneref *z;
int curnid = page_to_nid(page);
unsigned long pgoff;
int thiscpu = raw_smp_processor_id();
@@ -2316,6 +2316,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
break;
case MPOL_BIND:
+
/*
* allows binding to multiple nodes.
* use current page if in policy nodemask,
@@ -2324,11 +2325,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
*/
if (node_isset(curnid, pol->v.nodes))
goto out;
- (void)first_zones_zonelist(
+ z = first_zones_zonelist(
node_zonelist(numa_node_id(), GFP_HIGHUSER),
gfp_zone(GFP_HIGHUSER),
- &pol->v.nodes, &zone);
- polnid = zone->node;
+ &pol->v.nodes);
+ polnid = z->zone->node;
break;
default:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8923d74b1707..897e9d2a8500 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2700,7 +2700,7 @@ static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
const struct alloc_context *ac)
{
- struct zoneref *z;
+ struct zoneref *z = ac->preferred_zoneref;
struct zone *zone;
bool fair_skipped = false;
bool apply_fair = (alloc_flags & ALLOC_FAIR);
@@ -2710,7 +2710,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
* Scan zonelist, looking for a zone with enough free.
* See also __cpuset_node_allowed() comment in kernel/cpuset.c.
*/
- for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
+ for_next_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
ac->nodemask) {
struct page *page;
unsigned long mark;
@@ -2730,7 +2730,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
fair_skipped = true;
continue;
}
- if (!zone_local(ac->preferred_zone, zone)) {
+ if (!zone_local(ac->preferred_zoneref->zone, zone)) {
if (fair_skipped)
goto reset_fair;
apply_fair = false;
@@ -2776,7 +2776,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
goto try_this_zone;
if (zone_reclaim_mode == 0 ||
- !zone_allows_reclaim(ac->preferred_zone, zone))
+ !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
continue;
ret = zone_reclaim(zone, gfp_mask, order);
@@ -2798,7 +2798,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
}
try_this_zone:
- page = buffered_rmqueue(ac->preferred_zone, zone, order,
+ page = buffered_rmqueue(ac->preferred_zoneref->zone, zone, order,
gfp_mask, alloc_flags, ac->migratetype);
if (page) {
if (prep_new_page(page, order, gfp_mask, alloc_flags))
@@ -2827,7 +2827,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
reset_fair:
apply_fair = false;
fair_skipped = false;
- reset_alloc_batches(ac->preferred_zone);
+ reset_alloc_batches(ac->preferred_zoneref->zone);
goto zonelist_scan;
}
@@ -3114,7 +3114,7 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
ac->high_zoneidx, ac->nodemask)
- wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone));
+ wakeup_kswapd(zone, order, zonelist_zone_idx(ac->preferred_zoneref));
}
static inline unsigned int
@@ -3334,7 +3334,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) ||
((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) {
/* Wait for some write requests to complete then retry */
- wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
+ wait_iff_congested(ac->preferred_zoneref->zone, BLK_RW_ASYNC, HZ/50);
goto retry;
}
@@ -3372,7 +3372,6 @@ struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, nodemask_t *nodemask)
{
- struct zoneref *preferred_zoneref;
struct page *page;
unsigned int cpuset_mems_cookie;
unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
@@ -3408,9 +3407,9 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
/* The preferred zone is used for statistics later */
- preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
- ac.nodemask, &ac.preferred_zone);
- ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
+ ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
+ ac.nodemask);
+ ac.classzone_idx = zonelist_zone_idx(ac.preferred_zoneref);
/* First allocation attempt */
page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
@@ -4439,13 +4438,12 @@ static void build_zonelists(pg_data_t *pgdat)
*/
int local_memory_node(int node)
{
- struct zone *zone;
+ struct zoneref *z;
- (void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+ z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
gfp_zone(GFP_KERNEL),
- NULL,
- &zone);
- return zone->node;
+ NULL);
+ return z->zone->node;
}
#endif
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 21/28] mm, page_alloc: Avoid looking up the first zone in a zonelist twice
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 17:46 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 17:46 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The allocator fast path looks up the first usable zone in a zonelist
> and then get_page_from_freelist does the same job in the zonelist
> iterator. This patch preserves the necessary information.
>
> 4.6.0-rc2 4.6.0-rc2
> fastmark-v1r20 initonce-v1r20
> Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%)
> Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%)
> Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%)
> Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
> Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
> Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%)
> Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%)
> Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%)
> Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%)
> Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%)
> Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%)
> Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%)
> Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%)
> Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%)
> Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%)
>
> The benefit is negligible and the results are within the noise but each
> cycle counts.
Hmm, this indeed doesn't look too convincing to justify the patch. Also it's
adding extra pointer dereferences by accessing zone via zoneref, and the
next patch does the same with classzone_idx (stack saving shouldn't be that
important when the purpose of alloc_context is to have all of it only once on
stack). I don't feel strongly enough to NAK, but not convinced to ack either.
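For example, where the slowpath previously read the cached zone pointer
directly, it now has to go through the zoneref first:

	wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);          /* before */
	wait_iff_congested(ac->preferred_zoneref->zone, BLK_RW_ASYNC, HZ/50); /* after */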
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 22/28] mm, page_alloc: Remove field from alloc_context
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The classzone_idx can be inferred from preferred_zoneref, so remove the
unnecessary field and save stack space.
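Concretely, as seen in the diff below, the cached field is replaced by a
small helper that reads the index from the already-cached zoneref:

	#define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref)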
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/compaction.c | 4 ++--
mm/internal.h | 3 ++-
mm/page_alloc.c | 7 +++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 244bb669b5a6..c2fb3c61f1b6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1536,7 +1536,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
status = compact_zone_order(zone, order, gfp_mask, mode,
&zone_contended, alloc_flags,
- ac->classzone_idx);
+ ac_classzone_idx(ac));
rc = max(status, rc);
/*
* It takes at least one zone that wasn't lock contended
@@ -1546,7 +1546,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
/* If a normal allocation would succeed, stop compacting */
if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
- ac->classzone_idx, alloc_flags)) {
+ ac_classzone_idx(ac), alloc_flags)) {
/*
* We think the allocation will succeed in this zone,
* but it is not certain, hence the false. The caller
diff --git a/mm/internal.h b/mm/internal.h
index 4c2396cd514c..3bf62e085b16 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -103,12 +103,13 @@ struct alloc_context {
struct zonelist *zonelist;
nodemask_t *nodemask;
struct zoneref *preferred_zoneref;
- int classzone_idx;
int migratetype;
enum zone_type high_zoneidx;
bool spread_dirty_pages;
};
+#define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref)
+
/*
* Locate the struct page for both the matching buddy in our
* pair (buddy1) and the combined O(n+1) page they form (page).
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 897e9d2a8500..bc754d32aed6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2767,7 +2767,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
if (!zone_watermark_fast(zone, order, mark,
- ac->classzone_idx, alloc_flags)) {
+ ac_classzone_idx(ac), alloc_flags)) {
int ret;
/* Checked here to keep the fast path fast */
@@ -2790,7 +2790,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
default:
/* did we reclaim enough */
if (zone_watermark_ok(zone, order, mark,
- ac->classzone_idx, alloc_flags))
+ ac_classzone_idx(ac), alloc_flags))
goto try_this_zone;
continue;
@@ -3114,7 +3114,7 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
ac->high_zoneidx, ac->nodemask)
- wakeup_kswapd(zone, order, zonelist_zone_idx(ac->preferred_zoneref));
+ wakeup_kswapd(zone, order, ac_classzone_idx(ac));
}
static inline unsigned int
@@ -3409,7 +3409,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
/* The preferred zone is used for statistics later */
ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
ac.nodemask);
- ac.classzone_idx = zonelist_zone_idx(ac.preferred_zoneref);
/* First allocation attempt */
page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later. A page can be flagged bad because of any
one of several struct page fields. Instead of testing each field with its own
branch, this patch combines the fields into a single branch; a detailed
per-field check is only necessary if that combined check fails.
4.6.0-rc2 4.6.0-rc2
initonce-v1r20 multcheck-v1r20
Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)
Again, the benefit is marginal, but avoiding excessive branches is
important. Ideally the paths would not have to check these conditions at
all, but regrettably abandoning the tests would make use-after-free bugs
much harder to detect.
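The idea, as an illustrative sketch rather than the exact code in the diff
below: every field that must be zero or NULL on a sane page is OR-ed into one
word, so the common good case costs a single comparison against zero and the
individual fields are only re-examined when that comparison fails:

	unsigned long bad = (unsigned long)page->mapping |
			    page_ref_count(page) |
			    (page->flags & check_flags);

	if (likely(!bad))
		return 0;	/* page looks sane, fast path */

	/* slow path: re-test each field to report which one is wrong */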
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 43 insertions(+), 12 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc754d32aed6..3a60579342a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -784,10 +784,42 @@ static inline void __free_one_page(struct page *page,
zone->free_area[order].nr_free++;
}
+/*
+ * A bad page could be due to a number of fields. Instead of multiple branches,
+ * try and check multiple fields with one check. The caller must do a detailed
+ * check if necessary.
+ */
+static inline bool page_expected_state(struct page *page,
+ unsigned long check_flags)
+{
+ if (unlikely(atomic_read(&page->_mapcount) != -1))
+ return false;
+
+ if (unlikely((unsigned long)page->mapping |
+ page_ref_count(page) |
+#ifdef CONFIG_MEMCG
+ (unsigned long)page->mem_cgroup |
+#endif
+ (page->flags & check_flags)))
+ return false;
+
+ return true;
+}
+
static inline int free_pages_check(struct page *page)
{
- const char *bad_reason = NULL;
- unsigned long bad_flags = 0;
+ const char *bad_reason;
+ unsigned long bad_flags;
+
+ if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ return 0;
+ }
+
+ /* Something has gone sideways, find it */
+ bad_reason = NULL;
+ bad_flags = 0;
if (unlikely(atomic_read(&page->_mapcount) != -1))
bad_reason = "nonzero mapcount";
@@ -803,14 +835,8 @@ static inline int free_pages_check(struct page *page)
if (unlikely(page->mem_cgroup))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(bad_reason)) {
- bad_page(page, bad_reason, bad_flags);
- return 1;
- }
- page_cpupid_reset_last(page);
- if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return 0;
+ bad_page(page, bad_reason, bad_flags);
+ return 1;
}
/*
@@ -1492,9 +1518,14 @@ static inline void expand(struct zone *zone, struct page *page,
*/
static inline int check_new_page(struct page *page)
{
- const char *bad_reason = NULL;
- unsigned long bad_flags = 0;
+ const char *bad_reason;
+ unsigned long bad_flags;
+
+ if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))
+ return 0;
+ bad_reason = NULL;
+ bad_flags = 0;
if (unlikely(atomic_read(&page->_mapcount) != -1))
bad_reason = "nonzero mapcount";
if (unlikely(page->mapping != NULL))
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
@ 2016-04-15 9:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later. A bad page could be due to a number of
fields. Instead of using multiple branches, this patch combines multiple
fields into a single branch. A detailed check is only necessary if that
check fails.
4.6.0-rc2 4.6.0-rc2
initonce-v1r20 multcheck-v1r20
Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)
Again, the benefit is marginal but avoiding excessive branches is
important. Ideally the paths would not have to check these conditions at
all but regrettably abandoning the tests would make use-after-free bugs
much harder to detect.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 43 insertions(+), 12 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc754d32aed6..3a60579342a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -784,10 +784,42 @@ static inline void __free_one_page(struct page *page,
zone->free_area[order].nr_free++;
}
+/*
+ * A bad page could be due to a number of fields. Instead of multiple branches,
+ * try and check multiple fields with one check. The caller must do a detailed
+ * check if necessary.
+ */
+static inline bool page_expected_state(struct page *page,
+ unsigned long check_flags)
+{
+ if (unlikely(atomic_read(&page->_mapcount) != -1))
+ return false;
+
+ if (unlikely((unsigned long)page->mapping |
+ page_ref_count(page) |
+#ifdef CONFIG_MEMCG
+ (unsigned long)page->mem_cgroup |
+#endif
+ (page->flags & check_flags)))
+ return false;
+
+ return true;
+}
+
static inline int free_pages_check(struct page *page)
{
- const char *bad_reason = NULL;
- unsigned long bad_flags = 0;
+ const char *bad_reason;
+ unsigned long bad_flags;
+
+ if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ return 0;
+ }
+
+ /* Something has gone sideways, find it */
+ bad_reason = NULL;
+ bad_flags = 0;
if (unlikely(atomic_read(&page->_mapcount) != -1))
bad_reason = "nonzero mapcount";
@@ -803,14 +835,8 @@ static inline int free_pages_check(struct page *page)
if (unlikely(page->mem_cgroup))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(bad_reason)) {
- bad_page(page, bad_reason, bad_flags);
- return 1;
- }
- page_cpupid_reset_last(page);
- if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return 0;
+ bad_page(page, bad_reason, bad_flags);
+ return 1;
}
/*
@@ -1492,9 +1518,14 @@ static inline void expand(struct zone *zone, struct page *page,
*/
static inline int check_new_page(struct page *page)
{
- const char *bad_reason = NULL;
- unsigned long bad_flags = 0;
+ const char *bad_reason;
+ unsigned long bad_flags;
+
+ if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))
+ return 0;
+ bad_reason = NULL;
+ bad_flags = 0;
if (unlikely(atomic_read(&page->_mapcount) != -1))
bad_reason = "nonzero mapcount";
if (unlikely(page->mapping != NULL))
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 18:41 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 18:41 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> Every page allocated or freed is checked for sanity to avoid corruptions
> that are difficult to detect later. A bad page could be due to a number of
> fields. Instead of using multiple branches, this patch combines multiple
> fields into a single branch. A detailed check is only necessary if that
> check fails.
>
> 4.6.0-rc2 4.6.0-rc2
> initonce-v1r20 multcheck-v1r20
> Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
> Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
> Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
> Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
> Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
> Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
> Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
> Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
> Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
> Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
> Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
> Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
> Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
> Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
> Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)
>
> Again, the benefit is marginal but avoiding excessive branches is
> important. Ideally the paths would not have to check these conditions at
> all but regrettably abandoning the tests would make use-after-free bugs
> much harder to detect.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
I wonder, would it be just too ugly to add +1 to atomic_read(&page->_mapcount)
and OR it with the rest for a truly single branch?
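For illustration, the suggestion amounts to folding the mapcount test into the
same OR so one branch covers every field, roughly as in this sketch (a
hypothetical variant, not code from the posted series; it relies on a freed
page having _mapcount == -1, so the +1 term is zero only in the expected
state):

static inline bool page_expected_state(struct page *page,
					unsigned long check_flags)
{
	/*
	 * A freed page has _mapcount == -1, so adding 1 folds the mapcount
	 * test into the same OR as the other fields, leaving one branch.
	 */
	if (unlikely((unsigned long)page->mapping |
		     page_ref_count(page) |
		     (unsigned long)(atomic_read(&page->_mapcount) + 1) |
#ifdef CONFIG_MEMCG
		     (unsigned long)page->mem_cgroup |
#endif
		     (page->flags & check_flags)))
		return false;

	return true;
}

The tradeoff is that a bad mapcount no longer short-circuits before the other
loads are issued.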
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
@ 2016-04-26 18:41 ` Vlastimil Babka
0 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 18:41 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> Every page allocated or freed is checked for sanity to avoid corruptions
> that are difficult to detect later. A bad page could be due to a number of
> fields. Instead of using multiple branches, this patch combines multiple
> fields into a single branch. A detailed check is only necessary if that
> check fails.
>
> 4.6.0-rc2 4.6.0-rc2
> initonce-v1r20 multcheck-v1r20
> Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
> Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
> Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
> Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
> Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
> Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
> Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
> Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
> Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
> Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
> Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
> Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
> Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
> Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
> Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)
>
> Again, the benefit is marginal but avoiding excessive branches is
> important. Ideally the paths would not have to check these conditions at
> all but regrettably abandoning the tests would make use-after-free bugs
> much harder to detect.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
I wonder, would it be just too ugly to add +1 to atomic_read(&page->_mapcount)
and OR it with the rest for a truly single branch?
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
2016-04-26 18:41 ` Vlastimil Babka
@ 2016-04-27 10:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-27 10:07 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, Apr 26, 2016 at 08:41:50PM +0200, Vlastimil Babka wrote:
> On 04/15/2016 11:07 AM, Mel Gorman wrote:
> >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> I wonder, would it be just too ugly to add +1 to
> atomic_read(&page->_mapcount) and OR it with the rest for a truly single
> branch?
>
Interesting thought. I'm not going to do it as a fix but when I'm doing
the next round of page allocator material, I'll add it to the pile for
evaluation.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 23/28] mm, page_alloc: Check multiple page fields with a single branch
@ 2016-04-27 10:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-27 10:07 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, Apr 26, 2016 at 08:41:50PM +0200, Vlastimil Babka wrote:
> On 04/15/2016 11:07 AM, Mel Gorman wrote:
> >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> I wonder, would it be just too ugly to add +1 to
> atomic_read(&page->_mapcount) and OR it with the rest for a truly single
> branch?
>
Interesting thought. I'm not going to do it as a fix but when I'm doing
the next round of page allocator material, I'll add it to the pile for
evaluation.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 24/28] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The original count is never reused so it can be removed.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3a60579342a5..bdcd4087553e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -855,7 +855,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
{
int migratetype = 0;
int batch_free = 0;
- int to_free = count;
unsigned long nr_scanned;
bool isolated_pageblocks = has_isolate_pageblock(zone);
@@ -864,7 +863,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
if (nr_scanned)
__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
- while (to_free) {
+ while (count) {
struct page *page;
struct list_head *list;
@@ -884,7 +883,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* This is the only non-empty list. Free them all. */
if (batch_free == MIGRATE_PCPTYPES)
- batch_free = to_free;
+ batch_free = count;
do {
int mt; /* migratetype of the to-be-freed page */
@@ -902,7 +901,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
- } while (--to_free && --batch_free && !list_empty(list));
+ } while (--count && --batch_free && !list_empty(list));
}
spin_unlock(&zone->lock);
}
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 24/28] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk
@ 2016-04-15 9:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The original count is never reused so it can be removed.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3a60579342a5..bdcd4087553e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -855,7 +855,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
{
int migratetype = 0;
int batch_free = 0;
- int to_free = count;
unsigned long nr_scanned;
bool isolated_pageblocks = has_isolate_pageblock(zone);
@@ -864,7 +863,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
if (nr_scanned)
__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
- while (to_free) {
+ while (count) {
struct page *page;
struct list_head *list;
@@ -884,7 +883,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* This is the only non-empty list. Free them all. */
if (batch_free == MIGRATE_PCPTYPES)
- batch_free = to_free;
+ batch_free = count;
do {
int mt; /* migratetype of the to-be-freed page */
@@ -902,7 +901,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
- } while (--to_free && --batch_free && !list_empty(list));
+ } while (--count && --batch_free && !list_empty(list));
}
spin_unlock(&zone->lock);
}
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 24/28] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 18:43 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 18:43 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The original count is never reused so it can be removed.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 24/28] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk
@ 2016-04-26 18:43 ` Vlastimil Babka
0 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 18:43 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The original count is never reused so it can be removed.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 25/28] mm, page_alloc: Inline pageblock lookup in page free fast paths
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The function call overhead of get_pfnblock_flags_mask() is measurable in
the page free paths. This patch uses an inlined version that is faster.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
include/linux/mmzone.h | 7 --
mm/page_alloc.c | 188 ++++++++++++++++++++++++++-----------------------
mm/page_owner.c | 2 +-
mm/vmstat.c | 2 +-
4 files changed, 102 insertions(+), 97 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bf153ed097d5..48ee8885aa74 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -85,13 +85,6 @@ extern int page_group_by_mobility_disabled;
get_pfnblock_flags_mask(page, page_to_pfn(page), \
PB_migrate_end, MIGRATETYPE_MASK)
-static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
-{
- BUILD_BUG_ON(PB_migrate_end - PB_migrate != 2);
- return get_pfnblock_flags_mask(page, pfn, PB_migrate_end,
- MIGRATETYPE_MASK);
-}
-
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdcd4087553e..f038d06192c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -352,6 +352,106 @@ static inline bool update_defer_init(pg_data_t *pgdat,
}
#endif
+/* Return a pointer to the bitmap storing bits affecting a block of pages */
+static inline unsigned long *get_pageblock_bitmap(struct page *page,
+ unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+ return __pfn_to_section(pfn)->pageblock_flags;
+#else
+ return page_zone(page)->pageblock_flags;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+ pfn &= (PAGES_PER_SECTION-1);
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#else
+ pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+/**
+ * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest to retrieve
+ * @mask: mask of bits that the caller is interested in
+ *
+ * Return: pageblock_bits flags
+ */
+static __always_inline unsigned long __get_pfnblock_flags_mask(struct page *page,
+ unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ unsigned long *bitmap;
+ unsigned long bitidx, word_bitidx;
+ unsigned long word;
+
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
+ word_bitidx = bitidx / BITS_PER_LONG;
+ bitidx &= (BITS_PER_LONG-1);
+
+ word = bitmap[word_bitidx];
+ bitidx += end_bitidx;
+ return (word >> (BITS_PER_LONG - bitidx - 1)) & mask;
+}
+
+unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ return __get_pfnblock_flags_mask(page, pfn, end_bitidx, mask);
+}
+
+static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
+{
+ return __get_pfnblock_flags_mask(page, pfn, PB_migrate_end, MIGRATETYPE_MASK);
+}
+
+/**
+ * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @flags: The flags to set
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest
+ * @mask: mask of bits that the caller is interested in
+ */
+void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
+ unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ unsigned long *bitmap;
+ unsigned long bitidx, word_bitidx;
+ unsigned long old_word, word;
+
+ BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
+
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
+ word_bitidx = bitidx / BITS_PER_LONG;
+ bitidx &= (BITS_PER_LONG-1);
+
+ VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
+
+ bitidx += end_bitidx;
+ mask <<= (BITS_PER_LONG - bitidx - 1);
+ flags <<= (BITS_PER_LONG - bitidx - 1);
+
+ word = READ_ONCE(bitmap[word_bitidx]);
+ for (;;) {
+ old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
+ if (word == old_word)
+ break;
+ word = old_word;
+ }
+}
void set_pageblock_migratetype(struct page *page, int migratetype)
{
@@ -6801,94 +6901,6 @@ void *__init alloc_large_system_hash(const char *tablename,
return table;
}
-/* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned long *get_pageblock_bitmap(struct page *page,
- unsigned long pfn)
-{
-#ifdef CONFIG_SPARSEMEM
- return __pfn_to_section(pfn)->pageblock_flags;
-#else
- return page_zone(page)->pageblock_flags;
-#endif /* CONFIG_SPARSEMEM */
-}
-
-static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
-{
-#ifdef CONFIG_SPARSEMEM
- pfn &= (PAGES_PER_SECTION-1);
- return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
-#else
- pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
- return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
-#endif /* CONFIG_SPARSEMEM */
-}
-
-/**
- * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages
- * @page: The page within the block of interest
- * @pfn: The target page frame number
- * @end_bitidx: The last bit of interest to retrieve
- * @mask: mask of bits that the caller is interested in
- *
- * Return: pageblock_bits flags
- */
-unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
- unsigned long end_bitidx,
- unsigned long mask)
-{
- unsigned long *bitmap;
- unsigned long bitidx, word_bitidx;
- unsigned long word;
-
- bitmap = get_pageblock_bitmap(page, pfn);
- bitidx = pfn_to_bitidx(page, pfn);
- word_bitidx = bitidx / BITS_PER_LONG;
- bitidx &= (BITS_PER_LONG-1);
-
- word = bitmap[word_bitidx];
- bitidx += end_bitidx;
- return (word >> (BITS_PER_LONG - bitidx - 1)) & mask;
-}
-
-/**
- * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
- * @page: The page within the block of interest
- * @flags: The flags to set
- * @pfn: The target page frame number
- * @end_bitidx: The last bit of interest
- * @mask: mask of bits that the caller is interested in
- */
-void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
- unsigned long pfn,
- unsigned long end_bitidx,
- unsigned long mask)
-{
- unsigned long *bitmap;
- unsigned long bitidx, word_bitidx;
- unsigned long old_word, word;
-
- BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
-
- bitmap = get_pageblock_bitmap(page, pfn);
- bitidx = pfn_to_bitidx(page, pfn);
- word_bitidx = bitidx / BITS_PER_LONG;
- bitidx &= (BITS_PER_LONG-1);
-
- VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
-
- bitidx += end_bitidx;
- mask <<= (BITS_PER_LONG - bitidx - 1);
- flags <<= (BITS_PER_LONG - bitidx - 1);
-
- word = READ_ONCE(bitmap[word_bitidx]);
- for (;;) {
- old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
- if (word == old_word)
- break;
- word = old_word;
- }
-}
-
/*
* This function checks whether pageblock includes unmovable pages or not.
* If @count is not zero, it is okay to include less @count unmovable pages
diff --git a/mm/page_owner.c b/mm/page_owner.c
index ac3d8d129974..22630e75c192 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -143,7 +143,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
goto err;
/* Print information relevant to grouping pages by mobility */
- pageblock_mt = get_pfnblock_migratetype(page, pfn);
+ pageblock_mt = get_pageblock_migratetype(page);
page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
ret += snprintf(kbuf + ret, count - ret,
"PFN %lu type %s Block %lu type %s Flags %#lx(%pGp)\n",
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a4bda11eac8d..20698fc82354 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1044,7 +1044,7 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
block_end_pfn = min(block_end_pfn, end_pfn);
page = pfn_to_page(pfn);
- pageblock_mt = get_pfnblock_migratetype(page, pfn);
+ pageblock_mt = get_pageblock_migratetype(page);
for (; pfn < block_end_pfn; pfn++) {
if (!pfn_valid_within(pfn))
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 25/28] mm, page_alloc: Inline pageblock lookup in page free fast paths
@ 2016-04-15 9:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
The function call overhead of get_pfnblock_flags_mask() is measurable in
the page free paths. This patch uses an inlined version that is faster.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
include/linux/mmzone.h | 7 --
mm/page_alloc.c | 188 ++++++++++++++++++++++++++-----------------------
mm/page_owner.c | 2 +-
mm/vmstat.c | 2 +-
4 files changed, 102 insertions(+), 97 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bf153ed097d5..48ee8885aa74 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -85,13 +85,6 @@ extern int page_group_by_mobility_disabled;
get_pfnblock_flags_mask(page, page_to_pfn(page), \
PB_migrate_end, MIGRATETYPE_MASK)
-static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
-{
- BUILD_BUG_ON(PB_migrate_end - PB_migrate != 2);
- return get_pfnblock_flags_mask(page, pfn, PB_migrate_end,
- MIGRATETYPE_MASK);
-}
-
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdcd4087553e..f038d06192c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -352,6 +352,106 @@ static inline bool update_defer_init(pg_data_t *pgdat,
}
#endif
+/* Return a pointer to the bitmap storing bits affecting a block of pages */
+static inline unsigned long *get_pageblock_bitmap(struct page *page,
+ unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+ return __pfn_to_section(pfn)->pageblock_flags;
+#else
+ return page_zone(page)->pageblock_flags;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+ pfn &= (PAGES_PER_SECTION-1);
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#else
+ pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+/**
+ * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest to retrieve
+ * @mask: mask of bits that the caller is interested in
+ *
+ * Return: pageblock_bits flags
+ */
+static __always_inline unsigned long __get_pfnblock_flags_mask(struct page *page,
+ unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ unsigned long *bitmap;
+ unsigned long bitidx, word_bitidx;
+ unsigned long word;
+
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
+ word_bitidx = bitidx / BITS_PER_LONG;
+ bitidx &= (BITS_PER_LONG-1);
+
+ word = bitmap[word_bitidx];
+ bitidx += end_bitidx;
+ return (word >> (BITS_PER_LONG - bitidx - 1)) & mask;
+}
+
+unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ return __get_pfnblock_flags_mask(page, pfn, end_bitidx, mask);
+}
+
+static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
+{
+ return __get_pfnblock_flags_mask(page, pfn, PB_migrate_end, MIGRATETYPE_MASK);
+}
+
+/**
+ * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @flags: The flags to set
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest
+ * @mask: mask of bits that the caller is interested in
+ */
+void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
+ unsigned long pfn,
+ unsigned long end_bitidx,
+ unsigned long mask)
+{
+ unsigned long *bitmap;
+ unsigned long bitidx, word_bitidx;
+ unsigned long old_word, word;
+
+ BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
+
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
+ word_bitidx = bitidx / BITS_PER_LONG;
+ bitidx &= (BITS_PER_LONG-1);
+
+ VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
+
+ bitidx += end_bitidx;
+ mask <<= (BITS_PER_LONG - bitidx - 1);
+ flags <<= (BITS_PER_LONG - bitidx - 1);
+
+ word = READ_ONCE(bitmap[word_bitidx]);
+ for (;;) {
+ old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
+ if (word == old_word)
+ break;
+ word = old_word;
+ }
+}
void set_pageblock_migratetype(struct page *page, int migratetype)
{
@@ -6801,94 +6901,6 @@ void *__init alloc_large_system_hash(const char *tablename,
return table;
}
-/* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned long *get_pageblock_bitmap(struct page *page,
- unsigned long pfn)
-{
-#ifdef CONFIG_SPARSEMEM
- return __pfn_to_section(pfn)->pageblock_flags;
-#else
- return page_zone(page)->pageblock_flags;
-#endif /* CONFIG_SPARSEMEM */
-}
-
-static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
-{
-#ifdef CONFIG_SPARSEMEM
- pfn &= (PAGES_PER_SECTION-1);
- return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
-#else
- pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
- return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
-#endif /* CONFIG_SPARSEMEM */
-}
-
-/**
- * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages
- * @page: The page within the block of interest
- * @pfn: The target page frame number
- * @end_bitidx: The last bit of interest to retrieve
- * @mask: mask of bits that the caller is interested in
- *
- * Return: pageblock_bits flags
- */
-unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
- unsigned long end_bitidx,
- unsigned long mask)
-{
- unsigned long *bitmap;
- unsigned long bitidx, word_bitidx;
- unsigned long word;
-
- bitmap = get_pageblock_bitmap(page, pfn);
- bitidx = pfn_to_bitidx(page, pfn);
- word_bitidx = bitidx / BITS_PER_LONG;
- bitidx &= (BITS_PER_LONG-1);
-
- word = bitmap[word_bitidx];
- bitidx += end_bitidx;
- return (word >> (BITS_PER_LONG - bitidx - 1)) & mask;
-}
-
-/**
- * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
- * @page: The page within the block of interest
- * @flags: The flags to set
- * @pfn: The target page frame number
- * @end_bitidx: The last bit of interest
- * @mask: mask of bits that the caller is interested in
- */
-void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
- unsigned long pfn,
- unsigned long end_bitidx,
- unsigned long mask)
-{
- unsigned long *bitmap;
- unsigned long bitidx, word_bitidx;
- unsigned long old_word, word;
-
- BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
-
- bitmap = get_pageblock_bitmap(page, pfn);
- bitidx = pfn_to_bitidx(page, pfn);
- word_bitidx = bitidx / BITS_PER_LONG;
- bitidx &= (BITS_PER_LONG-1);
-
- VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
-
- bitidx += end_bitidx;
- mask <<= (BITS_PER_LONG - bitidx - 1);
- flags <<= (BITS_PER_LONG - bitidx - 1);
-
- word = READ_ONCE(bitmap[word_bitidx]);
- for (;;) {
- old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
- if (word == old_word)
- break;
- word = old_word;
- }
-}
-
/*
* This function checks whether pageblock includes unmovable pages or not.
* If @count is not zero, it is okay to include less @count unmovable pages
diff --git a/mm/page_owner.c b/mm/page_owner.c
index ac3d8d129974..22630e75c192 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -143,7 +143,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
goto err;
/* Print information relevant to grouping pages by mobility */
- pageblock_mt = get_pfnblock_migratetype(page, pfn);
+ pageblock_mt = get_pageblock_migratetype(page);
page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
ret += snprintf(kbuf + ret, count - ret,
"PFN %lu type %s Block %lu type %s Flags %#lx(%pGp)\n",
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a4bda11eac8d..20698fc82354 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1044,7 +1044,7 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
block_end_pfn = min(block_end_pfn, end_pfn);
page = pfn_to_page(pfn);
- pageblock_mt = get_pfnblock_migratetype(page, pfn);
+ pageblock_mt = get_pageblock_migratetype(page);
for (; pfn < block_end_pfn; pfn++) {
if (!pfn_valid_within(pfn))
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 25/28] mm, page_alloc: Inline pageblock lookup in page free fast paths
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 19:10 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 19:10 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The function call overhead of get_pfnblock_flags_mask() is measurable in
> the page free paths. This patch uses an inlined version that is faster.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 25/28] mm, page_alloc: Inline pageblock lookup in page free fast paths
@ 2016-04-26 19:10 ` Vlastimil Babka
0 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 19:10 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> The function call overhead of get_pfnblock_flags_mask() is measurable in
> the page free paths. This patch uses an inlined version that is faster.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 26/28] cpuset: use static key better and convert to new API
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
From: Vlastimil Babka <vbabka@suse.cz>
An important function for cpusets is cpuset_node_allowed(), which optimizes on
the fact that if there's a single root CPU set, the allocation is trivially
allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key
static key the way static keys are intended: with jump labels, the disabled
case costs no branch at all.
This patch converts it so that the static key is used properly. It's also
switched to the new static key API and the checking functions are converted to
return bool instead of int. We also provide a new variant
__cpuset_zone_allowed() which expects that the static key check was already
done and the key was enabled. This is needed for get_page_from_freelist()
where we also want to avoid the relatively slower check when ALLOC_CPUSET is
not set in alloc_flags.
The impact on the page allocator microbenchmark is less than expected but the
cleanup in itself is worthwhile.
4.6.0-rc2 4.6.0-rc2
multcheck-v1r20 cpuset-v1r20
Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%)
Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%)
Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%)
Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%)
Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%)
Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%)
Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%)
Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%)
Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%)
Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%)
Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%)
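To illustrate the conversion described above, the inline check changes shape
roughly as follows (condensed from the diff below):

/* Before: nr_cpusets() reads the static key's counter, so even the
 * no-cpusets case pays for a load and a conditional branch. */
static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
{
	return nr_cpusets() <= 1 || __cpuset_node_allowed(node, gfp_mask);
}

/* After: static_branch_unlikely() is patched to a no-op while the key is
 * disabled, so the common case costs no branch at all. */
static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
{
	if (cpusets_enabled())
		return __cpuset_node_allowed(node, gfp_mask);
	return true;
}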
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
include/linux/cpuset.h | 42 ++++++++++++++++++++++++++++--------------
kernel/cpuset.c | 14 +++++++-------
mm/page_alloc.c | 2 +-
3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index fea160ee5803..054c734d0170 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -16,26 +16,26 @@
#ifdef CONFIG_CPUSETS
-extern struct static_key cpusets_enabled_key;
+extern struct static_key_false cpusets_enabled_key;
static inline bool cpusets_enabled(void)
{
- return static_key_false(&cpusets_enabled_key);
+ return static_branch_unlikely(&cpusets_enabled_key);
}
static inline int nr_cpusets(void)
{
/* jump label reference count + the top-level cpuset */
- return static_key_count(&cpusets_enabled_key) + 1;
+ return static_key_count(&cpusets_enabled_key.key) + 1;
}
static inline void cpuset_inc(void)
{
- static_key_slow_inc(&cpusets_enabled_key);
+ static_branch_inc(&cpusets_enabled_key);
}
static inline void cpuset_dec(void)
{
- static_key_slow_dec(&cpusets_enabled_key);
+ static_branch_dec(&cpusets_enabled_key);
}
extern int cpuset_init(void);
@@ -48,16 +48,25 @@ extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
void cpuset_init_current_mems_allowed(void);
int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask);
-extern int __cpuset_node_allowed(int node, gfp_t gfp_mask);
+extern bool __cpuset_node_allowed(int node, gfp_t gfp_mask);
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
{
- return nr_cpusets() <= 1 || __cpuset_node_allowed(node, gfp_mask);
+ if (cpusets_enabled())
+ return __cpuset_node_allowed(node, gfp_mask);
+ return true;
}
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
{
- return cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+ return __cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+ if (cpusets_enabled())
+ return __cpuset_zone_allowed(z, gfp_mask);
+ return true;
}
extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
@@ -174,14 +183,19 @@ static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
return 1;
}
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
{
- return 1;
+ return true;
}
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
{
- return 1;
+ return true;
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+ return true;
}
static inline int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 00ab5c2b7c5b..37a0b44d101f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -62,7 +62,7 @@
#include <linux/cgroup.h>
#include <linux/wait.h>
-struct static_key cpusets_enabled_key __read_mostly = STATIC_KEY_INIT_FALSE;
+DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
/* See "Frequency meter" comments, below. */
@@ -2528,27 +2528,27 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
* GFP_KERNEL - any node in enclosing hardwalled cpuset ok
* GFP_USER - only nodes in current tasks mems allowed ok.
*/
-int __cpuset_node_allowed(int node, gfp_t gfp_mask)
+bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
{
struct cpuset *cs; /* current cpuset ancestors */
int allowed; /* is allocation in zone z allowed? */
unsigned long flags;
if (in_interrupt())
- return 1;
+ return true;
if (node_isset(node, current->mems_allowed))
- return 1;
+ return true;
/*
* Allow tasks that have access to memory reserves because they have
* been OOM killed to get memory anywhere.
*/
if (unlikely(test_thread_flag(TIF_MEMDIE)))
- return 1;
+ return true;
if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */
- return 0;
+ return false;
if (current->flags & PF_EXITING) /* Let dying task have memory */
- return 1;
+ return true;
/* Not hardwall and node outside mems_allowed: scan up cpusets */
spin_lock_irqsave(&callback_lock, flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f038d06192c7..e63afe07c032 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2847,7 +2847,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
if (cpusets_enabled() &&
(alloc_flags & ALLOC_CPUSET) &&
- !cpuset_zone_allowed(zone, gfp_mask))
+ !__cpuset_zone_allowed(zone, gfp_mask))
continue;
/*
* Distribute pages in proportion to the individual
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 26/28] cpuset: use static key better and convert to new API
@ 2016-04-15 9:07 ` Mel Gorman
0 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
From: Vlastimil Babka <vbabka@suse.cz>
An important function for cpusets is cpuset_node_allowed(), which optimizes on
the fact that if there's a single root CPU set, the allocation is trivially
allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key
static key the way static keys are intended: with jump labels, the disabled
case costs no branch at all.
This patch converts it so that the static key is used properly. It's also
switched to the new static key API and the checking functions are converted to
return bool instead of int. We also provide a new variant
__cpuset_zone_allowed() which expects that the static key check was already
done and the key was enabled. This is needed for get_page_from_freelist()
where we also want to avoid the relatively slower check when ALLOC_CPUSET is
not set in alloc_flags.
The impact on the page allocator microbenchmark is less than expected but the
cleanup in itself is worthwhile.
4.6.0-rc2 4.6.0-rc2
multcheck-v1r20 cpuset-v1r20
Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%)
Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%)
Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%)
Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%)
Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%)
Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%)
Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%)
Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%)
Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%)
Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%)
Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%)
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
include/linux/cpuset.h | 42 ++++++++++++++++++++++++++++--------------
kernel/cpuset.c | 14 +++++++-------
mm/page_alloc.c | 2 +-
3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index fea160ee5803..054c734d0170 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -16,26 +16,26 @@
#ifdef CONFIG_CPUSETS
-extern struct static_key cpusets_enabled_key;
+extern struct static_key_false cpusets_enabled_key;
static inline bool cpusets_enabled(void)
{
- return static_key_false(&cpusets_enabled_key);
+ return static_branch_unlikely(&cpusets_enabled_key);
}
static inline int nr_cpusets(void)
{
/* jump label reference count + the top-level cpuset */
- return static_key_count(&cpusets_enabled_key) + 1;
+ return static_key_count(&cpusets_enabled_key.key) + 1;
}
static inline void cpuset_inc(void)
{
- static_key_slow_inc(&cpusets_enabled_key);
+ static_branch_inc(&cpusets_enabled_key);
}
static inline void cpuset_dec(void)
{
- static_key_slow_dec(&cpusets_enabled_key);
+ static_branch_dec(&cpusets_enabled_key);
}
extern int cpuset_init(void);
@@ -48,16 +48,25 @@ extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
void cpuset_init_current_mems_allowed(void);
int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask);
-extern int __cpuset_node_allowed(int node, gfp_t gfp_mask);
+extern bool __cpuset_node_allowed(int node, gfp_t gfp_mask);
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
{
- return nr_cpusets() <= 1 || __cpuset_node_allowed(node, gfp_mask);
+ if (cpusets_enabled())
+ return __cpuset_node_allowed(node, gfp_mask);
+ return true;
}
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
{
- return cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+ return __cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+ if (cpusets_enabled())
+ return __cpuset_zone_allowed(z, gfp_mask);
+ return true;
}
extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
@@ -174,14 +183,19 @@ static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
return 1;
}
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
{
- return 1;
+ return true;
}
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
{
- return 1;
+ return true;
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+ return true;
}
static inline int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 00ab5c2b7c5b..37a0b44d101f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -62,7 +62,7 @@
#include <linux/cgroup.h>
#include <linux/wait.h>
-struct static_key cpusets_enabled_key __read_mostly = STATIC_KEY_INIT_FALSE;
+DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
/* See "Frequency meter" comments, below. */
@@ -2528,27 +2528,27 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
* GFP_KERNEL - any node in enclosing hardwalled cpuset ok
* GFP_USER - only nodes in current tasks mems allowed ok.
*/
-int __cpuset_node_allowed(int node, gfp_t gfp_mask)
+bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
{
struct cpuset *cs; /* current cpuset ancestors */
int allowed; /* is allocation in zone z allowed? */
unsigned long flags;
if (in_interrupt())
- return 1;
+ return true;
if (node_isset(node, current->mems_allowed))
- return 1;
+ return true;
/*
* Allow tasks that have access to memory reserves because they have
* been OOM killed to get memory anywhere.
*/
if (unlikely(test_thread_flag(TIF_MEMDIE)))
- return 1;
+ return true;
if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */
- return 0;
+ return false;
if (current->flags & PF_EXITING) /* Let dying task have memory */
- return 1;
+ return true;
/* Not hardwall and node outside mems_allowed: scan up cpusets */
spin_lock_irqsave(&callback_lock, flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f038d06192c7..e63afe07c032 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2847,7 +2847,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
if (cpusets_enabled() &&
(alloc_flags & ALLOC_CPUSET) &&
- !cpuset_zone_allowed(zone, gfp_mask))
+ !__cpuset_zone_allowed(zone, gfp_mask))
continue;
/*
* Distribute pages in proportion to the individual
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 26/28] cpuset: use static key better and convert to new API
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 19:49 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 19:49 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: Jesper Dangaard Brouer, Linux-MM, LKML, Zefan Li
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
>
> An important function for cpusets is cpuset_node_allowed(), which optimizes on
> the fact that if there's a single root CPU set, the allocation is trivially
> allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key
> static key the way static keys are intended: with jump labels, the disabled
> case costs no branch at all.
>
> This patch converts it so that the static key is used properly. It's also
> switched to the new static key API and the checking functions are converted to
> return bool instead of int. We also provide a new variant
> __cpuset_zone_allowed() which expects that the static key check was already
> done and the key was enabled. This is needed for get_page_from_freelist()
> where we also want to avoid the relatively slower check when ALLOC_CPUSET is
> not set in alloc_flags.
>
> The impact on the page allocator microbenchmark is less than expected but the
> cleanup in itself is worthwhile.
>
> 4.6.0-rc2 4.6.0-rc2
> multcheck-v1r20 cpuset-v1r20
> Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%)
> Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%)
> Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%)
> Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%)
> Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%)
> Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%)
> Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%)
> Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%)
> Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%)
> Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%)
> Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%)
> Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%)
> Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%)
> Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%)
> Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%)
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vl... ah, no, I actually wrote this one.
But since the cpuset maintainer acked [1] my earlier posting only after Mel
included it in this series, I think it's worth transferring it here:
Acked-by: Zefan Li <lizefan@huawei.com>
[1] http://marc.info/?l=linux-mm&m=146062276216574&w=2
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 26/28] cpuset: use static key better and convert to new API
@ 2016-04-26 19:49 ` Vlastimil Babka
0 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 19:49 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: Jesper Dangaard Brouer, Linux-MM, LKML, Zefan Li
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
>
> An important function for cpusets is cpuset_node_allowed(), which optimizes on
> the fact that if there's a single root CPU set, the allocation is trivially
> allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key
> static key the way static keys are intended: with jump labels, the disabled
> case costs no branch at all.
>
> This patch converts it so that the static key is used properly. It's also
> switched to the new static key API and the checking functions are converted to
> return bool instead of int. We also provide a new variant
> __cpuset_zone_allowed() which expects that the static key check was already
> done and the key was enabled. This is needed for get_page_from_freelist()
> where we also want to avoid the relatively slower check when ALLOC_CPUSET is
> not set in alloc_flags.
>
> The impact on the page allocator microbenchmark is less than expected but the
> cleanup in itself is worthwhile.
>
> 4.6.0-rc2 4.6.0-rc2
> multcheck-v1r20 cpuset-v1r20
> Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%)
> Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%)
> Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%)
> Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%)
> Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%)
> Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%)
> Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%)
> Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%)
> Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%)
> Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%)
> Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%)
> Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%)
> Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%)
> Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%)
> Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%)
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vl... ah, no, I actually wrote this one.
But since the cpuset maintainer acked [1] my earlier posting only after Mel
included it in this series, I think it's worth transferring it here:
Acked-by: Zefan Li <lizefan@huawei.com>
[1] http://marc.info/?l=linux-mm&m=146062276216574&w=2
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 27/28] mm, page_alloc: Defer debugging checks of freed pages until a PCP drain
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
Every page free checks a number of page fields for validity. This
catches premature frees and corruptions but it is also expensive.
This patch weakens the debugging check by deferring it for PCP pages:
they are only checked at the time they are drained from the PCP list.
A corrupt page will still trigger the bad page report, but the site
that freed it will have been lost. To get the full context, a kernel
rebuild with DEBUG_VM is necessary.
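Condensed from the diff below, the piece that moves the check is the
drain-time hook called from free_pcppages_bulk() (a sketch of the shape,
not the complete patch):

#ifdef CONFIG_DEBUG_VM
/* Full checks already ran at free time via free_pcp_prepare() */
static inline bool bulkfree_pcp_prepare(struct page *page)
{
	return false;
}
#else
/* Default case: the sanity check is deferred until the PCP drain */
static bool bulkfree_pcp_prepare(struct page *page)
{
	return free_pages_check(page);
}
#endif /* CONFIG_DEBUG_VM */

free_pcppages_bulk() then skips any page that fails this check instead of
returning it to the buddy lists.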
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 244 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 146 insertions(+), 98 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e63afe07c032..b5722790c846 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -939,6 +939,148 @@ static inline int free_pages_check(struct page *page)
return 1;
}
+static int free_tail_pages_check(struct page *head_page, struct page *page)
+{
+ int ret = 1;
+
+ /*
+ * We rely page->lru.next never has bit 0 set, unless the page
+ * is PageTail(). Let's make sure that's true even for poisoned ->lru.
+ */
+ BUILD_BUG_ON((unsigned long)LIST_POISON1 & 1);
+
+ if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
+ ret = 0;
+ goto out;
+ }
+ switch (page - head_page) {
+ case 1:
+ /* the first tail page: ->mapping is compound_mapcount() */
+ if (unlikely(compound_mapcount(page))) {
+ bad_page(page, "nonzero compound_mapcount", 0);
+ goto out;
+ }
+ break;
+ case 2:
+ /*
+ * the second tail page: ->mapping is
+ * page_deferred_list().next -- ignore value.
+ */
+ break;
+ default:
+ if (page->mapping != TAIL_MAPPING) {
+ bad_page(page, "corrupted mapping in tail page", 0);
+ goto out;
+ }
+ break;
+ }
+ if (unlikely(!PageTail(page))) {
+ bad_page(page, "PageTail not set", 0);
+ goto out;
+ }
+ if (unlikely(compound_head(page) != head_page)) {
+ bad_page(page, "compound_head not consistent", 0);
+ goto out;
+ }
+ ret = 0;
+out:
+ page->mapping = NULL;
+ clear_compound_head(page);
+ return ret;
+}
+
+static bool free_pages_prepare(struct page *page, unsigned int order)
+{
+ int bad = 0;
+
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ trace_mm_page_free(page, order);
+ kmemcheck_free_shadow(page, order);
+ kasan_free_pages(page, order);
+
+ /*
+ * Check tail pages before head page information is cleared to
+ * avoid checking PageCompound for order-0 pages.
+ */
+ if (order) {
+ bool compound = PageCompound(page);
+ int i;
+
+ VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
+
+ for (i = 1; i < (1 << order); i++) {
+ if (compound)
+ bad += free_tail_pages_check(page, page + i);
+ bad += free_pages_check(page + i);
+ }
+ }
+ if (PageAnonHead(page))
+ page->mapping = NULL;
+ bad += free_pages_check(page);
+ if (bad)
+ return false;
+
+ reset_page_owner(page, order);
+
+ if (!PageHighMem(page)) {
+ debug_check_no_locks_freed(page_address(page),
+ PAGE_SIZE << order);
+ debug_check_no_obj_freed(page_address(page),
+ PAGE_SIZE << order);
+ }
+ arch_free_page(page, order);
+ kernel_poison_pages(page, 1 << order, 0);
+ kernel_map_pages(page, 1 << order, 0);
+
+ return true;
+}
+
+#ifdef CONFIG_DEBUG_VM
+static inline bool free_pcp_prepare(struct page *page)
+{
+ return free_pages_prepare(page, 0);
+}
+
+static inline bool bulkfree_pcp_prepare(struct page *page)
+{
+ return false;
+}
+#else
+static bool free_pcp_prepare(struct page *page)
+{
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ trace_mm_page_free(page, 0);
+ kmemcheck_free_shadow(page, 0);
+ kasan_free_pages(page, 0);
+
+ if (PageAnonHead(page))
+ page->mapping = NULL;
+
+ reset_page_owner(page, 0);
+
+ if (!PageHighMem(page)) {
+ debug_check_no_locks_freed(page_address(page),
+ PAGE_SIZE);
+ debug_check_no_obj_freed(page_address(page),
+ PAGE_SIZE);
+ }
+ arch_free_page(page, 0);
+ kernel_poison_pages(page, 0, 0);
+ kernel_map_pages(page, 0, 0);
+
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ return true;
+}
+
+static bool bulkfree_pcp_prepare(struct page *page)
+{
+ return free_pages_check(page);
+}
+#endif /* CONFIG_DEBUG_VM */
+
/*
* Frees a number of pages from the PCP lists
* Assumes all pages on list are in same zone, and of same order.
@@ -999,6 +1141,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
if (unlikely(isolated_pageblocks))
mt = get_pageblock_migratetype(page);
+ if (bulkfree_pcp_prepare(page))
+ continue;
+
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
} while (--count && --batch_free && !list_empty(list));
@@ -1025,56 +1170,6 @@ static void free_one_page(struct zone *zone,
spin_unlock(&zone->lock);
}
-static int free_tail_pages_check(struct page *head_page, struct page *page)
-{
- int ret = 1;
-
- /*
- * We rely page->lru.next never has bit 0 set, unless the page
- * is PageTail(). Let's make sure that's true even for poisoned ->lru.
- */
- BUILD_BUG_ON((unsigned long)LIST_POISON1 & 1);
-
- if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
- ret = 0;
- goto out;
- }
- switch (page - head_page) {
- case 1:
- /* the first tail page: ->mapping is compound_mapcount() */
- if (unlikely(compound_mapcount(page))) {
- bad_page(page, "nonzero compound_mapcount", 0);
- goto out;
- }
- break;
- case 2:
- /*
- * the second tail page: ->mapping is
- * page_deferred_list().next -- ignore value.
- */
- break;
- default:
- if (page->mapping != TAIL_MAPPING) {
- bad_page(page, "corrupted mapping in tail page", 0);
- goto out;
- }
- break;
- }
- if (unlikely(!PageTail(page))) {
- bad_page(page, "PageTail not set", 0);
- goto out;
- }
- if (unlikely(compound_head(page) != head_page)) {
- bad_page(page, "compound_head not consistent", 0);
- goto out;
- }
- ret = 0;
-out:
- page->mapping = NULL;
- clear_compound_head(page);
- return ret;
-}
-
static void __meminit __init_single_page(struct page *page, unsigned long pfn,
unsigned long zone, int nid)
{
@@ -1148,53 +1243,6 @@ void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
}
}
-static bool free_pages_prepare(struct page *page, unsigned int order)
-{
- int bad = 0;
-
- VM_BUG_ON_PAGE(PageTail(page), page);
-
- trace_mm_page_free(page, order);
- kmemcheck_free_shadow(page, order);
- kasan_free_pages(page, order);
-
- /*
- * Check tail pages before head page information is cleared to
- * avoid checking PageCompound for order-0 pages.
- */
- if (order) {
- bool compound = PageCompound(page);
- int i;
-
- VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
-
- for (i = 1; i < (1 << order); i++) {
- if (compound)
- bad += free_tail_pages_check(page, page + i);
- bad += free_pages_check(page + i);
- }
- }
- if (PageAnonHead(page))
- page->mapping = NULL;
- bad += free_pages_check(page);
- if (bad)
- return false;
-
- reset_page_owner(page, order);
-
- if (!PageHighMem(page)) {
- debug_check_no_locks_freed(page_address(page),
- PAGE_SIZE << order);
- debug_check_no_obj_freed(page_address(page),
- PAGE_SIZE << order);
- }
- arch_free_page(page, order);
- kernel_poison_pages(page, 1 << order, 0);
- kernel_map_pages(page, 1 << order, 0);
-
- return true;
-}
-
static void __free_pages_ok(struct page *page, unsigned int order)
{
unsigned long flags;
@@ -2327,7 +2375,7 @@ void free_hot_cold_page(struct page *page, bool cold)
unsigned long pfn = page_to_pfn(page);
int migratetype;
- if (!free_pages_prepare(page, 0))
+ if (!free_pcp_prepare(page))
return;
migratetype = get_pfnblock_migratetype(page, pfn);
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
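To make the shape of the new free path easier to follow outside the diff context, here is a minimal userspace sketch of the idea. The struct, the check and the CONFIG_DEBUG_VM switch are simplified stand-ins rather than the kernel's definitions; the point is only how the expensive check moves from free_pcp_prepare() (paid on every free) to bulkfree_pcp_prepare() (paid once per drain) when DEBUG_VM is disabled.
#include <stdbool.h>
#include <stdio.h>

struct fake_page {
	unsigned long flags;	/* stand-in for the page flag checks */
	int mapcount;		/* stand-in for the mapcount check */
};

static bool page_looks_bad(const struct fake_page *p)
{
	return p->mapcount != 0 || p->flags != 0;
}

#ifdef CONFIG_DEBUG_VM
/* Debug build: pay for the full check on every free. */
static bool free_pcp_prepare(struct fake_page *p)
{
	return !page_looks_bad(p);
}
static bool bulkfree_pcp_prepare(struct fake_page *p)
{
	(void)p;
	return false;		/* already checked at free time */
}
#else
/* Production build: defer the check until the PCP list is drained. */
static bool free_pcp_prepare(struct fake_page *p)
{
	(void)p;
	return true;
}
static bool bulkfree_pcp_prepare(struct fake_page *p)
{
	return page_looks_bad(p);	/* true means "skip this page" */
}
#endif

int main(void)
{
	struct fake_page good = { 0, 0 };
	struct fake_page bad = { 0, 1 };

	if (free_pcp_prepare(&good))
		printf("good page goes onto the PCP list\n");
	if (bulkfree_pcp_prepare(&bad))
		printf("bad page caught and skipped at drain time\n");
	return 0;
}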
* Re: [PATCH 27/28] mm, page_alloc: Defer debugging checks of freed pages until a PCP drain
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-27 11:59 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 11:59 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> Every page free checks a number of page fields for validity. This
> catches premature frees and corruptions but it is also expensive.
> This patch weakens the debugging check by checking PCP pages at the
> time they are drained from the PCP list. This will trigger the bug
> but the site that freed the corrupt page will be lost. To get the
> full context, a kernel rebuild with DEBUG_VM is necessary.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
I don't like the duplicated code in free_pcp_prepare() from a maintenance
perspective, as Hugh just reminded me that a similar kind of duplication
between page_alloc.c and compaction.c can easily lead to mistakes. I've
tried to fix that, which resulted in 3 small patches I'll post as
replies here. The ideas may also be applicable to 28/28, which I haven't
checked yet.
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 1/3] mm, page_alloc: un-inline the bad part of free_pages_check
2016-04-27 11:59 ` Vlastimil Babka
@ 2016-04-27 12:01 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 12:01 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: linux-mm, linux-kernel, Jesper Dangaard Brouer, Vlastimil Babka
!DEBUG_VM bloat-o-meter:
add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-383 (-259)
function old new delta
free_pages_check_bad - 124 +124
free_pcppages_bulk 1509 1403 -106
__free_pages_ok 1025 748 -277
DEBUG_VM:
add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-242 (-118)
function old new delta
free_pages_check_bad - 124 +124
free_pages_prepare 1048 806 -242
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fe78c4dbfa8d..12c03a8509a0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -906,18 +906,11 @@ static inline bool page_expected_state(struct page *page,
return true;
}
-static inline int free_pages_check(struct page *page)
+static void free_pages_check_bad(struct page *page)
{
const char *bad_reason;
unsigned long bad_flags;
- if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return 0;
- }
-
- /* Something has gone sideways, find it */
bad_reason = NULL;
bad_flags = 0;
@@ -936,6 +929,17 @@ static inline int free_pages_check(struct page *page)
bad_reason = "page still charged to cgroup";
#endif
bad_page(page, bad_reason, bad_flags);
+}
+static inline int free_pages_check(struct page *page)
+{
+ if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ return 0;
+ }
+
+ /* Something has gone sideways, find it */
+ free_pages_check_bad(page);
return 1;
}
--
2.8.1
^ permalink raw reply related [flat|nested] 160+ messages in thread
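The structure of the change is the classic split between a hot inline test and a cold out-of-line reporting function, so the fast path only pays a compare-and-branch. The bloat-o-meter figures above are the kind of output typically produced by comparing two builds with scripts/bloat-o-meter <old vmlinux> <new vmlinux>. A small standalone C illustration of the pattern follows; the names and the trivial check are made up, and gcc builtins/attributes are assumed.
#include <stdio.h>

/* Cold path: only reached when a value fails the check. */
static __attribute__((noinline)) void report_bad_value(long v)
{
	fprintf(stderr, "bad value: %ld\n", v);
}

/* Hot path: stays inline and costs the caller one compare-and-branch. */
static inline int check_value(long v)
{
	if (__builtin_expect(v == 0, 1))	/* likely(): the common case */
		return 0;

	report_bad_value(v);			/* out-of-line slow path */
	return 1;
}

int main(void)
{
	int bad = 0;

	bad += check_value(0);			/* fast path, no call made */
	bad += check_value(42);			/* slow path, call taken */
	printf("bad values seen: %d\n", bad);
	return 0;
}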
* [PATCH 2/3] mm, page_alloc: pull out side effects from free_pages_check
2016-04-27 12:01 ` Vlastimil Babka
@ 2016-04-27 12:01 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 12:01 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: linux-mm, linux-kernel, Jesper Dangaard Brouer, Vlastimil Babka
Check without side-effects should be easier to maintain. It also removes the
duplicated cpupid and flags reset done in !DEBUG_VM variant of both
free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it enables
the next patch.
It shouldn't result in new branches, thanks to inlining of the check.
!DEBUG_VM bloat-o-meter:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
function old new delta
__free_pages_ok 748 739 -9
free_pcppages_bulk 1403 1385 -18
DEBUG_VM:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
function old new delta
free_pages_prepare 806 778 -28
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 12c03a8509a0..163d08ea43f0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -932,11 +932,8 @@ static void free_pages_check_bad(struct page *page)
}
static inline int free_pages_check(struct page *page)
{
- if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
return 0;
- }
/* Something has gone sideways, find it */
free_pages_check_bad(page);
@@ -1016,12 +1013,22 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++) {
if (compound)
bad += free_tail_pages_check(page, page + i);
- bad += free_pages_check(page + i);
+ if (free_pages_check(page + i)) {
+ bad++;
+ } else {
+ page_cpupid_reset_last(page + i);
+ (page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ }
}
}
if (PageAnonHead(page))
page->mapping = NULL;
- bad += free_pages_check(page);
+ if (free_pages_check(page)) {
+ bad++;
+ } else {
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ }
if (bad)
return false;
--
2.8.1
^ permalink raw reply related [flat|nested] 160+ messages in thread
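As a rough illustration of the distinction being drawn here, a predicate without side effects can be shared by every free path while each caller decides when to apply the reset. The toy struct and flag names below are invented for the sketch and are not the kernel's.
#include <stdbool.h>
#include <stdio.h>

#define CHECK_AT_FREE	0x1UL	/* invented stand-in flag */
#define CHECK_AT_PREP	0x2UL	/* invented stand-in flag */

struct fake_page {
	unsigned long flags;
};

/* Pure check: reads state, never writes it. */
static bool page_is_bad(const struct fake_page *p)
{
	return p->flags & CHECK_AT_FREE;
}

/* The side effect lives with the caller that actually frees the page. */
static void reset_for_free(struct fake_page *p)
{
	p->flags &= ~CHECK_AT_PREP;
}

int main(void)
{
	struct fake_page page = { .flags = CHECK_AT_PREP };

	if (!page_is_bad(&page))
		reset_for_free(&page);

	printf("flags after free preparation: %#lx\n", page.flags);
	return 0;
}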
* Re: [PATCH 2/3] mm, page_alloc: pull out side effects from free_pages_check
2016-04-27 12:01 ` Vlastimil Babka
@ 2016-04-27 12:41 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-27 12:41 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-mm, linux-kernel, Jesper Dangaard Brouer
On Wed, Apr 27, 2016 at 02:01:15PM +0200, Vlastimil Babka wrote:
> Check without side-effects should be easier to maintain. It also removes the
> duplicated cpupid and flags reset done in !DEBUG_VM variant of both
> free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it enables
> the next patch.
>
Hmm, now the cpupid and flags reset is done in multiple places. While
this is potentially faster, it goes against the comment "I don't like the
duplicated code in free_pcp_prepare() from a maintenance perspective".
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 2/3] mm, page_alloc: pull out side effects from free_pages_check
2016-04-27 12:41 ` Mel Gorman
@ 2016-04-27 13:00 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 13:00 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel, Jesper Dangaard Brouer
On 04/27/2016 02:41 PM, Mel Gorman wrote:
> On Wed, Apr 27, 2016 at 02:01:15PM +0200, Vlastimil Babka wrote:
>> Check without side-effects should be easier to maintain. It also removes the
>> duplicated cpupid and flags reset done in !DEBUG_VM variant of both
>> free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it enables
>> the next patch.
>>
>
> Hmm, now the cpuid and flags reset is done in multiple places. While
> this is potentially faster, it goes against the comment "I don't like the
> duplicated code in free_pcp_prepare() from maintenance perspective".
After patch 3/3 it's done only in free_pages_prepare(), which I think is
not that bad, even though it's two places there. Tail pages are already
special in that function. And I thought that the fact it was done twice
in the !DEBUG_VM free path was actually not intentional, but a consequence
of the side-effect being unexpected. But it's close to bike-shedding
territory, so I don't insist. Anyway, overall I like the code after patch 3/3
better than before 2/3.
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 3/3] mm, page_alloc: don't duplicate code in free_pcp_prepare
2016-04-27 12:01 ` Vlastimil Babka
@ 2016-04-27 12:01 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 12:01 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton
Cc: linux-mm, linux-kernel, Jesper Dangaard Brouer, Vlastimil Babka
The new free_pcp_prepare() function shares a lot of code with
free_pages_prepare(), which makes this a maintenance risk when some future
patch modifies only one of them. We should be able to achieve the same effect
(skipping free_pages_check() in !DEBUG_VM configs) by adding a parameter to
free_pages_prepare() and making it inline, so the checks (and the order != 0
parts) are eliminated from the call made by free_pcp_prepare().
!DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already inlining
free_pages_prepare() and the elimination seems to work as expected.
DEBUG_VM bloat-o-meter:
add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
function old new delta
__free_pages_ok 297 1060 +763
free_hot_cold_page 480 752 +272
free_pages_prepare 778 - -778
Here inlining didn't occur before, so this adds some code, but that's acceptable
for a debug option.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 34 ++++++----------------------------
1 file changed, 6 insertions(+), 28 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 163d08ea43f0..b23f641348ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -990,7 +990,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
return ret;
}
-static bool free_pages_prepare(struct page *page, unsigned int order)
+static __always_inline bool free_pages_prepare(struct page *page, unsigned int order,
+ bool check_free)
{
int bad = 0;
@@ -1023,7 +1024,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
}
if (PageAnonHead(page))
page->mapping = NULL;
- if (free_pages_check(page)) {
+ if (check_free && free_pages_check(page)) {
bad++;
} else {
page_cpupid_reset_last(page);
@@ -1050,7 +1051,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
#ifdef CONFIG_DEBUG_VM
static inline bool free_pcp_prepare(struct page *page)
{
- return free_pages_prepare(page, 0);
+ return free_pages_prepare(page, 0, true);
}
static inline bool bulkfree_pcp_prepare(struct page *page)
@@ -1060,30 +1061,7 @@ static inline bool bulkfree_pcp_prepare(struct page *page)
#else
static bool free_pcp_prepare(struct page *page)
{
- VM_BUG_ON_PAGE(PageTail(page), page);
-
- trace_mm_page_free(page, 0);
- kmemcheck_free_shadow(page, 0);
- kasan_free_pages(page, 0);
-
- if (PageAnonHead(page))
- page->mapping = NULL;
-
- reset_page_owner(page, 0);
-
- if (!PageHighMem(page)) {
- debug_check_no_locks_freed(page_address(page),
- PAGE_SIZE);
- debug_check_no_obj_freed(page_address(page),
- PAGE_SIZE);
- }
- arch_free_page(page, 0);
- kernel_poison_pages(page, 0, 0);
- kernel_map_pages(page, 0, 0);
-
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return true;
+ return free_pages_prepare(page, 0, false);
}
static bool bulkfree_pcp_prepare(struct page *page)
@@ -1260,7 +1238,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
int migratetype;
unsigned long pfn = page_to_pfn(page);
- if (!free_pages_prepare(page, order))
+ if (!free_pages_prepare(page, order, true))
return;
migratetype = get_pfnblock_migratetype(page, pfn);
--
2.8.1
^ permalink raw reply related [flat|nested] 160+ messages in thread
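The trick this relies on is that an __always_inline function taking a compile-time-constant bool is specialised at each call site, so the dead branch can be discarded and two near-identical functions collapse into one shared body. A self-contained sketch of that mechanism, with invented names and userspace definitions, looks like this:
#include <stdio.h>

#define __always_inline	inline __attribute__((always_inline))

static __always_inline long prepare(long value, unsigned int order, int check)
{
	/*
	 * When "check" is a compile-time constant 0 at the call site,
	 * this branch can be folded away entirely by the compiler.
	 */
	if (check && value < 0)
		return -1;

	return value << order;
}

int main(void)
{
	/* Specialised to "no check": effectively just a shift. */
	printf("%ld\n", prepare(3, 1, 0));

	/* Specialised to "with check": the guard is kept. */
	printf("%ld\n", prepare(-3, 1, 1));
	return 0;
}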
* Re: [PATCH 1/3] mm, page_alloc: un-inline the bad part of free_pages_check
2016-04-27 12:01 ` Vlastimil Babka
@ 2016-04-27 12:37 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-27 12:37 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-mm, linux-kernel, Jesper Dangaard Brouer
On Wed, Apr 27, 2016 at 02:01:14PM +0200, Vlastimil Babka wrote:
> !DEBUG_VM bloat-o-meter:
>
> add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-383 (-259)
> function old new delta
> free_pages_check_bad - 124 +124
> free_pcppages_bulk 1509 1403 -106
> __free_pages_ok 1025 748 -277
>
> DEBUG_VM:
>
> add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-242 (-118)
> function old new delta
> free_pages_check_bad - 124 +124
> free_pages_prepare 1048 806 -242
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
This uninlines the check all right but it also introduces new function
calls into the free path. As it's the free fast path, I suspect it would
be a step in the wrong direction from a performance perspective.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 1/3] mm, page_alloc: un-inline the bad part of free_pages_check
2016-04-27 12:37 ` Mel Gorman
@ 2016-04-27 12:53 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 12:53 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel, Jesper Dangaard Brouer
On 04/27/2016 02:37 PM, Mel Gorman wrote:
> On Wed, Apr 27, 2016 at 02:01:14PM +0200, Vlastimil Babka wrote:
>> !DEBUG_VM bloat-o-meter:
>>
>> add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-383 (-259)
>> function old new delta
>> free_pages_check_bad - 124 +124
>> free_pcppages_bulk 1509 1403 -106
>> __free_pages_ok 1025 748 -277
>>
>> DEBUG_VM:
>>
>> add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-242 (-118)
>> function old new delta
>> free_pages_check_bad - 124 +124
>> free_pages_prepare 1048 806 -242
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> This uninlines the check all right but it also introduces new function
> calls into the free path. As it's the free fast path, I suspect it would
> be a step in the wrong direction from a performance perspective.
Oh, I expected this to be a non-issue, as the call only happens when a bad
page is actually encountered, which should be rare. But if you can measure
some overhead here then sure.
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-15 9:07 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-15 9:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman
Every page allocated checks a number of page fields for validity. This
catches corruption bugs of pages that are already freed but it is expensive.
This patch weakens the debugging check by checking PCP pages only when
the PCP lists are being refilled. All compound pages are checked. This
potentially avoids debugging checks entirely if the PCP lists are never
emptied and refilled so some corruption issues may be missed. Full checking
requires DEBUG_VM.
With the two deferred debugging patches applied, the impact to a page
allocator microbenchmark is
4.6.0-rc3 4.6.0-rc3
inline-v3r6 deferalloc-v3r7
Min alloc-odr0-1 344.00 ( 0.00%) 317.00 ( 7.85%)
Min alloc-odr0-2 248.00 ( 0.00%) 231.00 ( 6.85%)
Min alloc-odr0-4 209.00 ( 0.00%) 192.00 ( 8.13%)
Min alloc-odr0-8 181.00 ( 0.00%) 166.00 ( 8.29%)
Min alloc-odr0-16 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-32 161.00 ( 0.00%) 148.00 ( 8.07%)
Min alloc-odr0-64 158.00 ( 0.00%) 145.00 ( 8.23%)
Min alloc-odr0-128 156.00 ( 0.00%) 143.00 ( 8.33%)
Min alloc-odr0-256 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-512 178.00 ( 0.00%) 167.00 ( 6.18%)
Min alloc-odr0-1024 186.00 ( 0.00%) 174.00 ( 6.45%)
Min alloc-odr0-2048 192.00 ( 0.00%) 180.00 ( 6.25%)
Min alloc-odr0-4096 198.00 ( 0.00%) 184.00 ( 7.07%)
Min alloc-odr0-8192 200.00 ( 0.00%) 188.00 ( 6.00%)
Min alloc-odr0-16384 201.00 ( 0.00%) 188.00 ( 6.47%)
Min free-odr0-1 189.00 ( 0.00%) 180.00 ( 4.76%)
Min free-odr0-2 132.00 ( 0.00%) 126.00 ( 4.55%)
Min free-odr0-4 104.00 ( 0.00%) 99.00 ( 4.81%)
Min free-odr0-8 90.00 ( 0.00%) 85.00 ( 5.56%)
Min free-odr0-16 84.00 ( 0.00%) 80.00 ( 4.76%)
Min free-odr0-32 80.00 ( 0.00%) 76.00 ( 5.00%)
Min free-odr0-64 78.00 ( 0.00%) 74.00 ( 5.13%)
Min free-odr0-128 77.00 ( 0.00%) 73.00 ( 5.19%)
Min free-odr0-256 94.00 ( 0.00%) 91.00 ( 3.19%)
Min free-odr0-512 108.00 ( 0.00%) 112.00 ( -3.70%)
Min free-odr0-1024 115.00 ( 0.00%) 118.00 ( -2.61%)
Min free-odr0-2048 120.00 ( 0.00%) 125.00 ( -4.17%)
Min free-odr0-4096 123.00 ( 0.00%) 129.00 ( -4.88%)
Min free-odr0-8192 126.00 ( 0.00%) 130.00 ( -3.17%)
Min free-odr0-16384 126.00 ( 0.00%) 131.00 ( -3.97%)
Note that the free paths for large numbers of pages are impacted, as the
debugging cost gets shifted into that path when the page data is no longer
necessarily cache-hot.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 92 +++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 64 insertions(+), 28 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b5722790c846..147c0d55ed32 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1704,7 +1704,41 @@ static inline bool free_pages_prezeroed(bool poisoned)
page_poisoning_enabled() && poisoned;
}
-static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+#ifdef CONFIG_DEBUG_VM
+static bool check_pcp_refill(struct page *page)
+{
+ return false;
+}
+
+static bool check_new_pcp(struct page *page)
+{
+ return check_new_page(page);
+}
+#else
+static bool check_pcp_refill(struct page *page)
+{
+ return check_new_page(page);
+}
+static bool check_new_pcp(struct page *page)
+{
+ return false;
+}
+#endif /* CONFIG_DEBUG_VM */
+
+static bool check_new_pages(struct page *page, unsigned int order)
+{
+ int i;
+ for (i = 0; i < (1 << order); i++) {
+ struct page *p = page + i;
+
+ if (unlikely(check_new_page(p)))
+ return true;
+ }
+
+ return false;
+}
+
+static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
unsigned int alloc_flags)
{
int i;
@@ -1712,8 +1746,6 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
for (i = 0; i < (1 << order); i++) {
struct page *p = page + i;
- if (unlikely(check_new_page(p)))
- return 1;
if (poisoned)
poisoned &= page_is_poisoned(p);
}
@@ -1745,8 +1777,6 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
set_page_pfmemalloc(page);
else
clear_page_pfmemalloc(page);
-
- return 0;
}
/*
@@ -2168,6 +2198,9 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
if (unlikely(page == NULL))
break;
+ if (unlikely(check_pcp_refill(page)))
+ continue;
+
/*
* Split buddy pages returned by expand() are received here
* in physical page order. The page is added to the callers and
@@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
struct list_head *list;
local_irq_save(flags);
- pcp = &this_cpu_ptr(zone->pageset)->pcp;
- list = &pcp->lists[migratetype];
- if (list_empty(list)) {
- pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, list,
- migratetype, cold);
- if (unlikely(list_empty(list)))
- goto failed;
- }
+ do {
+ pcp = &this_cpu_ptr(zone->pageset)->pcp;
+ list = &pcp->lists[migratetype];
+ if (list_empty(list)) {
+ pcp->count += rmqueue_bulk(zone, 0,
+ pcp->batch, list,
+ migratetype, cold);
+ if (unlikely(list_empty(list)))
+ goto failed;
+ }
- if (cold)
- page = list_last_entry(list, struct page, lru);
- else
- page = list_first_entry(list, struct page, lru);
+ if (cold)
+ page = list_last_entry(list, struct page, lru);
+ else
+ page = list_first_entry(list, struct page, lru);
+ } while (page && check_new_pcp(page));
__dec_zone_state(zone, NR_ALLOC_BATCH);
list_del(&page->lru);
@@ -2605,14 +2640,16 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
spin_lock_irqsave(&zone->lock, flags);
- page = NULL;
- if (alloc_flags & ALLOC_HARDER) {
- page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
- if (page)
- trace_mm_page_alloc_zone_locked(page, order, migratetype);
- }
- if (!page)
- page = __rmqueue(zone, order, migratetype);
+ do {
+ page = NULL;
+ if (alloc_flags & ALLOC_HARDER) {
+ page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+ if (page)
+ trace_mm_page_alloc_zone_locked(page, order, migratetype);
+ }
+ if (!page)
+ page = __rmqueue(zone, order, migratetype);
+ } while (page && check_new_pages(page, order));
spin_unlock(&zone->lock);
if (!page)
goto failed;
@@ -2979,8 +3016,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
page = buffered_rmqueue(ac->preferred_zoneref->zone, zone, order,
gfp_mask, alloc_flags, ac->migratetype);
if (page) {
- if (prep_new_page(page, order, gfp_mask, alloc_flags))
- goto try_this_zone;
+ prep_new_page(page, order, gfp_mask, alloc_flags);
/*
* If this is a high-order atomic allocation then check
--
2.6.4
^ permalink raw reply related [flat|nested] 160+ messages in thread
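Both hunks above give the allocation path the same shape: take a candidate page, run a check that may be compiled down to nothing, and retry while it fails. A simplified userspace model of that loop is below; the list handling and the check are stand-ins for the pcp list helpers and check_new_pcp(), not the real implementations.
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct fake_page {
	bool bad;
	struct fake_page *next;
};

/* Stand-in for check_new_pcp(): true means the page must be rejected. */
static bool check_new_pcp(const struct fake_page *p)
{
	return p->bad;
}

/* Stand-in for list_first_entry() + list_del() on the pcp list. */
static struct fake_page *take_first(struct fake_page **list)
{
	struct fake_page *p = *list;

	if (p)
		*list = p->next;
	return p;
}

int main(void)
{
	struct fake_page corrupt = { .bad = true };
	struct fake_page clean = { .bad = false };
	struct fake_page *list = &corrupt;
	struct fake_page *page;

	corrupt.next = &clean;
	clean.next = NULL;

	/* Keep taking pages until one passes the check or the list runs out. */
	do {
		page = take_first(&list);
	} while (page && check_new_pcp(page));

	printf("%s\n", page ? "allocated a good page" : "list exhausted");
	return 0;
}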
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-27 14:06 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-27 14:06 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> Every page allocated checks a number of page fields for validity. This
> catches corruption bugs of pages that are already freed but it is expensive.
> This patch weakens the debugging check by checking PCP pages only when
> the PCP lists are being refilled. All compound pages are checked. This
> potentially avoids debugging checks entirely if the PCP lists are never
> emptied and refilled so some corruption issues may be missed. Full checking
> requires DEBUG_VM.
>
> With the two deferred debugging patches applied, the impact to a page
> allocator microbenchmark is
>
> 4.6.0-rc3 4.6.0-rc3
> inline-v3r6 deferalloc-v3r7
> Min alloc-odr0-1 344.00 ( 0.00%) 317.00 ( 7.85%)
> Min alloc-odr0-2 248.00 ( 0.00%) 231.00 ( 6.85%)
> Min alloc-odr0-4 209.00 ( 0.00%) 192.00 ( 8.13%)
> Min alloc-odr0-8 181.00 ( 0.00%) 166.00 ( 8.29%)
> Min alloc-odr0-16 168.00 ( 0.00%) 154.00 ( 8.33%)
> Min alloc-odr0-32 161.00 ( 0.00%) 148.00 ( 8.07%)
> Min alloc-odr0-64 158.00 ( 0.00%) 145.00 ( 8.23%)
> Min alloc-odr0-128 156.00 ( 0.00%) 143.00 ( 8.33%)
> Min alloc-odr0-256 168.00 ( 0.00%) 154.00 ( 8.33%)
> Min alloc-odr0-512 178.00 ( 0.00%) 167.00 ( 6.18%)
> Min alloc-odr0-1024 186.00 ( 0.00%) 174.00 ( 6.45%)
> Min alloc-odr0-2048 192.00 ( 0.00%) 180.00 ( 6.25%)
> Min alloc-odr0-4096 198.00 ( 0.00%) 184.00 ( 7.07%)
> Min alloc-odr0-8192 200.00 ( 0.00%) 188.00 ( 6.00%)
> Min alloc-odr0-16384 201.00 ( 0.00%) 188.00 ( 6.47%)
> Min free-odr0-1 189.00 ( 0.00%) 180.00 ( 4.76%)
> Min free-odr0-2 132.00 ( 0.00%) 126.00 ( 4.55%)
> Min free-odr0-4 104.00 ( 0.00%) 99.00 ( 4.81%)
> Min free-odr0-8 90.00 ( 0.00%) 85.00 ( 5.56%)
> Min free-odr0-16 84.00 ( 0.00%) 80.00 ( 4.76%)
> Min free-odr0-32 80.00 ( 0.00%) 76.00 ( 5.00%)
> Min free-odr0-64 78.00 ( 0.00%) 74.00 ( 5.13%)
> Min free-odr0-128 77.00 ( 0.00%) 73.00 ( 5.19%)
> Min free-odr0-256 94.00 ( 0.00%) 91.00 ( 3.19%)
> Min free-odr0-512 108.00 ( 0.00%) 112.00 ( -3.70%)
> Min free-odr0-1024 115.00 ( 0.00%) 118.00 ( -2.61%)
> Min free-odr0-2048 120.00 ( 0.00%) 125.00 ( -4.17%)
> Min free-odr0-4096 123.00 ( 0.00%) 129.00 ( -4.88%)
> Min free-odr0-8192 126.00 ( 0.00%) 130.00 ( -3.17%)
> Min free-odr0-16384 126.00 ( 0.00%) 131.00 ( -3.97%)
>
> Note that the free paths for large numbers of pages are impacted, as the
> debugging cost gets shifted into that path when the page data is no longer
> necessarily cache-hot.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Unlike the free path, there are no duplications here, which is nice.
Some un-inlining of the bad page check should still work here though, imho:
From afdefd87f2d8d07cba4bd2a2f3531dc8bb0b7a19 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 27 Apr 2016 15:47:29 +0200
Subject: [PATCH] mm, page_alloc: uninline the bad page part of
check_new_page()
Bad pages should be rare so the code handling them doesn't need to be inline
for performance reasons. Put it in a separate function which returns void.
This also assumes that the initial page_expected_state() result will match the
result of the thorough check, i.e. the page doesn't become "good" in the
meantime. This matches the expectations already in place in
free_pages_check().
!DEBUG_VM bloat-o-meter:
add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
function old new delta
check_new_page_bad - 134 +134
get_page_from_freelist 3468 3194 -274
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b3aefdfcaa2..755ec9465d8a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1648,19 +1648,11 @@ static inline void expand(struct zone *zone, struct page *page,
}
}
-/*
- * This page is about to be returned from the page allocator
- */
-static inline int check_new_page(struct page *page)
+static void check_new_page_bad(struct page *page)
{
- const char *bad_reason;
- unsigned long bad_flags;
+ const char *bad_reason = NULL;
+ unsigned long bad_flags = 0;
- if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))
- return 0;
-
- bad_reason = NULL;
- bad_flags = 0;
if (unlikely(atomic_read(&page->_mapcount) != -1))
bad_reason = "nonzero mapcount";
if (unlikely(page->mapping != NULL))
@@ -1679,11 +1671,20 @@ static inline int check_new_page(struct page *page)
if (unlikely(page->mem_cgroup))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(bad_reason)) {
- bad_page(page, bad_reason, bad_flags);
- return 1;
- }
- return 0;
+ bad_page(page, bad_reason, bad_flags);
+}
+
+/*
+ * This page is about to be returned from the page allocator
+ */
+static inline int check_new_page(struct page *page)
+{
+ if (likely(page_expected_state(page,
+ PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON)))
+ return 0;
+
+ check_new_page_bad(page);
+ return 1;
}
static inline bool free_pages_prezeroed(bool poisoned)
--
2.8.1
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-04-27 14:06 ` Vlastimil Babka
@ 2016-04-27 15:31 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-27 15:31 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Wed, Apr 27, 2016 at 04:06:11PM +0200, Vlastimil Babka wrote:
> From afdefd87f2d8d07cba4bd2a2f3531dc8bb0b7a19 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 27 Apr 2016 15:47:29 +0200
> Subject: [PATCH] mm, page_alloc: uninline the bad page part of
> check_new_page()
>
> Bad pages should be rare so the code handling them doesn't need to be inline
> for performance reasons. Put it in a separate function which returns void.
> This also assumes that the initial page_expected_state() result will match the
> result of the thorough check, i.e. the page doesn't become "good" in the
> meantime. This matches the expectations already in place in
> free_pages_check().
>
> !DEBUG_VM bloat-o-meter:
>
> add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
> function old new delta
> check_new_page_bad - 134 +134
> get_page_from_freelist 3468 3194 -274
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Andrew, if you pick up v2 of the follow-up series then can you also
add this patch on top if it's convenient please?
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-04-15 9:07 ` Mel Gorman
@ 2016-05-17 6:41 ` Naoya Horiguchi
-1 siblings, 0 replies; 160+ messages in thread
From: Naoya Horiguchi @ 2016-05-17 6:41 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML
> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> struct list_head *list;
>
> local_irq_save(flags);
> - pcp = &this_cpu_ptr(zone->pageset)->pcp;
> - list = &pcp->lists[migratetype];
> - if (list_empty(list)) {
> - pcp->count += rmqueue_bulk(zone, 0,
> - pcp->batch, list,
> - migratetype, cold);
> - if (unlikely(list_empty(list)))
> - goto failed;
> - }
> + do {
> + pcp = &this_cpu_ptr(zone->pageset)->pcp;
> + list = &pcp->lists[migratetype];
> + if (list_empty(list)) {
> + pcp->count += rmqueue_bulk(zone, 0,
> + pcp->batch, list,
> + migratetype, cold);
> + if (unlikely(list_empty(list)))
> + goto failed;
> + }
>
> - if (cold)
> - page = list_last_entry(list, struct page, lru);
> - else
> - page = list_first_entry(list, struct page, lru);
> + if (cold)
> + page = list_last_entry(list, struct page, lru);
> + else
> + page = list_first_entry(list, struct page, lru);
> + } while (page && check_new_pcp(page));
This causes an infinite loop when check_new_pcp() returns 1, because the bad
page is still in the list (I assume that a bad page never disappears).
The original kernel is free from this problem because we do retry after
list_del(). So moving the following 3 lines into this do-while block solves
the problem?
__dec_zone_state(zone, NR_ALLOC_BATCH);
list_del(&page->lru);
pcp->count--;
There seems to be no infinite loop issue in the order > 0 block below, because bad pages
are deleted from the free list in __rmqueue_smallest().
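For illustration, a rough sketch of the order-0 path with that change applied
(assembled from the hunk quoted above plus the three lines listed; a sketch
rather than the exact final code):

	do {
		pcp = &this_cpu_ptr(zone->pageset)->pcp;
		list = &pcp->lists[migratetype];
		if (list_empty(list)) {
			pcp->count += rmqueue_bulk(zone, 0,
					pcp->batch, list,
					migratetype, cold);
			if (unlikely(list_empty(list)))
				goto failed;
		}

		if (cold)
			page = list_last_entry(list, struct page, lru);
		else
			page = list_first_entry(list, struct page, lru);

		/* unlink before the check so a bad page cannot be picked again */
		__dec_zone_state(zone, NR_ALLOC_BATCH);
		list_del(&page->lru);
		pcp->count--;
	} while (check_new_pcp(page));

With the page unlinked before check_new_pcp(), every retry looks at a fresh
list entry, and the page NULL check becomes unnecessary because the list was
just verified to be non-empty.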
Thanks,
Naoya Horiguchi
>
> __dec_zone_state(zone, NR_ALLOC_BATCH);
> list_del(&page->lru);
> @@ -2605,14 +2640,16 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> spin_lock_irqsave(&zone->lock, flags);
>
> - page = NULL;
> - if (alloc_flags & ALLOC_HARDER) {
> - page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> - if (page)
> - trace_mm_page_alloc_zone_locked(page, order, migratetype);
> - }
> - if (!page)
> - page = __rmqueue(zone, order, migratetype);
> + do {
> + page = NULL;
> + if (alloc_flags & ALLOC_HARDER) {
> + page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> + if (page)
> + trace_mm_page_alloc_zone_locked(page, order, migratetype);
> + }
> + if (!page)
> + page = __rmqueue(zone, order, migratetype);
> + } while (page && check_new_pages(page, order));
> spin_unlock(&zone->lock);
> if (!page)
> goto failed;
> @@ -2979,8 +3016,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> page = buffered_rmqueue(ac->preferred_zoneref->zone, zone, order,
> gfp_mask, alloc_flags, ac->migratetype);
> if (page) {
> - if (prep_new_page(page, order, gfp_mask, alloc_flags))
> - goto try_this_zone;
> + prep_new_page(page, order, gfp_mask, alloc_flags);
>
> /*
> * If this is a high-order atomic allocation then check
> --
> 2.6.4
>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-05-17 6:41 ` Naoya Horiguchi
@ 2016-05-18 7:51 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-05-18 7:51 UTC (permalink / raw)
To: Naoya Horiguchi, Mel Gorman
Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On 05/17/2016 08:41 AM, Naoya Horiguchi wrote:
>> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
>> struct list_head *list;
>>
>> local_irq_save(flags);
>> - pcp = &this_cpu_ptr(zone->pageset)->pcp;
>> - list = &pcp->lists[migratetype];
>> - if (list_empty(list)) {
>> - pcp->count += rmqueue_bulk(zone, 0,
>> - pcp->batch, list,
>> - migratetype, cold);
>> - if (unlikely(list_empty(list)))
>> - goto failed;
>> - }
>> + do {
>> + pcp = &this_cpu_ptr(zone->pageset)->pcp;
>> + list = &pcp->lists[migratetype];
>> + if (list_empty(list)) {
>> + pcp->count += rmqueue_bulk(zone, 0,
>> + pcp->batch, list,
>> + migratetype, cold);
>> + if (unlikely(list_empty(list)))
>> + goto failed;
>> + }
>>
>> - if (cold)
>> - page = list_last_entry(list, struct page, lru);
>> - else
>> - page = list_first_entry(list, struct page, lru);
>> + if (cold)
>> + page = list_last_entry(list, struct page, lru);
>> + else
>> + page = list_first_entry(list, struct page, lru);
>> + } while (page && check_new_pcp(page));
>
> This causes an infinite loop when check_new_pcp() returns 1, because the bad
> page is still in the list (I assume that a bad page never disappears).
> The original kernel is free from this problem because we do retry after
> list_del(). So moving the following 3 lines into this do-while block solves
> the problem?
>
> __dec_zone_state(zone, NR_ALLOC_BATCH);
> list_del(&page->lru);
> pcp->count--;
>
> There seems to be no infinite loop issue in the order > 0 block below, because bad pages
> are deleted from the free list in __rmqueue_smallest().
Ooops, thanks for catching this, wish it was sooner...
----8<----
From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 18 May 2016 09:41:01 +0200
Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
In a DEBUG_VM kernel, we can hit an infinite loop for order == 0 in
buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
never removed from the pcp list. Fix this by removing the page before retrying.
Also we don't need to check if page is non-NULL, because we simply grab it from
the list which was just tested for being non-empty.
Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c81e2e7b172..d5b93e5dd697 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2641,11 +2641,12 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
page = list_last_entry(list, struct page, lru);
else
page = list_first_entry(list, struct page, lru);
- } while (page && check_new_pcp(page));
- __dec_zone_state(zone, NR_ALLOC_BATCH);
- list_del(&page->lru);
- pcp->count--;
+ __dec_zone_state(zone, NR_ALLOC_BATCH);
+ list_del(&page->lru);
+ pcp->count--;
+
+ } while (check_new_pcp(page));
} else {
/*
* We most definitely don't want callers attempting to
--
2.8.2
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-05-18 7:51 ` Vlastimil Babka
@ 2016-05-18 7:55 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-05-18 7:55 UTC (permalink / raw)
To: Naoya Horiguchi, Mel Gorman
Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On 05/18/2016 09:51 AM, Vlastimil Babka wrote:
> ----8<----
> From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 18 May 2016 09:41:01 +0200
> Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
>
> In a DEBUG_VM kernel, we can hit an infinite loop for order == 0 in
> buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
> never removed from the pcp list. Fix this by removing the page before retrying.
> Also we don't need to check if page is non-NULL, because we simply grab it from
> the list which was just tested for being non-empty.
>
> Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
Wrong.
Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-pages-allocated-from-the-pcp.patch
> Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/page_alloc.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8c81e2e7b172..d5b93e5dd697 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2641,11 +2641,12 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> page = list_last_entry(list, struct page, lru);
> else
> page = list_first_entry(list, struct page, lru);
> - } while (page && check_new_pcp(page));
>
> - __dec_zone_state(zone, NR_ALLOC_BATCH);
> - list_del(&page->lru);
> - pcp->count--;
> + __dec_zone_state(zone, NR_ALLOC_BATCH);
> + list_del(&page->lru);
> + pcp->count--;
> +
> + } while (check_new_pcp(page));
> } else {
> /*
> * We most definitely don't want callers attempting to
>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP
2016-05-18 7:51 ` Vlastimil Babka
@ 2016-05-18 8:49 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-05-18 8:49 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Naoya Horiguchi, Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Wed, May 18, 2016 at 09:51:58AM +0200, Vlastimil Babka wrote:
> On 05/17/2016 08:41 AM, Naoya Horiguchi wrote:
> >> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> >> struct list_head *list;
> >>
> >> local_irq_save(flags);
> >> - pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> - list = &pcp->lists[migratetype];
> >> - if (list_empty(list)) {
> >> - pcp->count += rmqueue_bulk(zone, 0,
> >> - pcp->batch, list,
> >> - migratetype, cold);
> >> - if (unlikely(list_empty(list)))
> >> - goto failed;
> >> - }
> >> + do {
> >> + pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> + list = &pcp->lists[migratetype];
> >> + if (list_empty(list)) {
> >> + pcp->count += rmqueue_bulk(zone, 0,
> >> + pcp->batch, list,
> >> + migratetype, cold);
> >> + if (unlikely(list_empty(list)))
> >> + goto failed;
> >> + }
> >>
> >> - if (cold)
> >> - page = list_last_entry(list, struct page, lru);
> >> - else
> >> - page = list_first_entry(list, struct page, lru);
> >> + if (cold)
> >> + page = list_last_entry(list, struct page, lru);
> >> + else
> >> + page = list_first_entry(list, struct page, lru);
> >> + } while (page && check_new_pcp(page));
> >
> > This causes an infinite loop when check_new_pcp() returns 1, because the bad
> > page is still in the list (I assume that a bad page never disappears).
> > The original kernel is free from this problem because we do retry after
> > list_del(). So moving the following 3 lines into this do-while block solves
> > the problem?
> >
> > __dec_zone_state(zone, NR_ALLOC_BATCH);
> > list_del(&page->lru);
> > pcp->count--;
> >
> > There seems to be no infinite loop issue in the order > 0 block below, because bad pages
> > are deleted from the free list in __rmqueue_smallest().
>
> Ooops, thanks for catching this, wish it was sooner...
>
Still not too late fortunately! Thanks Naoya for identifying this and
Vlastimil for fixing it.
> ----8<----
> From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 18 May 2016 09:41:01 +0200
> Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
>
> In a DEBUG_VM kernel, we can hit an infinite loop for order == 0 in
> buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
> never removed from the pcp list. Fix this by removing the page before retrying.
> Also we don't need to check if page is non-NULL, because we simply grab it from
> the list which was just tested for being non-empty.
>
> Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
> Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 13/28] mm, page_alloc: Remove redundant check for empty zonelist
2016-04-15 9:07 ` Mel Gorman
@ 2016-04-26 12:04 ` Vlastimil Babka
-1 siblings, 0 replies; 160+ messages in thread
From: Vlastimil Babka @ 2016-04-26 12:04 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/15/2016 11:07 AM, Mel Gorman wrote:
> A check is made for an empty zonelist early in the page allocator fast path
> but it's unnecessary. When get_page_from_freelist() is called, it'll return
> NULL immediately. Removing the first check is slower for machines with
> memoryless nodes but that is a corner case that can live with the overhead.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
> mm/page_alloc.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index df03ccc7f07c..21aaef6ddd7a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3374,14 +3374,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> if (should_fail_alloc_page(gfp_mask, order))
> return NULL;
>
> - /*
> - * Check the zones suitable for the gfp_mask contain at least one
> - * valid zone. It's possible to have an empty zonelist as a result
> - * of __GFP_THISNODE and a memoryless node
> - */
> - if (unlikely(!zonelist->_zonerefs->zone))
> - return NULL;
> -
> if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
> alloc_flags |= ALLOC_CMA;
>
> @@ -3394,8 +3386,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> /* The preferred zone is used for statistics later */
> preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
> ac.nodemask, &ac.preferred_zone);
> - if (!ac.preferred_zone)
> - goto out;
Is this part really safe? Besides, the changelog doesn't mention preferred_zone. What
if somebody attempts e.g. a DMA allocation with ac.nodemask being set to
cpuset_current_mems_allowed and initially only containing nodes without
ZONE_DMA. Then ac.preferred_zone is NULL, yet we proceed to
get_page_from_freelist(). Meanwhile cpuset_current_mems_allowed gets changed so
in fact it does contain a suitable node, so we manage to get inside
for_each_zone_zonelist_nodemask(). Then there's zone_local(ac->preferred_zone,
zone), which will dereference the NULL ac->preferred_zone?
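For reference, zone_local() at this point in the tree is roughly the following
(reproduced from memory, so treat it as a sketch); with ac->preferred_zone being
NULL the comparison would fault on the first zonelist iteration:

	static bool zone_local(struct zone *local_zone, struct zone *zone)
	{
		/* local_zone is ac->preferred_zone here; NULL would fault */
		return local_zone->node == zone->node;
	}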
> ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
> /* First allocation attempt */
> @@ -3418,7 +3408,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>
> trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
>
> -out:
> /*
> * When updating a task's mems_allowed, it is possible to race with
> * parallel threads in such a way that an allocation can fail while
>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 13/28] mm, page_alloc: Remove redundant check for empty zonelist
2016-04-26 12:04 ` Vlastimil Babka
@ 2016-04-26 13:00 ` Mel Gorman
-1 siblings, 0 replies; 160+ messages in thread
From: Mel Gorman @ 2016-04-26 13:00 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, Apr 26, 2016 at 02:04:51PM +0200, Vlastimil Babka wrote:
> On 04/15/2016 11:07 AM, Mel Gorman wrote:
> >A check is made for an empty zonelist early in the page allocator fast path
> >but it's unnecessary. When get_page_from_freelist() is called, it'll return
> >NULL immediately. Removing the first check is slower for machines with
> >memoryless nodes but that is a corner case that can live with the overhead.
> >
> >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >---
> > mm/page_alloc.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index df03ccc7f07c..21aaef6ddd7a 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -3374,14 +3374,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> > if (should_fail_alloc_page(gfp_mask, order))
> > return NULL;
> >
> >- /*
> >- * Check the zones suitable for the gfp_mask contain at least one
> >- * valid zone. It's possible to have an empty zonelist as a result
> >- * of __GFP_THISNODE and a memoryless node
> >- */
> >- if (unlikely(!zonelist->_zonerefs->zone))
> >- return NULL;
> >-
> > if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
> > alloc_flags |= ALLOC_CMA;
> >
> >@@ -3394,8 +3386,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> > /* The preferred zone is used for statistics later */
> > preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
> > ac.nodemask, &ac.preferred_zone);
> >- if (!ac.preferred_zone)
> >- goto out;
>
> Is this part really safe? Besides, the changelog doesn't mention preferred_zone.
> What if somebody attempts e.g. a DMA allocation with ac.nodemask being set
> to cpuset_current_mems_allowed and initially only containing nodes without
> ZONE_DMA. Then ac.preferred_zone is NULL, yet we proceed to
> get_page_from_freelist(). Meanwhile cpuset_current_mems_allowed gets changed
> so in fact it does contain a suitable node, so we manage to get inside
> for_each_zone_zonelist_nodemask(). Then there's
> zone_local(ac->preferred_zone, zone), which will dereference the NULL
> ac->preferred_zone?
>
You're right, this is a potential problem. I thought of a few solutions
but they're not necessarily cheaper than the current code. If Andrew is
watching, please drop this patch if possible. Otherwise, I'll post a revert
within the next 2 days and find an alternative solution that still saves
cycles.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 13/28] mm, page_alloc: Remove redundant check for empty zonelist
2016-04-26 13:00 ` Mel Gorman
@ 2016-04-26 19:11 ` Andrew Morton
-1 siblings, 0 replies; 160+ messages in thread
From: Andrew Morton @ 2016-04-26 19:11 UTC (permalink / raw)
To: Mel Gorman; +Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML
On Tue, 26 Apr 2016 14:00:11 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> If Andrew is watching, please drop this patch if possible.
Thud.
^ permalink raw reply [flat|nested] 160+ messages in thread