linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm, page_alloc: capture page in task context only
@ 2020-06-10 20:48 Hugh Dickins
  2020-06-11 15:43 ` Mel Gorman
  2020-06-12 10:30 ` Vlastimil Babka
  0 siblings, 2 replies; 11+ messages in thread
From: Hugh Dickins @ 2020-06-10 20:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Vlastimil Babka, Li Wang, Alex Shi, linux-kernel,
	linux-mm

While stressing compaction, one run oopsed on NULL capc->cc in
__free_one_page()'s task_capc(zone): compact_zone_order() had been
interrupted, and a page was being freed in the return from interrupt.

Though you would not expect it from the source, both gccs I was using
(a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
callq compact_zone - long after the "current->capture_control = &capc".
An interrupt in between those finds capc->cc NULL (zeroed by an earlier
rep stos).

This could presumably be fixed by a barrier() before setting
current->capture_control in compact_zone_order(); but would also need
more care on return from compact_zone(), in order not to risk leaking
a page captured by interrupt just before capture_control is reset.

Maybe that is the preferable fix, but I felt safer for task_capc() to
exclude the rather surprising possibility of capture at interrupt time.

Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/page_alloc.c |    1 +
 1 file changed, 1 insertion(+)

--- 5.8-rc0/mm/page_alloc.c	2020-06-08 14:38:47.298625588 -0700
+++ linux/mm/page_alloc.c	2020-06-10 12:12:34.982950441 -0700
@@ -814,6 +814,7 @@ static inline struct capture_control *ta
 	struct capture_control *capc = current->capture_control;
 
 	return capc &&
+		in_task() &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
 		capc->cc->zone == zone &&

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm, page_alloc: capture page in task context only
  2020-06-10 20:48 [PATCH] mm, page_alloc: capture page in task context only Hugh Dickins
@ 2020-06-11 15:43 ` Mel Gorman
  2020-06-12 10:30 ` Vlastimil Babka
  1 sibling, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2020-06-11 15:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Vlastimil Babka, Li Wang, Alex Shi, linux-kernel,
	linux-mm

On Wed, Jun 10, 2020 at 01:48:59PM -0700, Hugh Dickins wrote:
> While stressing compaction, one run oopsed on NULL capc->cc in
> __free_one_page()'s task_capc(zone): compact_zone_order() had been
> interrupted, and a page was being freed in the return from interrupt.
> 
> Though you would not expect it from the source, both gccs I was using
> (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> callq compact_zone - long after the "current->capture_control = &capc".
> An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> rep stos).
> 
> This could presumably be fixed by a barrier() before setting
> current->capture_control in compact_zone_order(); but would also need
> more care on return from compact_zone(), in order not to risk leaking
> a page captured by interrupt just before capture_control is reset.
> 
> Maybe that is the preferable fix, but I felt safer for task_capc() to
> exclude the rather surprising possibility of capture at interrupt time.
> 
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Cc: stable@vger.kernel.org # 5.1+
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm, page_alloc: capture page in task context only
  2020-06-10 20:48 [PATCH] mm, page_alloc: capture page in task context only Hugh Dickins
  2020-06-11 15:43 ` Mel Gorman
@ 2020-06-12 10:30 ` Vlastimil Babka
  2020-06-15 21:03   ` Hugh Dickins
  1 sibling, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2020-06-12 10:30 UTC (permalink / raw)
  To: Hugh Dickins, Mel Gorman
  Cc: Andrew Morton, Li Wang, Alex Shi, linux-kernel, linux-mm

On 6/10/20 10:48 PM, Hugh Dickins wrote:
> While stressing compaction, one run oopsed on NULL capc->cc in
> __free_one_page()'s task_capc(zone): compact_zone_order() had been
> interrupted, and a page was being freed in the return from interrupt.
> 
> Though you would not expect it from the source, both gccs I was using
> (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> callq compact_zone - long after the "current->capture_control = &capc".
> An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> rep stos).

Ugh, nasty. Same here with gcc 10.

> This could presumably be fixed by a barrier() before setting
> current->capture_control in compact_zone_order(); but would also need
> more care on return from compact_zone(), in order not to risk leaking
> a page captured by interrupt just before capture_control is reset.

I was hoping a WRITE_ONCE(current->capture_control) would be enough,
but apparently it's not (I tried).

> Maybe that is the preferable fix, but I felt safer for task_capc() to
> exclude the rather surprising possibility of capture at interrupt time.

> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Cc: stable@vger.kernel.org # 5.1+
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But perhaps I would also make sure that we don't expose the half initialized
capture_control and run into this problem again later. It's not like this is a
fast path where barriers hurt. Something like this then? (with added comments)

diff --git a/mm/compaction.c b/mm/compaction.c
index fd988b7e5f2b..c89e26817278 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.page = NULL,
 	};
 
-	current->capture_control = &capc;
+	barrier();
+
+	WRITE_ONCE(current->capture_control, &capc);
 
 	ret = compact_zone(&cc, &capc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
-	*capture = capc.page;
-	current->capture_control = NULL;
+	WRITE_ONCE(current->capture_control, NULL);
+	*capture = READ_ONCE(capc.page);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm, page_alloc: capture page in task context only
  2020-06-12 10:30 ` Vlastimil Babka
@ 2020-06-15 21:03   ` Hugh Dickins
  2020-06-16  7:45     ` Vlastimil Babka
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-06-15 21:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Mel Gorman, Andrew Morton, Li Wang, Alex Shi,
	linux-kernel, linux-mm

On Fri, 12 Jun 2020, Vlastimil Babka wrote:
> On 6/10/20 10:48 PM, Hugh Dickins wrote:
> > While stressing compaction, one run oopsed on NULL capc->cc in
> > __free_one_page()'s task_capc(zone): compact_zone_order() had been
> > interrupted, and a page was being freed in the return from interrupt.
> > 
> > Though you would not expect it from the source, both gccs I was using
> > (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> > the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> > callq compact_zone - long after the "current->capture_control = &capc".
> > An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> > rep stos).
> 
> Ugh, nasty. Same here with gcc 10.

Thanks for checking, nice to know that I'm in good company :)

> 
> > This could presumably be fixed by a barrier() before setting
> > current->capture_control in compact_zone_order(); but would also need
> > more care on return from compact_zone(), in order not to risk leaking
> > a page captured by interrupt just before capture_control is reset.
> 
> I was hoping a WRITE_ONCE(current->capture_control) would be enough,
> but apparently it's not (I tried).

Right, I don't think volatiles themselves actually constitute barriers;
but I'd better keep quiet, I notice the READ_ONCE/WRITE_ONCE/data_race
industry has been busy recently, and I'm likely out-of-date and mistaken.

> 
> > Maybe that is the preferable fix, but I felt safer for task_capc() to
> > exclude the rather surprising possibility of capture at interrupt time.
> 
> > Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> > Cc: stable@vger.kernel.org # 5.1+
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks, and to Mel for his.

> 
> But perhaps I would also make sure that we don't expose the half initialized
> capture_control and run into this problem again later. It's not like this is a
> fast path where barriers hurt. Something like this then? (with added comments)

Would it be very rude if I leave that to you and to Mel? to add, or
to replace mine if you wish - go ahead.  I can easily see that more
sophistication at the compact_zone_order() end may be preferable to
another test and branch inside __free_one_page() (and would task_capc()
be better with an "unlikely" in it?).

But it seems unnecessary to have a fix at both ends, and I'm rather too
wound up in other things at the moment, to want to read up on the current
state of such barriers, and sign off on the Vlastipatch below myself (but
I do notice that READ_ONCE seems to have more in it today than I remember,
which probably accounts for why you did not put the barrier() I expected
to see on the way out).

Hugh

> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fd988b7e5f2b..c89e26817278 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.page = NULL,
>  	};
>  
> -	current->capture_control = &capc;
> +	barrier();
> +
> +	WRITE_ONCE(current->capture_control, &capc);
>  
>  	ret = compact_zone(&cc, &capc);
>  
>  	VM_BUG_ON(!list_empty(&cc.freepages));
>  	VM_BUG_ON(!list_empty(&cc.migratepages));
>  
> -	*capture = capc.page;
> -	current->capture_control = NULL;
> +	WRITE_ONCE(current->capture_control, NULL);
> +	*capture = READ_ONCE(capc.page);
>  
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm, page_alloc: capture page in task context only
  2020-06-15 21:03   ` Hugh Dickins
@ 2020-06-16  7:45     ` Vlastimil Babka
  2020-06-16  8:26       ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Vlastimil Babka
  0 siblings, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2020-06-16  7:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Mel Gorman, Andrew Morton, Li Wang, Alex Shi, linux-kernel, linux-mm

On 6/15/20 11:03 PM, Hugh Dickins wrote:
> On Fri, 12 Jun 2020, Vlastimil Babka wrote:
>> > This could presumably be fixed by a barrier() before setting
>> > current->capture_control in compact_zone_order(); but would also need
>> > more care on return from compact_zone(), in order not to risk leaking
>> > a page captured by interrupt just before capture_control is reset.
>> 
>> I was hoping a WRITE_ONCE(current->capture_control) would be enough,
>> but apparently it's not (I tried).
> 
> Right, I don't think volatiles themselves actually constitute barriers;
> but I'd better keep quiet, I notice the READ_ONCE/WRITE_ONCE/data_race
> industry has been busy recently, and I'm likely out-of-date and mistaken.

Same here, but from what I've read, volatiles should enforce order against other
volatiles, but not non-volatiles (which is the struct initialization). So
barrier() is indeed necessary, and WRITE_ONCE just to prevent (very
hypothetical, hopefully) store tearing.

>> 
>> > Maybe that is the preferable fix, but I felt safer for task_capc() to
>> > exclude the rather surprising possibility of capture at interrupt time.
>> 
>> > Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
>> > Cc: stable@vger.kernel.org # 5.1+
>> > Signed-off-by: Hugh Dickins <hughd@google.com>
>> 
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Thanks, and to Mel for his.
> 
>> 
>> But perhaps I would also make sure that we don't expose the half initialized
>> capture_control and run into this problem again later. It's not like this is a
>> fast path where barriers hurt. Something like this then? (with added comments)
> 
> Would it be very rude if I leave that to you and to Mel? to add, or

No problem.

> to replace mine if you wish - go ahead.  I can easily see that more
> sophistication at the compact_zone_order() end may be preferable to
> another test and branch inside __free_one_page()

Right, I think so, and will also generally sleep better if we don't put pointers
to unitialized structures to current.

> (and would task_capc()
> be better with an "unlikely" in it?).

I'll try and see if it generates better code. We should be also able to remove
the "capc->cc->direct_compaction" check, as the only place where we set capc is
compact_zone_order() which sets direct_compaction true unconditionally.

> But it seems unnecessary to have a fix at both ends, and I'm rather too
> wound up in other things at the moment, to want to read up on the current
> state of such barriers, and sign off on the Vlastipatch below myself (but
> I do notice that READ_ONCE seems to have more in it today than I remember,
> which probably accounts for why you did not put the barrier() I expected
> to see on the way out).

Right, minimally it's a volatile cast (I've checked 5.1 too, for stable reasons)
which should be enough.

So I'll send the proper patch.

Thanks!
Vlastimil

> Hugh
> 
>> 
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fd988b7e5f2b..c89e26817278 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>>  		.page = NULL,
>>  	};
>>  
>> -	current->capture_control = &capc;
>> +	barrier();
>> +
>> +	WRITE_ONCE(current->capture_control, &capc);
>>  
>>  	ret = compact_zone(&cc, &capc);
>>  
>>  	VM_BUG_ON(!list_empty(&cc.freepages));
>>  	VM_BUG_ON(!list_empty(&cc.migratepages));
>>  
>> -	*capture = capc.page;
>> -	current->capture_control = NULL;
>> +	WRITE_ONCE(current->capture_control, NULL);
>> +	*capture = READ_ONCE(capc.page);
>>  
>>  	return ret;
>>  }
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts
  2020-06-16  7:45     ` Vlastimil Babka
@ 2020-06-16  8:26       ` Vlastimil Babka
  2020-06-16  8:26         ` [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc() Vlastimil Babka
  2020-06-16 20:18         ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Hugh Dickins
  0 siblings, 2 replies; 11+ messages in thread
From: Vlastimil Babka @ 2020-06-16  8:26 UTC (permalink / raw)
  To: vbabka
  Cc: akpm, alex.shi, hughd, linux-kernel, linux-mm, liwang, mgorman, stable

Hugh reports:

=====
While stressing compaction, one run oopsed on NULL capc->cc in
__free_one_page()'s task_capc(zone): compact_zone_order() had been
interrupted, and a page was being freed in the return from interrupt.

Though you would not expect it from the source, both gccs I was using
(a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
callq compact_zone - long after the "current->capture_control = &capc".
An interrupt in between those finds capc->cc NULL (zeroed by an earlier
rep stos).

This could presumably be fixed by a barrier() before setting
current->capture_control in compact_zone_order(); but would also need
more care on return from compact_zone(), in order not to risk leaking
a page captured by interrupt just before capture_control is reset.

Maybe that is the preferable fix, but I felt safer for task_capc() to
exclude the rather surprising possibility of capture at interrupt time.
=====

I have checked that gcc10 also behaves the same.

The advantage of fix in compact_zone_order() is that we don't add another
test in the page freeing hot path, and that it might prevent future problems
if we stop exposing pointers to unitialized structures in current task.

So this patch implements the suggestion for compact_zone_order() with barrier()
(and WRITE_ONCE() to prevent store tearing) for setting
current->capture_control, and prevents page leaking with WRITE_ONCE/READ_ONCE
in the proper order.

Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Cc: stable@vger.kernel.org # 5.1+
Reported-by: Hugh Dickins <hughd@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index fd988b7e5f2b..86375605faa9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2316,15 +2316,26 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.page = NULL,
 	};
 
-	current->capture_control = &capc;
+	/*
+	 * Make sure the structs are really initialized before we expose the
+	 * capture control, in case we are interrupted and the interrupt handler
+	 * frees a page.
+	 */
+	barrier();
+	WRITE_ONCE(current->capture_control, &capc);
 
 	ret = compact_zone(&cc, &capc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
-	*capture = capc.page;
-	current->capture_control = NULL;
+	/*
+	 * Make sure we hide capture control first before we read the captured
+	 * page pointer, otherwise an interrupt could free and capture a page
+	 * and we would leak it.
+	 */
+	WRITE_ONCE(current->capture_control, NULL);
+	*capture = READ_ONCE(capc.page);
 
 	return ret;
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc()
  2020-06-16  8:26       ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Vlastimil Babka
@ 2020-06-16  8:26         ` Vlastimil Babka
  2020-06-16 20:29           ` Hugh Dickins
  2020-06-16 20:18         ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Hugh Dickins
  1 sibling, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2020-06-16  8:26 UTC (permalink / raw)
  To: vbabka; +Cc: akpm, alex.shi, hughd, linux-kernel, linux-mm, liwang, mgorman

Hugh noted that task_capc() could use unlikely(), as most of the time there is
no capture in progress and we are in page freeing hot path. Indeed adding
unlikely() redirects produces assembly that better matches the assumption and
moves all the tests away from the hot path.

I have also noticed that we don't need to test for cc->direct_compaction as the
only place we set current->task_capture is compact_zone_order() which also
always sets cc->direct_compaction true.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d4..8a4e342d7e8f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -813,11 +813,10 @@ static inline struct capture_control *task_capc(struct zone *zone)
 {
 	struct capture_control *capc = current->capture_control;
 
-	return capc &&
+	return unlikely(capc &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
-		capc->cc->zone == zone &&
-		capc->cc->direct_compaction ? capc : NULL;
+		capc->cc->zone == zone) ? capc : NULL;
 }
 
 static inline bool
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts
  2020-06-16  8:26       ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Vlastimil Babka
  2020-06-16  8:26         ` [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc() Vlastimil Babka
@ 2020-06-16 20:18         ` Hugh Dickins
  1 sibling, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2020-06-16 20:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, alex.shi, hughd, linux-kernel, linux-mm, liwang, mgorman, stable

On Tue, 16 Jun 2020, Vlastimil Babka wrote:

> Hugh reports:
> 
> =====
> While stressing compaction, one run oopsed on NULL capc->cc in
> __free_one_page()'s task_capc(zone): compact_zone_order() had been
> interrupted, and a page was being freed in the return from interrupt.
> 
> Though you would not expect it from the source, both gccs I was using
> (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> callq compact_zone - long after the "current->capture_control = &capc".
> An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> rep stos).
> 
> This could presumably be fixed by a barrier() before setting
> current->capture_control in compact_zone_order(); but would also need
> more care on return from compact_zone(), in order not to risk leaking
> a page captured by interrupt just before capture_control is reset.
> 
> Maybe that is the preferable fix, but I felt safer for task_capc() to
> exclude the rather surprising possibility of capture at interrupt time.
> =====
> 
> I have checked that gcc10 also behaves the same.
> 
> The advantage of fix in compact_zone_order() is that we don't add another
> test in the page freeing hot path, and that it might prevent future problems
> if we stop exposing pointers to unitialized structures in current task.
> 
> So this patch implements the suggestion for compact_zone_order() with barrier()
> (and WRITE_ONCE() to prevent store tearing) for setting
> current->capture_control, and prevents page leaking with WRITE_ONCE/READ_ONCE
> in the proper order.
> 
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Cc: stable@vger.kernel.org # 5.1+
> Reported-by: Hugh Dickins <hughd@google.com>
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  mm/compaction.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fd988b7e5f2b..86375605faa9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2316,15 +2316,26 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.page = NULL,
>  	};
>  
> -	current->capture_control = &capc;
> +	/*
> +	 * Make sure the structs are really initialized before we expose the
> +	 * capture control, in case we are interrupted and the interrupt handler
> +	 * frees a page.
> +	 */
> +	barrier();
> +	WRITE_ONCE(current->capture_control, &capc);
>  
>  	ret = compact_zone(&cc, &capc);
>  
>  	VM_BUG_ON(!list_empty(&cc.freepages));
>  	VM_BUG_ON(!list_empty(&cc.migratepages));
>  
> -	*capture = capc.page;
> -	current->capture_control = NULL;
> +	/*
> +	 * Make sure we hide capture control first before we read the captured
> +	 * page pointer, otherwise an interrupt could free and capture a page
> +	 * and we would leak it.
> +	 */
> +	WRITE_ONCE(current->capture_control, NULL);
> +	*capture = READ_ONCE(capc.page);
>  
>  	return ret;
>  }
> -- 
> 2.27.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc()
  2020-06-16  8:26         ` [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc() Vlastimil Babka
@ 2020-06-16 20:29           ` Hugh Dickins
  2020-06-17  9:55             ` Vlastimil Babka
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-06-16 20:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, alex.shi, hughd, linux-kernel, linux-mm, liwang, mgorman

On Tue, 16 Jun 2020, Vlastimil Babka wrote:

> Hugh noted that task_capc() could use unlikely(), as most of the time there is
> no capture in progress and we are in page freeing hot path. Indeed adding
> unlikely() redirects produces assembly that better matches the assumption and
> moves all the tests away from the hot path.
> 
> I have also noticed that we don't need to test for cc->direct_compaction as the
> only place we set current->task_capture is compact_zone_order() which also
> always sets cc->direct_compaction true.
> 
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Hugh Dickins <hughd@googlecom>

Thanks for pursuing these, Vlastimil: I'm glad you were able
to remove a test and branch instead of adding one as I had.

One little thing, you've probably gone into this yourself and know
what you've written here is optimal: but I'd rather imagined it with
"unlikely(capc) && ..." instead of "unlikely(capc && ...)" - no need
to respond, please just give it a moment's consideration, Acked anyway.

> ---
>  mm/page_alloc.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48eb0f1410d4..8a4e342d7e8f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -813,11 +813,10 @@ static inline struct capture_control *task_capc(struct zone *zone)
>  {
>  	struct capture_control *capc = current->capture_control;
>  
> -	return capc &&
> +	return unlikely(capc &&
>  		!(current->flags & PF_KTHREAD) &&
>  		!capc->page &&
> -		capc->cc->zone == zone &&
> -		capc->cc->direct_compaction ? capc : NULL;
> +		capc->cc->zone == zone) ? capc : NULL;
>  }
>  
>  static inline bool
> -- 
> 2.27.0
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc()
  2020-06-16 20:29           ` Hugh Dickins
@ 2020-06-17  9:55             ` Vlastimil Babka
  2020-06-22  8:58               ` Mel Gorman
  0 siblings, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2020-06-17  9:55 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: akpm, alex.shi, linux-kernel, linux-mm, liwang, mgorman


On 6/16/20 10:29 PM, Hugh Dickins wrote:
> On Tue, 16 Jun 2020, Vlastimil Babka wrote:
> 
>> Hugh noted that task_capc() could use unlikely(), as most of the time there is
>> no capture in progress and we are in page freeing hot path. Indeed adding
>> unlikely() redirects produces assembly that better matches the assumption and
>> moves all the tests away from the hot path.
>> 
>> I have also noticed that we don't need to test for cc->direct_compaction as the
>> only place we set current->task_capture is compact_zone_order() which also
>> always sets cc->direct_compaction true.
>> 
>> Suggested-by: Hugh Dickins <hughd@google.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Acked-by: Hugh Dickins <hughd@googlecom>

Thanks.

> Thanks for pursuing these, Vlastimil: I'm glad you were able
> to remove a test and branch instead of adding one as I had.
> 
> One little thing, you've probably gone into this yourself and know
> what you've written here is optimal: but I'd rather imagined it with
> "unlikely(capc) && ..." instead of "unlikely(capc && ...)" - no need
> to respond, please just give it a moment's consideration, Acked anyway.

It makes no difference, at least on my gcc10 which seems to be smart enough to
do the right thing. But yeah, your suggestion is more readable and precise and
maybe can work better with a less smart compiler. Thanks.

----8<----
From 615eea6f6abe288ffb708aa0d1bdfbeaf30a4cbd Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 16 Jun 2020 10:14:47 +0200
Subject: [PATCH] mm, page_alloc: use unlikely() in task_capc()

Hugh noted that task_capc() could use unlikely(), as most of the time there is
no capture in progress and we are in page freeing hot path. Indeed adding
unlikely() produces assembly that better matches the assumption and moves
all the tests away from the hot path.

I have also noticed that we don't need to test for cc->direct_compaction as the
only place we set current->task_capture is compact_zone_order() which also
always sets cc->direct_compaction true.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hugh Dickins <hughd@googlecom>
---
 mm/page_alloc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d4..18d5aed3f97b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -813,11 +813,10 @@ static inline struct capture_control *task_capc(struct zone *zone)
 {
 	struct capture_control *capc = current->capture_control;
 
-	return capc &&
+	return unlikely(capc) &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
-		capc->cc->zone == zone &&
-		capc->cc->direct_compaction ? capc : NULL;
+		capc->cc->zone == zone ? capc : NULL;
 }
 
 static inline bool
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc()
  2020-06-17  9:55             ` Vlastimil Babka
@ 2020-06-22  8:58               ` Mel Gorman
  0 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2020-06-22  8:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, akpm, alex.shi, linux-kernel, linux-mm, liwang

On Wed, Jun 17, 2020 at 11:55:07AM +0200, Vlastimil Babka wrote:
> <SNIP>
>
> It makes no difference, at least on my gcc10 which seems to be smart enough to
> do the right thing. But yeah, your suggestion is more readable and precise and
> maybe can work better with a less smart compiler. Thanks.
> 
> ----8<----
> From 615eea6f6abe288ffb708aa0d1bdfbeaf30a4cbd Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 16 Jun 2020 10:14:47 +0200
> Subject: [PATCH] mm, page_alloc: use unlikely() in task_capc()
> 
> Hugh noted that task_capc() could use unlikely(), as most of the time there is
> no capture in progress and we are in page freeing hot path. Indeed adding
> unlikely() produces assembly that better matches the assumption and moves
> all the tests away from the hot path.
> 
> I have also noticed that we don't need to test for cc->direct_compaction as the
> only place we set current->task_capture is compact_zone_order() which also
> always sets cc->direct_compaction true.
> 
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Hugh Dickins <hughd@googlecom>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-06-22  8:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10 20:48 [PATCH] mm, page_alloc: capture page in task context only Hugh Dickins
2020-06-11 15:43 ` Mel Gorman
2020-06-12 10:30 ` Vlastimil Babka
2020-06-15 21:03   ` Hugh Dickins
2020-06-16  7:45     ` Vlastimil Babka
2020-06-16  8:26       ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Vlastimil Babka
2020-06-16  8:26         ` [PATCH 2/2] mm, page_alloc: use unlikely() in task_capc() Vlastimil Babka
2020-06-16 20:29           ` Hugh Dickins
2020-06-17  9:55             ` Vlastimil Babka
2020-06-22  8:58               ` Mel Gorman
2020-06-16 20:18         ` [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).