* [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
From: mateusznosek0 @ 2020-09-14 10:06 UTC
  To: linux-mm, linux-kernel; +Cc: Mateusz Nosek, akpm

From: Mateusz Nosek <mateusznosek0@gmail.com>

Most operations in '__alloc_pages_may_oom' do not require the oom_lock
to be held; the exception is 'out_of_memory'. This patch refactors
'__alloc_pages_may_oom' to reduce the size of the critical section and
improve overall system performance.
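
In outline, the critical section shrinks as follows (simplified
sketch, not the exact code):

	/* before: everything under oom_lock */
	mutex_trylock(&oom_lock);
	high-watermark retry (get_page_from_freelist)
	early bailout checks (PF_DUMPCORE, costly order, ...)
	out_of_memory(&oc);
	last-resort allocation (__alloc_pages_cpuset_fallback)
	mutex_unlock(&oom_lock);

	/* after: only out_of_memory() under oom_lock */
	high-watermark retry (get_page_from_freelist)
	early bailout checks (PF_DUMPCORE, costly order, ...)
	mutex_trylock(&oom_lock);
	out_of_memory(&oc);
	mutex_unlock(&oom_lock);
	last-resort allocation (__alloc_pages_cpuset_fallback)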

Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
---
 mm/page_alloc.c | 45 ++++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9bd75cacf02..b07f950a5825 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		.order = order,
 	};
 	struct page *page;
-
-	*did_some_progress = 0;
-
-	/*
-	 * Acquire the oom lock.  If that fails, somebody else is
-	 * making progress for us.
-	 */
-	if (!mutex_trylock(&oom_lock)) {
-		*did_some_progress = 1;
-		schedule_timeout_uninterruptible(1);
-		return NULL;
-	}
+	bool success;
 
 	/*
 	 * Go through the zonelist yet one more time, keep very high watermark
@@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 				      ~__GFP_DIRECT_RECLAIM, order,
 				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
 	if (page)
-		goto out;
+		return page;
+
+	/* Check if somebody else is making progress for us. */
+	*did_some_progress = mutex_is_locked(&oom_lock);
 
 	/* Coredumps can quickly deplete all memory reserves */
 	if (current->flags & PF_DUMPCORE)
-		goto out;
+		return NULL;
 	/* The OOM killer will not help higher order allocs */
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
-		goto out;
+		return NULL;
 	/*
 	 * We have already exhausted all our reclaim opportunities without any
 	 * success so it is time to admit defeat. We will skip the OOM killer
@@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	 * The OOM killer may not free memory on a specific node.
 	 */
 	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
-		goto out;
+		return NULL;
 	/* The OOM killer does not needlessly kill tasks for lowmem */
 	if (ac->highest_zoneidx < ZONE_NORMAL)
-		goto out;
+		return NULL;
 	if (pm_suspended_storage())
-		goto out;
+		return NULL;
 	/*
 	 * XXX: GFP_NOFS allocations should rather fail than rely on
 	 * other request to make a forward progress.
@@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	 * failures more gracefully we should just bail out here.
 	 */
 
+	/*
+	 * Acquire the oom lock.  If that fails, somebody else is
+	 * making progress for us.
+	 */
+	if (!mutex_trylock(&oom_lock)) {
+		*did_some_progress = 1;
+		schedule_timeout_uninterruptible(1);
+		return NULL;
+	}
+	success = out_of_memory(&oc);
+	mutex_unlock(&oom_lock);
+
 	/* Exhausted what can be done so it's blame time */
-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
+	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
 
 		/*
@@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
 					ALLOC_NO_WATERMARKS, ac);
 	}
-out:
-	mutex_unlock(&oom_lock);
+
 	return page;
 }
 
-- 
2.20.1



* Re: [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
From: Michal Hocko @ 2020-09-14 14:22 UTC
  To: mateusznosek0; +Cc: linux-mm, linux-kernel, akpm

On Mon 14-09-20 12:06:54, mateusznosek0@gmail.com wrote:
> From: Mateusz Nosek <mateusznosek0@gmail.com>
> 
> Most operations in '__alloc_pages_may_oom' do not require the oom_lock
> to be held; the exception is 'out_of_memory'. This patch refactors
> '__alloc_pages_may_oom' to reduce the size of the critical section and
> improve overall system performance.

This is a real slow path. What is the point of optimizing it? Do you
have any numbers?

Also I am not convinced the patch is entirely safe. At least the last
allocation attempt is meant to be done under the lock so that only one
task performs it. I have forgotten the complete reasoning behind that,
but at least the changelog should argue why that is ok.
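
Concretely (simplified sketch based on the diff below, not the exact
code): before the patch the ALLOC_NO_WATERMARKS fallback ran with
oom_lock still held, while with the patch the lock is dropped first:

	/* before */
	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
		...
		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
				ALLOC_NO_WATERMARKS, ac); /* under oom_lock */
	}
out:
	mutex_unlock(&oom_lock);

	/* after */
	success = out_of_memory(&oc);
	mutex_unlock(&oom_lock);
	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
		...
		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
				ALLOC_NO_WATERMARKS, ac); /* lock dropped */
	}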

> Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
> ---
>  mm/page_alloc.c | 45 ++++++++++++++++++++++++---------------------
>  1 file changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b9bd75cacf02..b07f950a5825 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		.order = order,
>  	};
>  	struct page *page;
> -
> -	*did_some_progress = 0;
> -
> -	/*
> -	 * Acquire the oom lock.  If that fails, somebody else is
> -	 * making progress for us.
> -	 */
> -	if (!mutex_trylock(&oom_lock)) {
> -		*did_some_progress = 1;
> -		schedule_timeout_uninterruptible(1);
> -		return NULL;
> -	}
> +	bool success;
>  
>  	/*
>  	 * Go through the zonelist yet one more time, keep very high watermark
> @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  				      ~__GFP_DIRECT_RECLAIM, order,
>  				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
>  	if (page)
> -		goto out;
> +		return page;
> +
> +	/* Check if somebody else is making progress for us. */
> +	*did_some_progress = mutex_is_locked(&oom_lock);
>  
>  	/* Coredumps can quickly deplete all memory reserves */
>  	if (current->flags & PF_DUMPCORE)
> -		goto out;
> +		return NULL;
>  	/* The OOM killer will not help higher order allocs */
>  	if (order > PAGE_ALLOC_COSTLY_ORDER)
> -		goto out;
> +		return NULL;
>  	/*
>  	 * We have already exhausted all our reclaim opportunities without any
>  	 * success so it is time to admit defeat. We will skip the OOM killer
> @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	 * The OOM killer may not free memory on a specific node.
>  	 */
>  	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
> -		goto out;
> +		return NULL;
>  	/* The OOM killer does not needlessly kill tasks for lowmem */
>  	if (ac->highest_zoneidx < ZONE_NORMAL)
> -		goto out;
> +		return NULL;
>  	if (pm_suspended_storage())
> -		goto out;
> +		return NULL;
>  	/*
>  	 * XXX: GFP_NOFS allocations should rather fail than rely on
>  	 * other request to make a forward progress.
> @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	 * failures more gracefully we should just bail out here.
>  	 */
>  
> +	/*
> +	 * Acquire the oom lock.  If that fails, somebody else is
> +	 * making progress for us.
> +	 */
> +	if (!mutex_trylock(&oom_lock)) {
> +		*did_some_progress = 1;
> +		schedule_timeout_uninterruptible(1);
> +		return NULL;
> +	}
> +	success = out_of_memory(&oc);
> +	mutex_unlock(&oom_lock);
> +
>  	/* Exhausted what can be done so it's blame time */
> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>  		*did_some_progress = 1;
>  
>  		/*
> @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
>  					ALLOC_NO_WATERMARKS, ac);
>  	}
> -out:
> -	mutex_unlock(&oom_lock);
> +
>  	return page;
>  }
>  
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
From: Mateusz Nosek @ 2020-09-15 13:09 UTC
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, akpm



On 9/14/2020 4:22 PM, Michal Hocko wrote:
> On Mon 14-09-20 12:06:54, mateusznosek0@gmail.com wrote:
>> From: Mateusz Nosek <mateusznosek0@gmail.com>
>>
>> Most operations in '__alloc_pages_may_oom' do not require the oom_lock
>> to be held; the exception is 'out_of_memory'. This patch refactors
>> '__alloc_pages_may_oom' to reduce the size of the critical section and
>> improve overall system performance.
> 
> This is a real slow path. What is the point of optimizing it? Do you
> have any numbers?
> 

I agree that, since this is the slow path, hard, complicated
optimizations are not recommended. In my humble opinion the introduced
patch is not complex and does not decrease code readability or
maintainability. In a nutshell, I see no drawbacks to applying it.

Even though the optimization is 'micro', I think the code in question
should be optimized, since it is possible without introducing
additional problems (assuming it is verified to be safe, as you
mentioned below), and especially as it sits at 'the heart' of memory
allocation in the kernel.

> Also I am not convinced the patch is entirely safe. At least the last
> allocation attempt is meant to be done under the lock so that only one
> task performs it. I have forgotten the complete reasoning behind that,
> but at least the changelog should argue why that is ok.
> 

The last allocation just calls 'get_page_from_freelist' with two
different sets of flags. I cannot see an obvious reason for it to
require synchronization. I am investigating this more deeply, but if
you happen to remember the reasoning, or any part of it, and would be
willing to share it, that would be of great help.
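
For reference, the last-resort attempt goes through
'__alloc_pages_cpuset_fallback', which is roughly (simplified):

	/* first honour cpuset constraints ... */
	page = get_page_from_freelist(gfp_mask, order,
				      alloc_flags | ALLOC_CPUSET, ac);
	/* ... then, if our nodes are depleted, ignore them */
	if (!page)
		page = get_page_from_freelist(gfp_mask, order,
					      alloc_flags, ac);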

>> Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
>> ---
>>   mm/page_alloc.c | 45 ++++++++++++++++++++++++---------------------
>>   1 file changed, 24 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b9bd75cacf02..b07f950a5825 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>   		.order = order,
>>   	};
>>   	struct page *page;
>> -
>> -	*did_some_progress = 0;
>> -
>> -	/*
>> -	 * Acquire the oom lock.  If that fails, somebody else is
>> -	 * making progress for us.
>> -	 */
>> -	if (!mutex_trylock(&oom_lock)) {
>> -		*did_some_progress = 1;
>> -		schedule_timeout_uninterruptible(1);
>> -		return NULL;
>> -	}
>> +	bool success;
>>   
>>   	/*
>>   	 * Go through the zonelist yet one more time, keep very high watermark
>> @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>   				      ~__GFP_DIRECT_RECLAIM, order,
>>   				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
>>   	if (page)
>> -		goto out;
>> +		return page;
>> +
>> +	/* Check if somebody else is making progress for us. */
>> +	*did_some_progress = mutex_is_locked(&oom_lock);
>>   
>>   	/* Coredumps can quickly deplete all memory reserves */
>>   	if (current->flags & PF_DUMPCORE)
>> -		goto out;
>> +		return NULL;
>>   	/* The OOM killer will not help higher order allocs */
>>   	if (order > PAGE_ALLOC_COSTLY_ORDER)
>> -		goto out;
>> +		return NULL;
>>   	/*
>>   	 * We have already exhausted all our reclaim opportunities without any
>>   	 * success so it is time to admit defeat. We will skip the OOM killer
>> @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>   	 * The OOM killer may not free memory on a specific node.
>>   	 */
>>   	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
>> -		goto out;
>> +		return NULL;
>>   	/* The OOM killer does not needlessly kill tasks for lowmem */
>>   	if (ac->highest_zoneidx < ZONE_NORMAL)
>> -		goto out;
>> +		return NULL;
>>   	if (pm_suspended_storage())
>> -		goto out;
>> +		return NULL;
>>   	/*
>>   	 * XXX: GFP_NOFS allocations should rather fail than rely on
>>   	 * other request to make a forward progress.
>> @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>   	 * failures more gracefully we should just bail out here.
>>   	 */
>>   
>> +	/*
>> +	 * Acquire the oom lock.  If that fails, somebody else is
>> +	 * making progress for us.
>> +	 */
>> +	if (!mutex_trylock(&oom_lock)) {
>> +		*did_some_progress = 1;
>> +		schedule_timeout_uninterruptible(1);
>> +		return NULL;
>> +	}
>> +	success = out_of_memory(&oc);
>> +	mutex_unlock(&oom_lock);
>> +
>>   	/* Exhausted what can be done so it's blame time */
>> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>> +	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>>   		*did_some_progress = 1;
>>   
>>   		/*
>> @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>   			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
>>   					ALLOC_NO_WATERMARKS, ac);
>>   	}
>> -out:
>> -	mutex_unlock(&oom_lock);
>> +
>>   	return page;
>>   }
>>   
>> -- 
>> 2.20.1
>>
> 
Sincerely yours,
Mateusz Nosek


* Re: [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
From: Michal Hocko @ 2020-09-15 14:04 UTC
  To: Mateusz Nosek; +Cc: linux-mm, linux-kernel, akpm

On Tue 15-09-20 15:09:59, Mateusz Nosek wrote:
> 
> 
> On 9/14/2020 4:22 PM, Michal Hocko wrote:
> > On Mon 14-09-20 12:06:54, mateusznosek0@gmail.com wrote:
> > > From: Mateusz Nosek <mateusznosek0@gmail.com>
> > > 
> > > Most operations in '__alloc_pages_may_oom' do not require the oom_lock
> > > to be held; the exception is 'out_of_memory'. This patch refactors
> > > '__alloc_pages_may_oom' to reduce the size of the critical section and
> > > improve overall system performance.
> > 
> > This is a real slow path. What is the point of optimizing it? Do you
> > have any numbers?
> > 
> 
> I agree that, since this is the slow path, hard, complicated
> optimizations are not recommended. In my humble opinion the introduced
> patch is not complex and does not decrease code readability or
> maintainability. In a nutshell, I see no drawbacks to applying it.

This is clearly a matter of taste. I do not see a good reason to apply
it TBH. It is a claimed optimization without any numbers to back that
claim. It is also a tricky area, so I am usually very wary of touching
this code without a strong reason. Others might feel differently, of
course.

[...]

Anyway, I have only now looked closer at the patch...

> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index b9bd75cacf02..b07f950a5825 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >   		.order = order,
> > >   	};
> > >   	struct page *page;
> > > -
> > > -	*did_some_progress = 0;
> > > -
> > > -	/*
> > > -	 * Acquire the oom lock.  If that fails, somebody else is
> > > -	 * making progress for us.
> > > -	 */
> > > -	if (!mutex_trylock(&oom_lock)) {
> > > -		*did_some_progress = 1;
> > > -		schedule_timeout_uninterruptible(1);
> > > -		return NULL;
> > > -	}
> > > +	bool success;
> > >   	/*
> > >   	 * Go through the zonelist yet one more time, keep very high watermark
> > > @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >   				      ~__GFP_DIRECT_RECLAIM, order,
> > >   				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
> > >   	if (page)
> > > -		goto out;
> > > +		return page;
> > > +
> > > +	/* Check if somebody else is making progress for us. */
> > > +	*did_some_progress = mutex_is_locked(&oom_lock);

This is not only quite ugly but wrong as well. In general, checking
the state of a lock is racy unless the lock is taken somewhere up the
call chain.

In this particular case it wouldn't be a big deal because an additional
retry (did_some_progress = 1) is not really critical. It would likely be
nicer to be deterministic here and not retry on all the early bailouts
regardless of the lock state.
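
To sketch the race (hypothetical interleaving):

	task A                                task B
	------                                ------
	mutex_is_locked(&oom_lock) /* 0 */
	                                      mutex_trylock(&oom_lock);
	                                      out_of_memory(&oc);
	/* A hits an early bailout */
	return NULL; /* with *did_some_progress == 0,
	                although B is making progress */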

> > >   	/* Coredumps can quickly deplete all memory reserves */
> > >   	if (current->flags & PF_DUMPCORE)
> > > -		goto out;
> > > +		return NULL;
> > >   	/* The OOM killer will not help higher order allocs */
> > >   	if (order > PAGE_ALLOC_COSTLY_ORDER)
> > > -		goto out;
> > > +		return NULL;
> > >   	/*
> > >   	 * We have already exhausted all our reclaim opportunities without any
> > >   	 * success so it is time to admit defeat. We will skip the OOM killer
> > > @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >   	 * The OOM killer may not free memory on a specific node.
> > >   	 */
> > >   	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
> > > -		goto out;
> > > +		return NULL;
> > >   	/* The OOM killer does not needlessly kill tasks for lowmem */
> > >   	if (ac->highest_zoneidx < ZONE_NORMAL)
> > > -		goto out;
> > > +		return NULL;
> > >   	if (pm_suspended_storage())
> > > -		goto out;
> > > +		return NULL;
> > >   	/*
> > >   	 * XXX: GFP_NOFS allocations should rather fail than rely on
> > >   	 * other request to make a forward progress.
> > > @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >   	 * failures more gracefully we should just bail out here.
> > >   	 */
> > > +	/*
> > > +	 * Acquire the oom lock.  If that fails, somebody else is
> > > +	 * making progress for us.
> > > +	 */
> > > +	if (!mutex_trylock(&oom_lock)) {
> > > +		*did_some_progress = 1;
> > > +		schedule_timeout_uninterruptible(1);
> > > +		return NULL;
> > > +	}
> > > +	success = out_of_memory(&oc);
> > > +	mutex_unlock(&oom_lock);
> > > +
> > >   	/* Exhausted what can be done so it's blame time */
> > > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > > +	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > >   		*did_some_progress = 1;
> > >   		/*
> > > @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >   			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
> > >   					ALLOC_NO_WATERMARKS, ac);
> > >   	}
> > > -out:
> > > -	mutex_unlock(&oom_lock);
> > > +
> > >   	return page;
> > >   }
> > > -- 
> > > 2.20.1
> > > 
> > 
> Sincerely yours,
> Mateusz Nosek

-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] mm/page_alloc.c: micro-optimization reduce oom critical section size
From: Mateusz Nosek @ 2020-09-16 11:13 UTC
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, akpm

Hello,

Thank you for your comments.
I will modify the patch where necessary and resend v2, but first I
will make 100% sure there is no synchronization problem, which might
potentially be there as you mentioned in your previous mail, and try
to gather some numbers in support.

Sincerely yours,
Mateusz Nosek

On 9/15/2020 4:04 PM, Michal Hocko wrote:
> On Tue 15-09-20 15:09:59, Mateusz Nosek wrote:
>>
>>
>> On 9/14/2020 4:22 PM, Michal Hocko wrote:
>>> On Mon 14-09-20 12:06:54, mateusznosek0@gmail.com wrote:
>>>> From: Mateusz Nosek <mateusznosek0@gmail.com>
>>>>
>>>> Most operations in '__alloc_pages_may_oom' do not require the oom_lock
>>>> to be held; the exception is 'out_of_memory'. This patch refactors
>>>> '__alloc_pages_may_oom' to reduce the size of the critical section and
>>>> improve overall system performance.
>>>
>>> This is a real slow path. What is the point of optimizing it? Do you
>>> have any numbers?
>>>
>>
>> I agree that, since this is the slow path, hard, complicated
>> optimizations are not recommended. In my humble opinion the introduced
>> patch is not complex and does not decrease code readability or
>> maintainability. In a nutshell, I see no drawbacks to applying it.
> 
> This is clearly a matter of taste. I do not see a good reason to apply
> it TBH. It is a claimed optimization without any numbers to back that
> claim. It is also a tricky area, so I am usually very wary of touching
> this code without a strong reason. Others might feel differently, of
> course.
> 
> [...]
> 
> Anyway, I have only now looked closer at the patch...
> 
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index b9bd75cacf02..b07f950a5825 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>>    		.order = order,
>>>>    	};
>>>>    	struct page *page;
>>>> -
>>>> -	*did_some_progress = 0;
>>>> -
>>>> -	/*
>>>> -	 * Acquire the oom lock.  If that fails, somebody else is
>>>> -	 * making progress for us.
>>>> -	 */
>>>> -	if (!mutex_trylock(&oom_lock)) {
>>>> -		*did_some_progress = 1;
>>>> -		schedule_timeout_uninterruptible(1);
>>>> -		return NULL;
>>>> -	}
>>>> +	bool success;
>>>>    	/*
>>>>    	 * Go through the zonelist yet one more time, keep very high watermark
>>>> @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>>    				      ~__GFP_DIRECT_RECLAIM, order,
>>>>    				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
>>>>    	if (page)
>>>> -		goto out;
>>>> +		return page;
>>>> +
>>>> +	/* Check if somebody else is making progress for us. */
>>>> +	*did_some_progress = mutex_is_locked(&oom_lock);
> 
> This is not only quite ugly but wrong as well. In general, checking
> the state of a lock is racy unless the lock is taken somewhere up the
> call chain.
> 
> In this particular case it wouldn't be a big deal because an additional
> retry (did_some_progress = 1) is not really critical. It would likely be
> nicer to be deterministic here and not retry on all the early bailouts
> regardless of the lock state.
> 
>>>>    	/* Coredumps can quickly deplete all memory reserves */
>>>>    	if (current->flags & PF_DUMPCORE)
>>>> -		goto out;
>>>> +		return NULL;
>>>>    	/* The OOM killer will not help higher order allocs */
>>>>    	if (order > PAGE_ALLOC_COSTLY_ORDER)
>>>> -		goto out;
>>>> +		return NULL;
>>>>    	/*
>>>>    	 * We have already exhausted all our reclaim opportunities without any
>>>>    	 * success so it is time to admit defeat. We will skip the OOM killer
>>>> @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>>    	 * The OOM killer may not free memory on a specific node.
>>>>    	 */
>>>>    	if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE))
>>>> -		goto out;
>>>> +		return NULL;
>>>>    	/* The OOM killer does not needlessly kill tasks for lowmem */
>>>>    	if (ac->highest_zoneidx < ZONE_NORMAL)
>>>> -		goto out;
>>>> +		return NULL;
>>>>    	if (pm_suspended_storage())
>>>> -		goto out;
>>>> +		return NULL;
>>>>    	/*
>>>>    	 * XXX: GFP_NOFS allocations should rather fail than rely on
>>>>    	 * other request to make a forward progress.
>>>> @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>>    	 * failures more gracefully we should just bail out here.
>>>>    	 */
>>>> +	/*
>>>> +	 * Acquire the oom lock.  If that fails, somebody else is
>>>> +	 * making progress for us.
>>>> +	 */
>>>> +	if (!mutex_trylock(&oom_lock)) {
>>>> +		*did_some_progress = 1;
>>>> +		schedule_timeout_uninterruptible(1);
>>>> +		return NULL;
>>>> +	}
>>>> +	success = out_of_memory(&oc);
>>>> +	mutex_unlock(&oom_lock);
>>>> +
>>>>    	/* Exhausted what can be done so it's blame time */
>>>> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>>>> +	if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>>>>    		*did_some_progress = 1;
>>>>    		/*
>>>> @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>>    			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
>>>>    					ALLOC_NO_WATERMARKS, ac);
>>>>    	}
>>>> -out:
>>>> -	mutex_unlock(&oom_lock);
>>>> +
>>>>    	return page;
>>>>    }
>>>> -- 
>>>> 2.20.1
>>>>
>>>
>> Sincerely yours,
>> Mateusz Nosek
> 

