linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/page_alloc: increase default min_free_kbytes bound
@ 2020-02-20 15:01 Joel Savitz
  2020-02-21  0:40 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Joel Savitz @ 2020-02-20 15:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Savitz, Andrew Morton, Rafael Aquini, linux-mm


Currently, the default vm.min_free_kbytes sysctl value computed in
init_per_zone_wmark_min() is capped at a hardcoded 64M (unless it is
overridden by khugepaged initialization).

This cap has not been modified since 2005, and enterprise-grade
systems now frequently have hundreds of GB of RAM and multiple 10, 40,
or even 100 Gb NICs. We have seen page allocation failures related to
NIC drivers on heavily loaded systems. These issues were resolved by
increasing vm.min_free_kbytes.

This patch increases the hardcoded value by a factor of 4 as a temporary
solution.

Further work to make the calculation of vm.min_free_kbytes more
consistent throughout the kernel would be desirable.

As an example of the inconsistency of the current method, this value is
recalculated by init_per_zone_wmark_min() on memory hotplug, which
overrides the value set by set_recommended_min_free_kbytes() during
khugepaged initialization even if khugepaged remains enabled. A
subsequent on/off toggle of khugepaged then recalculates and sets the
value again via set_recommended_min_free_kbytes().

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..32cbfb13e958 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7867,8 +7867,8 @@ int __meminit init_per_zone_wmark_min(void)
 		min_free_kbytes = new_min_free_kbytes;
 		if (min_free_kbytes < 128)
 			min_free_kbytes = 128;
-		if (min_free_kbytes > 65536)
-			min_free_kbytes = 65536;
+		if (min_free_kbytes > 262144)
+			min_free_kbytes = 262144;
 	} else {
 		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
 				new_min_free_kbytes, user_min_free_kbytes);
-- 
2.23.0



* Re: [PATCH] mm/page_alloc: increase default min_free_kbytes bound
From: Andrew Morton @ 2020-02-21  0:40 UTC (permalink / raw)
  To: Joel Savitz; +Cc: linux-kernel, Rafael Aquini, linux-mm

On Thu, 20 Feb 2020 10:01:03 -0500 Joel Savitz <jsavitz@redhat.com> wrote:

> 
> Currently, the vm.min_free_kbytes sysctl value is capped at a hardcoded
> 64M in init_per_zone_wmark_min (unless it is overridden by khugepaged
> initialization).
> 
> This value has not been modified since 2005, and enterprise-grade
> systems now frequently have hundreds of GB of RAM and multiple 10, 40,
> or even 100 GB NICs. We have seen page allocation failures on heavily
> loaded systems related to NIC drivers. These issues were resolved by an
> increase to vm.min_free_kbytes.
> 
> This patch increases the hardcoded value by a factor of 4 as a temporary
> solution.

OK, better than nothing I guess.

> Further work to make the calculation of vm.min_free_kbytes more
> consistent throughout the kernel would be desirable.
> 
> As an example of the inconsistency of the current method, this value is
> recalculated by init_per_zone_wmark_min() in the case of memory hotplug
> which will override the value set by set_recommended_min_free_kbytes()
> called during khugepaged initialization even if khugepaged remains
> enabled, however an on/off toggle of khugepaged will then recalculate
> and set the value via set_recommended_min_free_kbytes().
> 

Yup, these hardcoded numbers are lame.


* Re: [PATCH] mm/page_alloc: increase default min_free_kbytes bound
From: Mike Kravetz @ 2020-02-21  1:27 UTC (permalink / raw)
  To: Joel Savitz, linux-kernel; +Cc: Andrew Morton, Rafael Aquini, linux-mm

On 2/20/20 7:01 AM, Joel Savitz wrote:
> 
> Further work to make the calculation of vm.min_free_kbytes more
> consistent throughout the kernel would be desirable.
> 
> As an example of the inconsistency of the current method, this value is
> recalculated by init_per_zone_wmark_min() in the case of memory hotplug
> which will override the value set by set_recommended_min_free_kbytes()
> called during khugepaged initialization even if khugepaged remains
> enabled, however an on/off toggle of khugepaged will then recalculate
> and set the value via set_recommended_min_free_kbytes().

I took a shot at fixing some of those inconsistencies.

https://lore.kernel.org/linux-mm/20200210190121.10670-1-mike.kravetz@oracle.com/
-- 
Mike Kravetz


* Re: [PATCH] mm/page_alloc: increase default min_free_kbytes bound
From: John Hubbard @ 2020-02-21  1:53 UTC (permalink / raw)
  To: Joel Savitz, linux-kernel; +Cc: Andrew Morton, Rafael Aquini, linux-mm

On 2/20/20 7:01 AM, Joel Savitz wrote:
> 
> Currently, the vm.min_free_kbytes sysctl value is capped at a hardcoded
> 64M in init_per_zone_wmark_min (unless it is overridden by khugepaged
> initialization).
> 
> This value has not been modified since 2005, and enterprise-grade
> systems now frequently have hundreds of GB of RAM and multiple 10, 40,
> or even 100 GB NICs. We have seen page allocation failures on heavily
> loaded systems related to NIC drivers. These issues were resolved by an
> increase to vm.min_free_kbytes.
> 
> This patch increases the hardcoded value by a factor of 4 as a temporary
> solution.
> 
> Further work to make the calculation of vm.min_free_kbytes more
> consistent throughout the kernel would be desirable.
> 
> As an example of the inconsistency of the current method, this value is
> recalculated by init_per_zone_wmark_min() in the case of memory hotplug
> which will override the value set by set_recommended_min_free_kbytes()
> called during khugepaged initialization even if khugepaged remains
> enabled, however an on/off toggle of khugepaged will then recalculate
> and set the value via set_recommended_min_free_kbytes().
> 
> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> ---
>  mm/page_alloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..32cbfb13e958 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7867,8 +7867,8 @@ int __meminit init_per_zone_wmark_min(void)
>  		min_free_kbytes = new_min_free_kbytes;
>  		if (min_free_kbytes < 128)
>  			min_free_kbytes = 128;
> -		if (min_free_kbytes > 65536)
> -			min_free_kbytes = 65536;
> +		if (min_free_kbytes > 262144)
> +			min_free_kbytes = 262144;


Would it be any better to at least use symbols, instead of numbers, in the
routine? Like this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..e705636bb644 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -149,6 +149,9 @@ DEFINE_STATIC_KEY_FALSE(init_on_free);
 #endif
 EXPORT_SYMBOL(init_on_free);
 
+static const int MIN_FREE_KBYTES_LOWER_LIMIT = 128;
+static const int MIN_FREE_KBYTES_UPPER_LIMIT = 262144;
+
 static int __init early_init_on_alloc(char *buf)
 {
        int ret;
@@ -7865,10 +7868,10 @@ int __meminit init_per_zone_wmark_min(void)
 
        if (new_min_free_kbytes > user_min_free_kbytes) {
                min_free_kbytes = new_min_free_kbytes;
-               if (min_free_kbytes < 128)
-                       min_free_kbytes = 128;
-               if (min_free_kbytes > 65536)
-                       min_free_kbytes = 65536;
+               if (min_free_kbytes < MIN_FREE_KBYTES_LOWER_LIMIT)
+                       min_free_kbytes = MIN_FREE_KBYTES_LOWER_LIMIT;
+               if (min_free_kbytes > MIN_FREE_KBYTES_UPPER_LIMIT)
+                       min_free_kbytes = MIN_FREE_KBYTES_UPPER_LIMIT;
        } else {
                pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
                                new_min_free_kbytes, user_min_free_kbytes);


thanks,
-- 
John Hubbard
NVIDIA


>  	} else {
>  		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
>  				new_min_free_kbytes, user_min_free_kbytes);
> 


* Re: [PATCH] mm/page_alloc: increase default min_free_kbytes bound
From: Vlastimil Babka @ 2020-02-27 12:45 UTC (permalink / raw)
  To: John Hubbard, Joel Savitz, linux-kernel
  Cc: Andrew Morton, Rafael Aquini, linux-mm

On 2/21/20 2:53 AM, John Hubbard wrote:
> On 2/20/20 7:01 AM, Joel Savitz wrote:
>> 
>> Currently, the vm.min_free_kbytes sysctl value is capped at a hardcoded
>> 64M in init_per_zone_wmark_min (unless it is overridden by khugepaged
>> initialization).
>> 
>> This value has not been modified since 2005, and enterprise-grade
>> systems now frequently have hundreds of GB of RAM and multiple 10, 40,
>> or even 100 GB NICs. We have seen page allocation failures on heavily
>> loaded systems related to NIC drivers. These issues were resolved by an
>> increase to vm.min_free_kbytes.
>> 
>> This patch increases the hardcoded value by a factor of 4 as a temporary
>> solution.
>> 
>> Further work to make the calculation of vm.min_free_kbytes more
>> consistent throughout the kernel would be desirable.
>> 
>> As an example of the inconsistency of the current method, this value is
>> recalculated by init_per_zone_wmark_min() in the case of memory hotplug
>> which will override the value set by set_recommended_min_free_kbytes()
>> called during khugepaged initialization even if khugepaged remains
>> enabled, however an on/off toggle of khugepaged will then recalculate
>> and set the value via set_recommended_min_free_kbytes().
>> 
>> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
>> ---
>>  mm/page_alloc.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3c4eb750a199..32cbfb13e958 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7867,8 +7867,8 @@ int __meminit init_per_zone_wmark_min(void)
>>  		min_free_kbytes = new_min_free_kbytes;
>>  		if (min_free_kbytes < 128)
>>  			min_free_kbytes = 128;
>> -		if (min_free_kbytes > 65536)
>> -			min_free_kbytes = 65536;
>> +		if (min_free_kbytes > 262144)
>> +			min_free_kbytes = 262144;
> 
> 
> Would it be any better to at least use symbols, instead of numbers, in the
> routine? Like this:

+1

Thanks

