linux-kernel.vger.kernel.org archive mirror
* threads-max observe limits
@ 2019-09-17 10:03 Michal Hocko
  2019-09-17 15:28 ` Heinrich Schuchardt
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-09-17 10:03 UTC (permalink / raw)
  To: Heinrich Schuchardt, Andrew Morton; +Cc: LKML

Hi,
I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
observe limits") and I am really wondering what is the motivation behind
the patch. We've had a customer noticing the threads_max autoscaling
differences between 3.12 and 4.4 kernels and wanted to override the auto
tuning from userspace, just to find out that this is not possible.

Why do we override the user/admin like that? I find it quite dubious, to be
honest, especially when the auto-tuning is just a very rough estimation
and seems quite arbitrary.

Thanks
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-17 10:03 threads-max observe limits Michal Hocko
@ 2019-09-17 15:28 ` Heinrich Schuchardt
  2019-09-17 15:38   ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Heinrich Schuchardt @ 2019-09-17 15:28 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton; +Cc: LKML


On 9/17/19 12:03 PM, Michal Hocko wrote:
> Hi,
> I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
> observe limits") and I am really wondering what is the motivation behind
> the patch. We've had a customer noticing the threads_max autoscaling
> differences between 3.12 and 4.4 kernels and wanted to override the auto
> tuning from userspace, just to find out that this is not possible.

set_max_threads() sets the upper limit (max_threads_suggested) for
threads such that at a maximum 1/8th of the total memory can be occupied
by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
system this results in 254313 threads.
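
For reference, the auto-tuning in kernel/fork.c works roughly as follows
(a simplified sketch from memory, not the exact source; MIN_THREADS,
MAX_THREADS and THREAD_SIZE are the existing kernel constants):

/*
 * Sketch of set_max_threads(): cap the number of threads so that their
 * kernel data structures use at most 1/8th of RAM, then clamp to the
 * hard bounds the kernel can cope with.
 */
static void set_max_threads(unsigned int max_threads_suggested)
{
        u64 threads;
        unsigned long nr_pages = totalram_pages; /* totalram_pages() on newer kernels */

        /* allow at most totalram / (8 * THREAD_SIZE) threads */
        threads = div64_u64((u64)nr_pages * (u64)PAGE_SIZE,
                            (u64)THREAD_SIZE * 8UL);

        if (threads > max_threads_suggested)
                threads = max_threads_suggested;

        /* MAX_THREADS is FUTEX_TID_MASK, the hard upper bound */
        max_threads = clamp_t(u64, threads, MIN_THREADS, MAX_THREADS);
}

With THREAD_SIZE = 16 KiB, 32 GiB / (8 * 16 KiB) = 262144; the reported
254313 is slightly lower because reserved RAM is not counted in
totalram_pages.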

With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
a user cannot set an arbitrarily high number for
/proc/sys/kernel/threads-max which could lead to a system stalling
because the thread headers occupy all the memory.

When developing the patch I remarked that on a system where memory is
installed dynamically it might be a good idea to recalculate this limit.
If you have a system that boots with let's say 8 GiB and then
dynamically installs a few TiB of RAM this might make sense. But such a
dynamic update of max_threads_suggested was left out for the sake of
simplicity.

Anyway if more than 100,000 threads are used on a system, I would wonder
if the software should not be changed to use thread-pools instead.

Best regards

Heinrich

>
> Why do we override the user/admin like that? I find it quite dubious, to be
> honest, especially when the auto-tuning is just a very rough estimation
> and seems quite arbitrary.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-17 15:28 ` Heinrich Schuchardt
@ 2019-09-17 15:38   ` Michal Hocko
  2019-09-17 17:26     ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-09-17 15:38 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: Andrew Morton, LKML

On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
> 
> On 9/17/19 12:03 PM, Michal Hocko wrote:
> > Hi,
> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
> > observe limits") and I am really wondering what is the motivation behind
> > the patch. We've had a customer noticing the threads_max autoscaling
> > differences between 3.12 and 4.4 kernels and wanted to override the auto
> > tuning from userspace, just to find out that this is not possible.
> 
> set_max_threads() sets the upper limit (max_threads_suggested) for
> threads such that at a maximum 1/8th of the total memory can be occupied
> > by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
> system this results in 254313 threads.

This is quite arbitrary, isn't it? What would happen if the limit was
twice as large?

> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> a user cannot set an arbitrarily high number for
> /proc/sys/kernel/threads-max which could lead to a system stalling
> because the thread headers occupy all the memory.

This is still a decision of the admin to make.  You can consume the
memory by other means and that is why we have measures in place. E.g.
memcg accounting.

> When developing the patch I remarked that on a system where memory is
> installed dynamically it might be a good idea to recalculate this limit.
> If you have a system that boots with let's say 8 GiB and then
> dynamically installs a few TiB of RAM this might make sense. But such a
> dynamic update of max_threads_suggested was left out for the sake of
> simplicity.
> 
> Anyway if more than 100,000 threads are used on a system, I would wonder
> if the software should not be changed to use thread-pools instead.

You do not change the software to overcome artificial bounds based on
guessing.

So can we get back to the justification of the patch? What kind of
real-life problem does it solve and why is it OK to override an admin
decision?
If there is no strong justification then the patch should be reverted
because from what I have heard it has been noticed and it has broken
a certain deployment. I am not really clear about technical details yet
but it seems that there are workloads that believe they need to touch
this tuning and complain if that is not possible.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-17 15:38   ` Michal Hocko
@ 2019-09-17 17:26     ` Eric W. Biederman
  2019-09-18  7:15       ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-17 17:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
>> 
>> On 9/17/19 12:03 PM, Michal Hocko wrote:
>> > Hi,
>> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
>> > observe limits") and I am really wondering what is the motivation behind
>> > the patch. We've had a customer noticing the threads_max autoscaling
>> > differences between 3.12 and 4.4 kernels and wanted to override the auto
>> > tuning from userspace, just to find out that this is not possible.
>> 
>> set_max_threads() sets the upper limit (max_threads_suggested) for
>> threads such that at a maximum 1/8th of the total memory can be occupied
>> by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
>> system this results in 254313 threads.
>
> This is quite arbitrary, isn't it? What would happen if the limit was
> twice as large?
>
>> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
>> a user cannot set an arbitrarily high number for
>> /proc/sys/kernel/threads-max which could lead to a system stalling
>> because the thread headers occupy all the memory.
>
> This is still a decision of the admin to make.  You can consume the
> memory by other means and that is why we have measures in place. E.g.
> memcg accounting.
>
>> When developing the patch I remarked that on a system where memory is
>> installed dynamically it might be a good idea to recalculate this limit.
>> If you have a system that boots with let's say 8 GiB and then
>> dynamically installs a few TiB of RAM this might make sense. But such a
>> dynamic update of max_threads_suggested was left out for the sake of
>> simplicity.
>> 
>> Anyway if more than 100,000 threads are used on a system, I would wonder
>> if the software should not be changed to use thread-pools instead.
>
> You do not change the software to overcome artificial bounds based on
> guessing.
>
> So can we get back to the justification of the patch? What kind of
> real-life problem does it solve and why is it OK to override an admin
> decision?
> If there is no strong justification then the patch should be reverted
> because from what I have heard it has been noticed and it has broken
> a certain deployment. I am not really clear about technical details yet
> but it seems that there are workloads that believe they need to touch
> this tuning and complain if that is not possible.

Taking a quick look myself.

I am completely mystified by both sides of this conversation.

a) The logic to set the default number of threads in a system
   has not changed since 2.6.12-rc2 (the start of the git history).

The implementation has changed but we should still get the same
value.  So anyone seeing threads_max autoscaling differences
between kernels is either seeing a bug in the rewritten formula
or something else weird is going on.

Michal is it a very small effect your customers are seeing?
Is it another bug somewhere else?

b) Not being able to bump threads_max to the physical limit of
   the machine is very clearly a regression.

 Limiting threads_max to MIN_THREADS on the low end and MAX_THREADS on
 the high end is reasonable, because Linux can't cope with values
 outside of that range.  Limiting threads_max to the auto-scaling value
 is a regression.

The point of limits like threads_max is to have something that 99%
of people won't hit and if they do it will indicate a bug in their
application.  And to generally keep the kernel working when an
application bug happens.

But there are always cases where heuristics fail so it is completely
reasonable to allow these values to be manually tuned.

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-17 17:26     ` Eric W. Biederman
@ 2019-09-18  7:15       ` Michal Hocko
  2019-09-19  7:59         ` Michal Hocko
  2019-09-19 19:33         ` Eric W. Biederman
  0 siblings, 2 replies; 15+ messages in thread
From: Michal Hocko @ 2019-09-18  7:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
> >> 
> >> On 9/17/19 12:03 PM, Michal Hocko wrote:
> >> > Hi,
> >> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
> >> > observe limits") and I am really wondering what is the motivation behind
> >> > the patch. We've had a customer noticing the threads_max autoscaling
> >> > differences between 3.12 and 4.4 kernels and wanted to override the auto
> >> > tuning from userspace, just to find out that this is not possible.
> >> 
> >> set_max_threads() sets the upper limit (max_threads_suggested) for
> >> threads such that at a maximum 1/8th of the total memory can be occupied
> >> by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
> >> system this results in 254313 threads.
> >
> > This is quite arbitrary, isn't it? What would happen if the limit was
> > twice as large?
> >
> >> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> >> a user cannot set an arbitrarily high number for
> >> /proc/sys/kernel/threads-max which could lead to a system stalling
> >> because the thread headers occupy all the memory.
> >
> > This is still a decision of the admin to make.  You can consume the
> > memory by other means and that is why we have measures in place. E.g.
> > memcg accounting.
> >
> >> When developing the patch I remarked that on a system where memory is
> >> installed dynamically it might be a good idea to recalculate this limit.
> >> If you have a system that boots with let's say 8 GiB and then
> >> dynamically installs a few TiB of RAM this might make sense. But such a
> >> dynamic update of max_threads_suggested was left out for the sake of
> >> simplicity.
> >> 
> >> Anyway if more than 100,000 threads are used on a system, I would wonder
> >> if the software should not be changed to use thread-pools instead.
> >
> > You do not change the software to overcome artificial bounds based on
> > guessing.
> >
> > So can we get back to the justification of the patch? What kind of
> > real-life problem does it solve and why is it OK to override an admin
> > decision?
> > If there is no strong justification then the patch should be reverted
> > because from what I have heard it has been noticed and it has broken
> > a certain deployment. I am not really clear about technical details yet
> > but it seems that there are workloads that believe they need to touch
> > this tuning and complain if that is not possible.
> 
> Taking a quick look myself.
> 
> I am completely mystified by both sides of this conversation.
> 
> a) The logic to set the default number of threads in a system
>    has not changed since 2.6.12-rc2 (the start of the git history).
> 
> The implementation has changed but we should still get the same
> value.  So anyone seeing threads_max autoscaling differences
> between kernels is either seeing a bug in the rewritten formula
> or something else weird is going on.
> 
> Michal is it a very small effect your customers are seeing?
> Is it another bug somewhere else?

I am still trying to get more information. Reportedly they see a
different auto-tuned limit between two kernel versions which results in
an application complaining. As already mentioned this might be a side
effect of something else and this is not yet fully analyzed. My main
point for bringing up this discussion is ...

> b) Not being able to bump threads_max to the physical limit of
>    the machine is very clearly a regression.

... exactly this part. The changelog of the respective patch doesn't
really explain why it is needed except for "it sounds like a good idea
to be consistent".
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-18  7:15       ` Michal Hocko
@ 2019-09-19  7:59         ` Michal Hocko
  2019-09-19 19:38           ` Andrew Morton
  2019-09-19 19:33         ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-09-19  7:59 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: Eric W. Biederman, Andrew Morton, LKML

On Wed 18-09-19 09:15:41, Michal Hocko wrote:
> On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
[...]
> > b) Not being able to bump threads_max to the physical limit of
> >    the machine is very clearly a regression.
> 
> ... exactly this part. The changelog of the respective patch doesn't
> really explain why it is needed except for "it sounds like a good idea
> to be consistent".

Any take on this, Heinrich? If there really is no strong reasoning for
restricting user input then I will suggest reverting 16db3d3f1170
("kernel/sysctl.c: threads-max observe limits").

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-18  7:15       ` Michal Hocko
  2019-09-19  7:59         ` Michal Hocko
@ 2019-09-19 19:33         ` Eric W. Biederman
  2019-09-22  6:58           ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-19 19:33 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
>> >> 
>> >> On 9/17/19 12:03 PM, Michal Hocko wrote:
>> >> > Hi,
>> >> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
>> >> > observe limits") and I am really wondering what is the motivation behind
>> >> > the patch. We've had a customer noticing the threads_max autoscaling
>> >> > differences between 3.12 and 4.4 kernels and wanted to override the auto
>> >> > tuning from userspace, just to find out that this is not possible.
>> >> 
>> >> set_max_threads() sets the upper limit (max_threads_suggested) for
>> >> threads such that at a maximum 1/8th of the total memory can be occupied
>> >> by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
>> >> system this results in 254313 threads.
>> >
>> > This is quite arbitrary, isn't it? What would happen if the limit was
>> > twice as large?
>> >
>> >> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
>> >> a user cannot set an arbitrarily high number for
>> >> /proc/sys/kernel/threads-max which could lead to a system stalling
>> >> because the thread headers occupy all the memory.
>> >
>> > This is still a decision of the admin to make.  You can consume the
>> > memory by other means and that is why we have measures in place. E.g.
>> > memcg accounting.
>> >
>> >> When developing the patch I remarked that on a system where memory is
>> >> installed dynamically it might be a good idea to recalculate this limit.
>> >> If you have a system that boots with let's say 8 GiB and then
>> >> dynamically installs a few TiB of RAM this might make sense. But such a
>> >> dynamic update of max_threads_suggested was left out for the sake of
>> >> simplicity.
>> >> 
>> >> Anyway if more than 100,000 threads are used on a system, I would wonder
>> >> if the software should not be changed to use thread-pools instead.
>> >
>> > You do not change the software to overcome artificial bounds based on
>> > guessing.
>> >
>> > So can we get back to the justification of the patch? What kind of
>> > real-life problem does it solve and why is it OK to override an admin
>> > decision?
>> > If there is no strong justification then the patch should be reverted
>> > because from what I have heard it has been noticed and it has broken
>> > a certain deployment. I am not really clear about technical details yet
>> > but it seems that there are workloads that believe they need to touch
>> > this tuning and complain if that is not possible.
>> 
>> Taking a quick look myself.
>> 
>> I am completely mystified by both sides of this conversation.
>> 
>> a) The logic to set the default number of threads in a system
>>    has not changed since 2.6.12-rc2 (the start of the git history).
>> 
>> The implementation has changed but we should still get the same
>> value.  So anyone seeing threads_max autoscaling differences
>> between kernels is either seeing a bug in the rewritten formula
>> or something else weird is going on.
>> 
>> Michal is it a very small effect your customers are seeing?
>> Is it another bug somewhere else?
>
> I am still trying to get more information. Reportedly they see a
> different auto-tuned limit between two kernel versions which results in
> an application complaining. As already mentioned this might be a side
> effect of something else and this is not yet fully analyzed. My main
> point for bringing up this discussion is ...

Please, this sounds like the kind of issue that will reveal something
deeper about what is going on.

>
>> b) Not being able to bump threads_max to the physical limit of
>>    the machine is very clearly a regression.
>
> ... exactly this part. The changelog of the respective patch doesn't
> really explain why it is needed except for "it sounds like a good idea
> to be consistent".

I suggest doing a partial revert to just:

diff --git a/kernel/fork.c b/kernel/fork.c
index 7a74ade4e7d6..de8264ea34a7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
 	if (ret || !write)
 		return ret;
 
-	set_max_threads(threads);
+	max_threads = threads;
 
 	return 0;
 }

proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS
is justifiable.  Those are the minimum and maximum values the kernel can
function with.
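
For context, sysctl_max_threads() wires those bounds into
proc_dointvec_minmax() roughly like this (an abridged sketch matching the
context lines of the diff above, not the exact source):

int sysctl_max_threads(struct ctl_table *table, int write,
                       void __user *buffer, size_t *lenp, loff_t *ppos)
{
        struct ctl_table t;
        int ret;
        int threads = max_threads;
        int min = MIN_THREADS;
        int max = MAX_THREADS;

        /* operate on a local copy so the handler can use its own storage */
        t = *table;
        t.data = &threads;
        t.extra1 = &min;        /* proc_dointvec_minmax rejects writes below min */
        t.extra2 = &max;        /* ... and above max */

        ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
        if (ret || !write)
                return ret;

        /* with the partial revert above, the admin's value is stored as-is */
        max_threads = threads;

        return 0;
}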

With a good changelog we should be able to backport that change without
any fear.

Eric


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-19  7:59         ` Michal Hocko
@ 2019-09-19 19:38           ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2019-09-19 19:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Heinrich Schuchardt, Eric W. Biederman, LKML

On Thu, 19 Sep 2019 09:59:11 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 18-09-19 09:15:41, Michal Hocko wrote:
> > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
> [...]
> > > b) Not being able to bump threads_max to the physical limit of
> > >    the machine is very clearly a regression.
> > 
> > ... exactly this part. The changelog of the respective patch doesn't
> > really explain why it is needed except for "it sounds like a good idea
> > to be consistent".
> 
> Any take on this, Heinrich? If there really is no strong reasoning for
> restricting user input then I will suggest reverting 16db3d3f1170
> ("kernel/sysctl.c: threads-max observe limits").

I agree, based on what I'm seeing in this thread.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-19 19:33         ` Eric W. Biederman
@ 2019-09-22  6:58           ` Michal Hocko
  2019-09-22 15:31             ` Heinrich Schuchardt
  2019-09-22 21:24             ` Eric W. Biederman
  0 siblings, 2 replies; 15+ messages in thread
From: Michal Hocko @ 2019-09-22  6:58 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

On Thu 19-09-19 14:33:24, Eric W. Biederman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
[...]
> >> Michal is it a very small effect your customers are seeing?
> >> Is it another bug somewhere else?
> >
> > I am still trying to get more information. Reportedly they see a
> > different auto-tuned limit between two kernel versions which results in
> > an application complaining. As already mentioned this might be a side
> > effect of something else and this is not yet fully analyzed. My main
> > point for bringing up this discussion is ...
> 
> Please, this sounds like the kind of issue that will reveal something
> deeper about what is going on.

Yes, I have pushed for that information even before starting the discussion
here. I haven't heard back from the customer yet, unfortunately.
 
> >> b) Not being able to bump threads_max to the physical limit of
> >>    the machine is very clearly a regression.
> >
> > ... exactly this part. The changelog of the respective patch doesn't
> > really explain why it is needed except for "it sounds like a good idea
> > to be consistent".
> 
> I suggest doing a partial revert to just:
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 7a74ade4e7d6..de8264ea34a7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	if (ret || !write)
>  		return ret;
>  
> -	set_max_threads(threads);
> +	max_threads = threads;
>  
>  	return 0;
>  }
> 
> proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS
> is justifiable.  Those are the minimum and maximum values the kernel can
> function with.

The MAX_THREADS limit makes perfect sense because crossing it reflects a
real constraint for correctness. MIN_THREADS, on the other hand, is only
the lower gate for the auto-tuning so that it does not produce an unbootable
system. I do not think we should override the admin's call on the lower
bound. There might be a good reason to restrict the number of threads to 1.
So here is what I would like to go with.

From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Sun, 22 Sep 2019 08:45:28 +0200
Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
 userspace

Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
because the patch is causing a regression to any workload which needs to
override the auto-tuning of the limit provided by kernel.

set_max_threads is implementing a boot time guesstimate to provide a
sensible limit on the number of concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSphere MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to override
an admin decision by an estimation that doesn't have any direct relation
to correctness of the kernel operation.

Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for the automatic estimation, and an
admin might have a good reason to stop new threads from being created even when
below this limit.

Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/fork.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 2852d0e76ea3..ef865be37e98 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
 	struct ctl_table t;
 	int ret;
 	int threads = max_threads;
-	int min = MIN_THREADS;
+	int min = 1;
 	int max = MAX_THREADS;
 
 	t = *table;
@@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
 	if (ret || !write)
 		return ret;
 
-	set_max_threads(threads);
+	max_threads = threads;
 
 	return 0;
 }
-- 
2.20.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-22  6:58           ` Michal Hocko
@ 2019-09-22 15:31             ` Heinrich Schuchardt
  2019-09-22 21:40               ` Eric W. Biederman
  2019-09-22 21:24             ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Heinrich Schuchardt @ 2019-09-22 15:31 UTC (permalink / raw)
  To: Michal Hocko, Eric W. Biederman; +Cc: Andrew Morton, LKML

On 9/22/19 8:58 AM, Michal Hocko wrote:
> On Thu 19-09-19 14:33:24, Eric W. Biederman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Tue 17-09-19 12:26:18, Eric W. Biederman wrote:
> [...]
>>>> Michal is it a very small effect your customers are seeing?
>>>> Is it another bug somewhere else?
>>>
>>> I am still trying to get more information. Reportedly they see a
>>> different auto-tuned limit between two kernel versions which results in
>>> an application complaining. As already mentioned this might be a side
>>> effect of something else and this is not yet fully analyzed. My main
>>> point for bringing up this discussion is ...
>>
>> Please, this sounds like the kind of issue that will reveal something
>> deeper about what is going on.
>
> Yes, I have pushed for that information even before starting the discussion
> here. I haven't heard back from the customer yet, unfortunately.
>
>>>> b) Not being able to bump threads_max to the physical limit of
>>>>    the machine is very clearly a regression.
>>>
>>> ... exactly this part. The changelog of the respective patch doesn't
>>> really explain why it is needed except for "it sounds like a good idea
>>> to be consistent".
>>
>> I suggest doing a partial revert to just:
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 7a74ade4e7d6..de8264ea34a7 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>>  	if (ret || !write)
>>  		return ret;
>>
>> -	set_max_threads(threads);
>> +	max_threads = threads;
>>
>>  	return 0;
>>  }
>>
>> proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS
>> is justifiable.  Those are the minimum and maximum values the kernel can
>> function with.
>
> The MAX_THREADS limit makes perfect sense because crossing it reflects a
> real constraint for correctness. MIN_THREADS, on the other hand, is only
> the lower gate for the auto-tuning so that it does not produce an unbootable
> system. I do not think we should override the admin's call on the lower
> bound. There might be a good reason to restrict the number of threads to 1.
> So here is what I would like to go with.

Did this patch, when applied to the customer's kernel, solve any problem?

WebSphere MQ is a messaging application. If it hits the current limits
of threads-max, there is a bug in the software or in the way that it has
been set up at the customer's site. Instead of messing around with the kernel,
the application should be fixed.

With this patch you allow administrators to set values that will crash
their system. And they will not even have a way to find out the limits
which they should adhere to. So expect a lot of systems to be brought down this way.

Best regards

Heinrich

>
> From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Sun, 22 Sep 2019 08:45:28 +0200
> Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
>  userspace
>
> Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> because the patch is causing a regression to any workload which needs to
> override the auto-tuning of the limit provided by kernel.
>
> set_max_threads is implementing a boot time guesstimate to provide a
> sensible limit on the number of concurrently running threads so that runaways will
> not deplete all the memory. This is a good thing in general but there
> are workloads which might need to increase this limit for an application
> to run (reportedly WebSphere MQ is affected) and that is simply not
> possible after the mentioned change. It is also very dubious to override
> an admin decision by an estimation that doesn't have any direct relation
> to correctness of the kernel operation.
>
> Fix this by dropping set_max_threads from sysctl_max_threads so any
> value is accepted as long as it fits into MAX_THREADS which is important
> to check because allowing more threads could break internal robust futex
> restriction. While at it, do not use MIN_THREADS as the lower boundary
> because it is also only a heuristic for the automatic estimation, and an
> admin might have a good reason to stop new threads from being created even when
> below this limit.
>
> Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  kernel/fork.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..ef865be37e98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	struct ctl_table t;
>  	int ret;
>  	int threads = max_threads;
> -	int min = MIN_THREADS;
> +	int min = 1;
>  	int max = MAX_THREADS;
>
>  	t = *table;
> @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	if (ret || !write)
>  		return ret;
>
> -	set_max_threads(threads);
> +	max_threads = threads;
>
>  	return 0;
>  }
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-22  6:58           ` Michal Hocko
  2019-09-22 15:31             ` Heinrich Schuchardt
@ 2019-09-22 21:24             ` Eric W. Biederman
  2019-09-23  8:08               ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-22 21:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

Michal Hocko <mhocko@kernel.org> writes:

> From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Sun, 22 Sep 2019 08:45:28 +0200
> Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
>  userspace
>
> Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> because the patch is causing a regression to any workload which needs to
> override the auto-tuning of the limit provided by kernel.
>
> set_max_threads is implementing a boot time guesstimate to provide a
> sensible limit on the number of concurrently running threads so that runaways will
> not deplete all the memory. This is a good thing in general but there
> are workloads which might need to increase this limit for an application
> to run (reportedly WebSphere MQ is affected) and that is simply not
> possible after the mentioned change. It is also very dubious to override
> an admin decision by an estimation that doesn't have any direct relation
> to correctness of the kernel operation.
>
> Fix this by dropping set_max_threads from sysctl_max_threads so any
> value is accepted as long as it fits into MAX_THREADS which is important
> to check because allowing more threads could break internal robust futex
> restriction. While at it, do not use MIN_THREADS as the lower boundary
> because it is also only a heuristic for the automatic estimation, and an
> admin might have a good reason to stop new threads from being created even when
> below this limit.
>
> Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> ---
>  kernel/fork.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..ef865be37e98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	struct ctl_table t;
>  	int ret;
>  	int threads = max_threads;
> -	int min = MIN_THREADS;
> +	int min = 1;
>  	int max = MAX_THREADS;
>  
>  	t = *table;
> @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	if (ret || !write)
>  		return ret;
>  
> -	set_max_threads(threads);
> +	max_threads = threads;
>  
>  	return 0;
>  }
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-22 15:31             ` Heinrich Schuchardt
@ 2019-09-22 21:40               ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-22 21:40 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: Michal Hocko, Andrew Morton, LKML

Heinrich Schuchardt <xypron.glpk@gmx.de> writes:

> Did this patch, when applied to the customer's kernel, solve any problem?
>
> WebSphere MQ is a messaging application. If it hits the current limits
> of threads-max, there is a bug in the software or in the way that it has
> been set up at the customer's site. Instead of messing around with the kernel,
> the application should be fixed.

While it is true that almost every workload will be buggy if it exceeds
1/8 of memory with just the kernel data structures for threads, it is
not necessarily true of every application.  I can easily imagine cases
up around 1/2 of memory where things could work reasonably.

Further, we can exhaust all of memory much more simply in a default
configuration by malloc'ing more memory than is physically present
and zeroing it all.
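
For example, something as simple as this sketch will take the box down in
a default overcommit configuration, no threads-max tuning required:

#include <stdlib.h>
#include <string.h>

int main(void)
{
        /*
         * Keep allocating and dirtying 1 GiB chunks; with default
         * overcommit the OOM killer eventually has to step in.
         */
        for (;;) {
                char *p = malloc(1UL << 30);
                if (!p)
                        break;
                memset(p, 1, 1UL << 30);
        }
        return 0;
}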

Heinrich, you were the one who messed with the kernel by breaking a
reasonable kernel tunable.  In other words, you caused a regression.  That
violates the no-regressions rule.

As much as possible we fix regressions so software that used to work
continues to work.  Removing footguns is not a reason to introduce a
regression.

I do agree that Michal's customer's problem sounds like it is something
else but if the kernel did not have a regression we could focus on the
real problem instead of being sidetracked by the regression.

> With this patch you allow administrators to set values that will crash
> their system. And they will not even have a way to find out the limits
> which they should adhere to. So expect a lot of systems to be brought down
> this way.

Nope.  A system administrator merely setting a higher value won't crash
their system.  Only actually using that many resources would crash the system.

Nor is a sysctl like this meant for discovering the physical limits of a
machine, which the current value is vastly inappropriate for anyway, as
the physical limits of many machines are much higher than 1/8 of memory.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-22 21:24             ` Eric W. Biederman
@ 2019-09-23  8:08               ` Michal Hocko
  2019-09-23 21:23                 ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-09-23  8:08 UTC (permalink / raw)
  To: Andrew Morton, Eric W. Biederman; +Cc: Heinrich Schuchardt, LKML

Andrew, do you want me to send the patch, or can you grab it from here?

On Sun 22-09-19 16:24:10, Eric W. Biederman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Sun, 22 Sep 2019 08:45:28 +0200
> > Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
> >  userspace
> >
> > Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> > because the patch is causing a regression to any workload which needs to
> > override the auto-tuning of the limit provided by kernel.
> >
> > set_max_threads is implementing a boot time guesstimate to provide a
> > sensible limit on the number of concurrently running threads so that runaways will
> > not deplete all the memory. This is a good thing in general but there
> > are workloads which might need to increase this limit for an application
> > to run (reportedly WebSphere MQ is affected) and that is simply not
> > possible after the mentioned change. It is also very dubious to override
> > an admin decision by an estimation that doesn't have any direct relation
> > to correctness of the kernel operation.
> >
> > Fix this by dropping set_max_threads from sysctl_max_threads so any
> > value is accepted as long as it fits into MAX_THREADS which is important
> > to check because allowing more threads could break internal robust futex
> > restriction. While at it, do not use MIN_THREADS as the lower boundary
> > because it is also only a heuristic for the automatic estimation, and an
> > admin might have a good reason to stop new threads from being created even when
> > below this limit.
> >
> > Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> > Cc: stable
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks Eric.

> > ---
> >  kernel/fork.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 2852d0e76ea3..ef865be37e98 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
> >  	struct ctl_table t;
> >  	int ret;
> >  	int threads = max_threads;
> > -	int min = MIN_THREADS;
> > +	int min = 1;
> >  	int max = MAX_THREADS;
> >  
> >  	t = *table;
> > @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
> >  	if (ret || !write)
> >  		return ret;
> >  
> > -	set_max_threads(threads);
> > +	max_threads = threads;
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.20.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-23  8:08               ` Michal Hocko
@ 2019-09-23 21:23                 ` Eric W. Biederman
  2019-09-24  8:48                   ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-23 21:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Heinrich Schuchardt, LKML


Michal,

Thinking about this I have a hunch about what changed.  I think at some
point we changed from 4k to 8k kernel stacks.  So I suspect if your
client is seeing a lower threads-max it is because the size of the
kernel data structures increased.

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: threads-max observe limits
  2019-09-23 21:23                 ` Eric W. Biederman
@ 2019-09-24  8:48                   ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2019-09-24  8:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, Heinrich Schuchardt, LKML

On Mon 23-09-19 16:23:40, Eric W. Biederman wrote:
> 
> Michal,
> 
> Thinking about this I have a hunch about what changed.  I think at some
> point we changed from 4k to 8k kernel stacks.  So I suspect if your
> client is seeing a lower threads-max it is because the size of the
> kernel data structures increased.

This is indeed the case. Starting with 6538b8ea886e ("x86_64: expand
kernel stack to 16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that
halved the auto-tuned value.

In the particular case
3.12
kernel.threads-max = 515561

4.4
kernel.threads-max = 200000

Neither of the two values is really insane on a 32 GB machine.
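
For illustration, plugging the two stack sizes into the auto-tuning
formula (a userspace sketch assuming a clean 32 GiB; exact numbers on a
real system differ since reserved memory is excluded and accounting
details vary between versions):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t ram = 32ULL << 30;     /* assume ~32 GiB of RAM */

        /* limit = ram / (8 * THREAD_SIZE), as in set_max_threads() */
        printf("8 KiB stacks (THREAD_SIZE_ORDER = 1): %llu\n",
               (unsigned long long)(ram / (8 * 8192)));        /* 524288 */
        printf("16 KiB stacks (THREAD_SIZE_ORDER = 2): %llu\n",
               (unsigned long long)(ram / (8 * 16384)));       /* 262144 */
        return 0;
}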

I am not sure we want/need to tune the max_threads value further. If
anything, the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.

Thanks
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2019-09-17 10:03 threads-max observe limits Michal Hocko
2019-09-17 15:28 ` Heinrich Schuchardt
2019-09-17 15:38   ` Michal Hocko
2019-09-17 17:26     ` Eric W. Biederman
2019-09-18  7:15       ` Michal Hocko
2019-09-19  7:59         ` Michal Hocko
2019-09-19 19:38           ` Andrew Morton
2019-09-19 19:33         ` Eric W. Biederman
2019-09-22  6:58           ` Michal Hocko
2019-09-22 15:31             ` Heinrich Schuchardt
2019-09-22 21:40               ` Eric W. Biederman
2019-09-22 21:24             ` Eric W. Biederman
2019-09-23  8:08               ` Michal Hocko
2019-09-23 21:23                 ` Eric W. Biederman
2019-09-24  8:48                   ` Michal Hocko
