* threads-max observe limits @ 2019-09-17 10:03 Michal Hocko 2019-09-17 15:28 ` Heinrich Schuchardt 0 siblings, 1 reply; 15+ messages in thread From: Michal Hocko @ 2019-09-17 10:03 UTC (permalink / raw) To: Heinrich Schuchardt, Andrew Morton; +Cc: LKML Hi, I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") and I am really wondering what is the motivation behind the patch. We've had a customer noticing the threads_max autoscaling differences between 3.12 and 4.4 kernels and wanted to override the auto tuning from the userspace, just to find out that this is not possible. Why do we override the admin like that? I find it quite dubious to be honest. Especially when the auto-tuning is just a very rough estimation and it seems quite arbitrary. Thanks -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-17 10:03 threads-max observe limits Michal Hocko @ 2019-09-17 15:28 ` Heinrich Schuchardt 2019-09-17 15:38 ` Michal Hocko 0 siblings, 1 reply; 15+ messages in thread From: Heinrich Schuchardt @ 2019-09-17 15:28 UTC (permalink / raw) To: Michal Hocko, Andrew Morton; +Cc: LKML On 9/17/19 12:03 PM, Michal Hocko wrote: > Hi, > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max > observe limits") and I am really wondering what is the motivation behind > the patch. We've had a customer noticing the threads_max autoscaling > differences between 3.12 and 4.4 kernels and wanted to override the auto > tuning from the userspace, just to find out that this is not possible. set_max_threads() sets the upper limit (max_threads_suggested) for threads such that at a maximum 1/8th of the total memory can be occupied by the threads' administrative data (of size THREAD_SIZE). On my 32 GiB system this results in 254313 threads. With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") a user cannot set an arbitrarily high number for /proc/sys/kernel/threads-max which could lead to a system stalling because the thread headers occupy all the memory. When developing the patch I remarked that on a system where memory is installed dynamically it might be a good idea to recalculate this limit. If you have a system that boots with let's say 8 GiB and then dynamically installs a few TiB of RAM this might make sense. But such a dynamic update of max_threads_suggested was left out for the sake of simplicity. Anyway if more than 100,000 threads are used on a system, I would wonder if the software should not be changed to use thread-pools instead. Best regards Heinrich > > Why do we override the admin like that? I find it quite dubious to be > honest. Especially when the auto-tuning is just a very rough estimation > and it seems quite arbitrary. > > Thanks > ^ permalink raw reply [flat|nested] 15+ messages in thread
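The sizing rule Heinrich describes can be sketched as follows. This is a simplified model of set_max_threads(), not the kernel's exact code: it assumes 16 KiB kernel stacks (as on x86-64) and uses raw installed memory, whereas the kernel uses totalram_pages(), which excludes reserved memory; that difference is why a real 32 GiB machine reports 254313 rather than the idealized figure.

```python
# Simplified model of set_max_threads() in kernel/fork.c: allow only as
# many threads as would let their kernel stacks fill 1/8th of RAM.
THREAD_SIZE = 16 * 1024          # assumption: 16 KiB kernel stacks (x86-64)
MIN_THREADS = 20                 # lower clamp used by the kernel
MAX_THREADS = 0x3FFFFFFF         # FUTEX_TID_MASK, the hard upper bound

def suggested_max_threads(total_mem_bytes: int) -> int:
    # total memory divided by 8 * THREAD_SIZE, clamped to a sane range
    threads = total_mem_bytes // (8 * THREAD_SIZE)
    return max(MIN_THREADS, min(threads, MAX_THREADS))

print(suggested_max_threads(32 * 2**30))  # 262144 for an idealized 32 GiB box
```

With real totalram_pages() being somewhat below 32 GiB on such a machine, the same formula yields the 254313 quoted above.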
* Re: threads-max observe limits 2019-09-17 15:28 ` Heinrich Schuchardt @ 2019-09-17 15:38 ` Michal Hocko 2019-09-17 17:26 ` Eric W. Biederman 0 siblings, 1 reply; 15+ messages in thread From: Michal Hocko @ 2019-09-17 15:38 UTC (permalink / raw) To: Heinrich Schuchardt; +Cc: Andrew Morton, LKML On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote: > > On 9/17/19 12:03 PM, Michal Hocko wrote: > > Hi, > > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max > > observe limits") and I am really wondering what is the motivation behind > > the patch. We've had a customer noticing the threads_max autoscaling > > differences between 3.12 and 4.4 kernels and wanted to override the auto > > tuning from the userspace, just to find out that this is not possible. > > set_max_threads() sets the upper limit (max_threads_suggested) for > threads such that at a maximum 1/8th of the total memory can be occupied > by the threads' administrative data (of size THREAD_SIZE). On my 32 GiB > system this results in 254313 threads. This is quite arbitrary, isn't it? What would happen if the limit was twice as large? > With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") > a user cannot set an arbitrarily high number for > /proc/sys/kernel/threads-max which could lead to a system stalling > because the thread headers occupy all the memory. This is still a decision of the admin to make. You can consume the memory by other means and that is why we have measures in place. E.g. memcg accounting. > When developing the patch I remarked that on a system where memory is > installed dynamically it might be a good idea to recalculate this limit. > If you have a system that boots with let's say 8 GiB and then > dynamically installs a few TiB of RAM this might make sense. But such a > dynamic update of max_threads_suggested was left out for the sake of > simplicity.
> > Anyway if more than 100,000 threads are used on a system, I would wonder > if the software should not be changed to use thread-pools instead. You do not change the software to overcome artificial bounds based on guessing. So can we get back to the justification of the patch? What kind of real-life problem does it solve and why is it OK to override an admin decision? If there is no strong justification then the patch should be reverted, because from what I have heard it has been noticed and it has broken a certain deployment. I am not really clear about the technical details yet, but it seems that there are workloads that believe they need to touch this tuning and complain if that is not possible. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-17 15:38 ` Michal Hocko @ 2019-09-17 17:26 ` Eric W. Biederman 2019-09-18 7:15 ` Michal Hocko 0 siblings, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2019-09-17 17:26 UTC (permalink / raw) To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML Michal Hocko <mhocko@kernel.org> writes: > On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote: >> >> On 9/17/19 12:03 PM, Michal Hocko wrote: >> > Hi, >> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max >> > observe limits") and I am really wondering what is the motivation behind >> > the patch. We've had a customer noticing the threads_max autoscaling >> > differences between 3.12 and 4.4 kernels and wanted to override the auto >> > tuning from the userspace, just to find out that this is not possible. >> >> set_max_threads() sets the upper limit (max_threads_suggested) for >> threads such that at a maximum 1/8th of the total memory can be occupied >> by the threads' administrative data (of size THREAD_SIZE). On my 32 GiB >> system this results in 254313 threads. > > This is quite arbitrary, isn't it? What would happen if the limit was > twice as large? > >> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") >> a user cannot set an arbitrarily high number for >> /proc/sys/kernel/threads-max which could lead to a system stalling >> because the thread headers occupy all the memory. > > This is still a decision of the admin to make. You can consume the > memory by other means and that is why we have measures in place. E.g. > memcg accounting. > >> When developing the patch I remarked that on a system where memory is >> installed dynamically it might be a good idea to recalculate this limit. >> If you have a system that boots with let's say 8 GiB and then >> dynamically installs a few TiB of RAM this might make sense. But such a >> dynamic update of max_threads_suggested was left out for the sake of >> simplicity.
>> >> Anyway if more than 100,000 threads are used on a system, I would wonder >> if the software should not be changed to use thread-pools instead. > > You do not change the software to overcome artificial bounds based on > guessing. > > So can we get back to the justification of the patch. What kind of > real life problem does it solve and why is it ok to override an admin > decision? > If there is no strong justification then the patch should be reverted > because from what I have heard it has been noticed and it has broken > a certain deployment. I am not really clear about technical details yet > but it seems that there are workloads that believe they need to touch > this tuning and complain if that is not possible. Taking a quick look myself. I am completely mystified by both sides of this conversation. a) The logic to set the default number of threads in a system has not changed since 2.6.12-rc2 (the start of the git history). The implementation has changed but we should still get the same value. So anyone seeing threads_max autoscaling differences between kernels is either seeing a bug in the rewritten formula or something else weird is going on. Michal, is it a very small effect your customers are seeing? Is it another bug somewhere else? b) Not being able to bump threads_max to the physical limit of the machine is very clearly a regression. Limiting threads_max to MIN_THREADS on the low end and MAX_THREADS on the high end is reasonable, because Linux can't cope with values outside of that range. Limiting threads_max to the auto-scaling value is a regression. The point of limits like threads_max is to have something that 99% of people won't hit and if they do it will indicate a bug in their application. And to generally keep the kernel working when an application bug happens. But there are always cases where heuristics fail so it is completely reasonable to allow these values to be manually tuned.
Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
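The regression Eric describes in point (b) can be illustrated with a toy model of the sysctl write path. The function names are illustrative and the before/after behavior follows the descriptions of 16db3d3f1170 given in this thread, not the kernel source verbatim:

```python
MIN_THREADS = 20
MAX_THREADS = 0x3FFFFFFF  # FUTEX_TID_MASK

def write_threads_max_before(requested: int, suggested: int) -> int:
    # Pre-16db3d3f1170: only the hard kernel bounds apply, so the admin
    # can raise the limit above the boot-time estimate.
    return max(MIN_THREADS, min(requested, MAX_THREADS))

def write_threads_max_after(requested: int, suggested: int) -> int:
    # Post-16db3d3f1170: the handler re-runs the auto-tuning, so the
    # written value is silently capped at the memory-based suggestion.
    return max(MIN_THREADS, min(requested, suggested, MAX_THREADS))

suggested = 254313  # e.g. the auto-tuned value on Heinrich's 32 GiB machine
print(write_threads_max_before(1_000_000, suggested))  # 1000000: admin wins
print(write_threads_max_after(1_000_000, suggested))   # 254313: clamped
```

Note that lowering the limit below the suggestion behaves the same in both models; only raising it is affected, which is exactly the case the customer workload hit.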
* Re: threads-max observe limits 2019-09-17 17:26 ` Eric W. Biederman @ 2019-09-18 7:15 ` Michal Hocko 2019-09-19 7:59 ` Michal Hocko 2019-09-19 19:33 ` Eric W. Biederman 0 siblings, 2 replies; 15+ messages in thread From: Michal Hocko @ 2019-09-18 7:15 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Heinrich Schuchardt, Andrew Morton, LKML On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: > Michal Hocko <mhocko@kernel.org> writes: > > > On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote: > >> > >> On 9/17/19 12:03 PM, Michal Hocko wrote: > >> > Hi, > >> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max > >> > observe limits") and I am really wondering what is the motivation behind > >> > the patch. We've had a customer noticing the threads_max autoscaling > >> > differences between 3.12 and 4.4 kernels and wanted to override the auto > >> > tuning from the userspace, just to find out that this is not possible. > >> > >> set_max_threads() sets the upper limit (max_threads_suggested) for > >> threads such that at a maximum 1/8th of the total memory can be occupied > >> by the threads' administrative data (of size THREAD_SIZE). On my 32 GiB > >> system this results in 254313 threads. > > > > This is quite arbitrary, isn't it? What would happen if the limit was > > twice as large? > > > >> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") > >> a user cannot set an arbitrarily high number for > >> /proc/sys/kernel/threads-max which could lead to a system stalling > >> because the thread headers occupy all the memory. > > > > This is still a decision of the admin to make. You can consume the > > memory by other means and that is why we have measures in place. E.g. > > memcg accounting. > > > >> When developing the patch I remarked that on a system where memory is > >> installed dynamically it might be a good idea to recalculate this limit.
> >> If you have a system that boots with let's say 8 GiB and then > >> dynamically installs a few TiB of RAM this might make sense. But such a > >> dynamic update of max_threads_suggested was left out for the sake of > >> simplicity. > >> > >> Anyway if more than 100,000 threads are used on a system, I would wonder > >> if the software should not be changed to use thread-pools instead. > > > > You do not change the software to overcome artificial bounds based on > > guessing. > > > > So can we get back to the justification of the patch. What kind of > > real life problem does it solve and why is it ok to override an admin > > decision? > > If there is no strong justification then the patch should be reverted > > because from what I have heard it has been noticed and it has broken > > a certain deployment. I am not really clear about technical details yet > > but it seems that there are workloads that believe they need to touch > > this tuning and complain if that is not possible. > > Taking a quick look myself. > > I am completely mystified by both sides of this conversation. > > a) The logic to set the default number of threads in a system > has not changed since 2.6.12-rc2 (the start of the git history). > > The implementation has changed but we should still get the same > value. So anyone seeing threads_max autoscaling differences > between kernels is either seeing a bug in the rewritten formula > or something else weird is going on. > > Michal is it a very small effect your customers are seeing? > Is it another bug somewhere else? I am still trying to get more information. Reportedly they see a different auto-tuned limit between two kernel versions which results in an application complaining. As already mentioned, this might be a side effect of something else and this is not yet fully analyzed. My main point for bringing up this discussion is ... > b) Not being able to bump threads_max to the physical limit of > the machine is very clearly a regression. ...
exactly this part. The changelog of the respective patch doesn't really explain why it is needed except that "it sounds like a good idea to be consistent". -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-18 7:15 ` Michal Hocko @ 2019-09-19 7:59 ` Michal Hocko 2019-09-19 19:38 ` Andrew Morton 2019-09-19 19:33 ` Eric W. Biederman 1 sibling, 1 reply; 15+ messages in thread From: Michal Hocko @ 2019-09-19 7:59 UTC (permalink / raw) To: Heinrich Schuchardt; +Cc: Eric W. Biederman, Andrew Morton, LKML On Wed 18-09-19 09:15:41, Michal Hocko wrote: > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: [...] > > b) Not being able to bump threads_max to the physical limit of > > the machine is very clearly a regression. > > ... exactly this part. The changelog of the respective patch doesn't > really explain why it is needed except that "it sounds like a good idea > to be consistent". Any take on this, Heinrich? If there really is no strong reasoning behind restricting user input then I will suggest reverting 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits"). -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-19 7:59 ` Michal Hocko @ 2019-09-19 19:38 ` Andrew Morton 0 siblings, 0 replies; 15+ messages in thread From: Andrew Morton @ 2019-09-19 19:38 UTC (permalink / raw) To: Michal Hocko; +Cc: Heinrich Schuchardt, Eric W. Biederman, LKML On Thu, 19 Sep 2019 09:59:11 +0200 Michal Hocko <mhocko@kernel.org> wrote: > On Wed 18-09-19 09:15:41, Michal Hocko wrote: > > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: > [...] > > > b) Not being able to bump threads_max to the physical limit of > > > the machine is very clearly a regression. > > > > ... exactly this part. The changelog of the respective patch doesn't > > really explain why it is needed except that "it sounds like a good idea > > to be consistent". > > Any take on this, Heinrich? If there really is no strong reasoning behind > restricting user input then I will suggest reverting 16db3d3f1170 > ("kernel/sysctl.c: threads-max observe limits") I agree, based on what I'm seeing in this thread. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-18 7:15 ` Michal Hocko 2019-09-19 7:59 ` Michal Hocko @ 2019-09-19 19:33 ` Eric W. Biederman 2019-09-22 6:58 ` Michal Hocko 1 sibling, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2019-09-19 19:33 UTC (permalink / raw) To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML Michal Hocko <mhocko@kernel.org> writes: > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: >> Michal Hocko <mhocko@kernel.org> writes: >> >> > On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote: >> >> >> >> On 9/17/19 12:03 PM, Michal Hocko wrote: >> >> > Hi, >> >> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max >> >> > observe limits") and I am really wondering what is the motivation behind >> >> > the patch. We've had a customer noticing the threads_max autoscaling >> >> > differences between 3.12 and 4.4 kernels and wanted to override the auto >> >> > tuning from the userspace, just to find out that this is not possible. >> >> >> >> set_max_threads() sets the upper limit (max_threads_suggested) for >> >> threads such that at a maximum 1/8th of the total memory can be occupied >> >> by the threads' administrative data (of size THREAD_SIZE). On my 32 GiB >> >> system this results in 254313 threads. >> > >> > This is quite arbitrary, isn't it? What would happen if the limit was >> > twice as large? >> > >> >> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") >> >> a user cannot set an arbitrarily high number for >> >> /proc/sys/kernel/threads-max which could lead to a system stalling >> >> because the thread headers occupy all the memory. >> > >> > This is still a decision of the admin to make. You can consume the >> > memory by other means and that is why we have measures in place. E.g. >> > memcg accounting. >> > >> >> When developing the patch I remarked that on a system where memory is >> >> installed dynamically it might be a good idea to recalculate this limit.
>> >> If you have a system that boots with let's say 8 GiB and then >> >> dynamically installs a few TiB of RAM this might make sense. But such a >> >> dynamic update of max_threads_suggested was left out for the sake of >> >> simplicity. >> >> >> >> Anyway if more than 100,000 threads are used on a system, I would wonder >> >> if the software should not be changed to use thread-pools instead. >> > >> > You do not change the software to overcome artificial bounds based on >> > guessing. >> > >> > So can we get back to the justification of the patch. What kind of >> > real life problem does it solve and why is it ok to override an admin >> > decision? >> > If there is no strong justification then the patch should be reverted >> > because from what I have heard it has been noticed and it has broken >> > a certain deployment. I am not really clear about technical details yet >> > but it seems that there are workloads that believe they need to touch >> > this tuning and complain if that is not possible. >> >> Taking a quick look myself. >> >> I am completely mystified by both sides of this conversation. >> >> a) The logic to set the default number of threads in a system >> has not changed since 2.6.12-rc2 (the start of the git history). >> >> The implementation has changed but we should still get the same >> value. So anyone seeing threads_max autoscaling differences >> between kernels is either seeing a bug in the rewritten formula >> or something else weird is going on. >> >> Michal is it a very small effect your customers are seeing? >> Is it another bug somewhere else? > > I am still trying to get more information. Reportedly they see a > different auto-tuned limit between two kernel versions which results in > an application complaining. As already mentioned this might be a side > effect of something else and this is not yet fully analyzed. My main > point for bringing up this discussion is ...
Please, this sounds like the kind of issue that will reveal something deeper about what is going on. > >> b) Not being able to bump threads_max to the physical limit of >> the machine is very clearly a regression. > > ... exactly this part. The changelog of the respective patch doesn't > really explain why it is needed except that "it sounds like a good idea > to be consistent". I suggest doing a partial revert to just: diff --git a/kernel/fork.c b/kernel/fork.c index 7a74ade4e7d6..de8264ea34a7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, if (ret || !write) return ret; - set_max_threads(threads); + max_threads = threads; return 0; } proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS is justifiable. Those are the minimum and maximum values the kernel can function with. With a good changelog we should be able to backport that change without any fear. Eric ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-19 19:33 ` Eric W. Biederman @ 2019-09-22 6:58 ` Michal Hocko 2019-09-22 15:31 ` Heinrich Schuchardt 2019-09-22 21:24 ` Eric W. Biederman 0 siblings, 2 replies; 15+ messages in thread From: Michal Hocko @ 2019-09-22 6:58 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Heinrich Schuchardt, Andrew Morton, LKML On Thu 19-09-19 14:33:24, Eric W. Biederman wrote: > Michal Hocko <mhocko@kernel.org> writes: > > > On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: [...] > >> Michal is it a very small effect your customers are seeing? > >> Is it another bug somewhere else? > > > > I am still trying to get more information. Reportedly they see a > > different auto-tuned limit between two kernel versions which results in > > an application complaining. As already mentioned this might be a side > > effect of something else and this is not yet fully analyzed. My main > > point for bringing up this discussion is ... > > Please, this sounds like the kind of issue that will reveal something > deeper about what is going on. Yes, I have pushed for that information even before starting the discussion here. I haven't heard from the customer yet, unfortunately. > >> b) Not being able to bump threads_max to the physical limit of > >> the machine is very clearly a regression. > > > > ... exactly this part. The changelog of the respective patch doesn't > > really explain why it is needed except that "it sounds like a good idea > > to be consistent". > > I suggest doing a partial revert to just: > > diff --git a/kernel/fork.c b/kernel/fork.c > index 7a74ade4e7d6..de8264ea34a7 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, > if (ret || !write) > return ret; > > - set_max_threads(threads); > + max_threads = threads; > > return 0; > } > > proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS > is justifiable.
Those are the minimum and maximum values the kernel can > function with. MAX_THREADS limit makes perfect sense because crossing it reflects a real constraint for correctness. MIN_THREADS, on the other hand, is only the low gate for auto tuning to not come up with an unbootable system. I do not think we should jump into the admin call on the lower bound. There might be a good reason to restrict the number of threads to 1. So here is what I would like to go with. From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Sun, 22 Sep 2019 08:45:28 +0200 Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by userspace Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") because the patch is causing a regression to any workload which needs to override the auto-tuning of the limit provided by kernel. set_max_threads is implementing a boot time guesstimate to provide a sensible limit of the concurrently running threads so that runaways will not deplete all the memory. This is a good thing in general but there are workloads which might need to increase this limit for an application to run (reportedly WebSphere MQ is affected) and that is simply not possible after the mentioned change. It is also very dubious to override an admin decision by an estimation that doesn't have any direct relation to correctness of the kernel operation. Fix this by dropping set_max_threads from sysctl_max_threads so any value is accepted as long as it fits into MAX_THREADS which is important to check because allowing more threads could break internal robust futex restriction. While at it, do not use MIN_THREADS as the lower boundary because it is also only a heuristic for automatic estimation and admin might have a good reason to stop new threads from being created even when below this limit.
Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") Cc: stable Signed-off-by: Michal Hocko <mhocko@suse.com> --- kernel/fork.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 2852d0e76ea3..ef865be37e98 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, struct ctl_table t; int ret; int threads = max_threads; - int min = MIN_THREADS; + int min = 1; int max = MAX_THREADS; t = *table; @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, if (ret || !write) return ret; - set_max_threads(threads); + max_threads = threads; return 0; } -- 2.20.1 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 15+ messages in thread
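For reference, the MAX_THREADS bound that Michal's patch keeps comes from the robust-futex ABI alluded to in the changelog: a futex word stores the owner's TID in its low 30 bits, with the two high bits reserved for flags. The constants below match the futex uapi header; the sketch only demonstrates the bit layout, not kernel behavior:

```python
# Robust-futex word layout: FUTEX_WAITERS and FUTEX_OWNER_DIED occupy the
# two high bits of the 32-bit word, so an owner TID must fit in the
# remaining 30 bits. Allowing more than FUTEX_TID_MASK threads would make
# owner TIDs ambiguous in that encoding.
FUTEX_WAITERS    = 0x80000000
FUTEX_OWNER_DIED = 0x40000000
FUTEX_TID_MASK   = 0x3FFFFFFF

MAX_THREADS = FUTEX_TID_MASK  # kernel/fork.c reuses the mask as the hard cap

# The flag bits and the TID field partition the 32-bit word with no overlap.
assert FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_TID_MASK == 0xFFFFFFFF
print(MAX_THREADS)  # 1073741823
```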
* Re: threads-max observe limits 2019-09-22 6:58 ` Michal Hocko @ 2019-09-22 15:31 ` Heinrich Schuchardt 2019-09-22 21:40 ` Eric W. Biederman 2019-09-22 21:24 ` Eric W. Biederman 1 sibling, 1 reply; 15+ messages in thread From: Heinrich Schuchardt @ 2019-09-22 15:31 UTC (permalink / raw) To: Michal Hocko, Eric W. Biederman; +Cc: Andrew Morton, LKML On 9/22/19 8:58 AM, Michal Hocko wrote: > On Thu 19-09-19 14:33:24, Eric W. Biederman wrote: >> Michal Hocko <mhocko@kernel.org> writes: >> >>> On Tue 17-09-19 12:26:18, Eric W. Biederman wrote: > [...] >>>> Michal is it a very small effect your customers are seeing? >>>> Is it another bug somewhere else? >>> >>> I am still trying to get more information. Reportedly they see a >>> different auto-tuned limit between two kernel versions which results in >>> an application complaining. As already mentioned this might be a side >>> effect of something else and this is not yet fully analyzed. My main >>> point for bringing up this discussion is ... >> >> Please, this sounds like the kind of issue that will reveal something >> deeper about what is going on. > > Yes, I have pushed for that information even before starting the discussion > here. I haven't heard from the customer yet, unfortunately. > >>>> b) Not being able to bump threads_max to the physical limit of >>>> the machine is very clearly a regression. >>> >>> ... exactly this part. The changelog of the respective patch doesn't >>> really explain why it is needed except that "it sounds like a good idea >>> to be consistent".
>> I suggest doing a partial revert to just: >> >> diff --git a/kernel/fork.c b/kernel/fork.c >> index 7a74ade4e7d6..de8264ea34a7 100644 >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -2943,7 +2943,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, >> if (ret || !write) >> return ret; >> >> - set_max_threads(threads); >> + max_threads = threads; >> >> return 0; >> } >> >> proc_dointvec_minmax limiting the values to MIN_THREADS and MAX_THREADS >> is justifiable. Those are the minimum and maximum values the kernel can >> function with. > > MAX_THREADS limit makes perfect sense because crossing it reflects a > real constraint for correctness. MIN_THREADS, on the other hand, is only > the low gate for auto tuning to not come up with an unbootable system. I do > not think we should jump into the admin call on the lower bound. There > might be a good reason to restrict the number of threads to 1. So here > is what I would like to go with. Did this patch when applied to the customer's kernel solve any problem? WebSphere MQ is a messaging application. If it hits the current limits of threads-max, there is a bug in the software or in the way that it has been set up at the customer. Instead of messing around with the kernel, the application should be fixed. With this patch you allow administrators to set values that will crash their system. And they will not even have a way to find out the limits which they should adhere to. So expect a lot of systems to be downed this way. Best regards Heinrich > >>From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Sun, 22 Sep 2019 08:45:28 +0200 > Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by > userspace > > Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") > because the patch is causing a regression to any workload which needs to > override the auto-tuning of the limit provided by kernel.
> > set_max_threads is implementing a boot time guesstimate to provide a > sensible limit of the concurrently running threads so that runaways will > not deplete all the memory. This is a good thing in general but there > are workloads which might need to increase this limit for an application > to run (reportedly WebSphere MQ is affected) and that is simply not > possible after the mentioned change. It is also very dubious to override > an admin decision by an estimation that doesn't have any direct relation > to correctness of the kernel operation. > > Fix this by dropping set_max_threads from sysctl_max_threads so any > value is accepted as long as it fits into MAX_THREADS which is important > to check because allowing more threads could break internal robust futex > restriction. While at it, do not use MIN_THREADS as the lower boundary > because it is also only a heuristic for automatic estimation and admin > might have a good reason to stop new threads from being created even when > below this limit. > > Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") > Cc: stable > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > kernel/fork.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index 2852d0e76ea3..ef865be37e98 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, > struct ctl_table t; > int ret; > int threads = max_threads; > - int min = MIN_THREADS; > + int min = 1; > int max = MAX_THREADS; > > t = *table; > @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write, > if (ret || !write) > return ret; > > - set_max_threads(threads); > + max_threads = threads; > > return 0; > } > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: threads-max observe limits 2019-09-22 15:31 ` Heinrich Schuchardt @ 2019-09-22 21:40 ` Eric W. Biederman 0 siblings, 0 replies; 15+ messages in thread From: Eric W. Biederman @ 2019-09-22 21:40 UTC (permalink / raw) To: Heinrich Schuchardt; +Cc: Michal Hocko, Andrew Morton, LKML Heinrich Schuchardt <xypron.glpk@gmx.de> writes: > Did this patch when applied to the customer's kernel solve any problem? > > WebSphere MQ is a messaging application. If it hits the current limits > of threads-max, there is a bug in the software or in the way that it has > been set up at the customer. Instead of messing around with the kernel, > the application should be fixed. While it is true that almost every workload will be buggy if it exceeds 1/8 of memory with just the kernel data structures for threads, it is not necessarily true of every application. I can easily imagine cases up around 1/2 of memory where things could work reasonably. Further, we can exhaust all of memory much more simply in a default configuration by malloc'ing more memory than is physically present and zeroing it all. Heinrich, you were the one who messed with the kernel by breaking a reasonable kernel tunable. AKA you caused a regression. That violates the no-regression rule. As much as possible we fix regressions so software that used to work continues to work. Removing footguns is not a reason to introduce a regression. I do agree that Michal's customer's problem sounds like it is something else, but if the kernel did not have a regression we could focus on the real problem instead of being side-tracked by the regression. > With this patch you allow administrators to set values that will crash > their system. And they will not even have a way to find out the limits > which they should adhere to. So expect a lot of systems to be downed > this way. Nope. The system administrator just setting a higher value won't crash their system. Only using that many resources would crash the system.
Nor is a sysctl like this meant for discovering the physical limits of
a machine, which the current value is vastly inappropriate for anyway:
the physical limits of many machines are much higher than 1/8 of memory.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: threads-max observe limits
  2019-09-22  6:58 ` Michal Hocko
  2019-09-22 15:31 ` Heinrich Schuchardt
@ 2019-09-22 21:24 ` Eric W. Biederman
  2019-09-23  8:08 ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-22 21:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Heinrich Schuchardt, Andrew Morton, LKML

Michal Hocko <mhocko@kernel.org> writes:

> From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Sun, 22 Sep 2019 08:45:28 +0200
> Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
>  userspace
>
> Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> because the patch is causing a regression to any workload which needs to
> override the auto-tuning of the limit provided by the kernel.
>
> set_max_threads is implementing a boot time guesstimate to provide a
> sensible limit of the concurrently running threads so that runaways will
> not deplete all the memory. This is a good thing in general but there
> are workloads which might need to increase this limit for an application
> to run (reportedly WebSphere MQ is affected) and that is simply not
> possible after the mentioned change. It is also very dubious to override
> an admin decision by an estimation that doesn't have any direct relation
> to the correctness of the kernel operation.
>
> Fix this by dropping set_max_threads from sysctl_max_threads so any
> value is accepted as long as it fits into MAX_THREADS, which is important
> to check because allowing more threads could break the internal robust
> futex restriction. While at it, do not use MIN_THREADS as the lower
> boundary because it is also only a heuristic for automatic estimation
> and an admin might have a good reason to stop new threads from being
> created even below this limit.
>
> Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> ---
>  kernel/fork.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..ef865be37e98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	struct ctl_table t;
>  	int ret;
>  	int threads = max_threads;
> -	int min = MIN_THREADS;
> +	int min = 1;
>  	int max = MAX_THREADS;
>
>  	t = *table;
> @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	if (ret || !write)
>  		return ret;
>
> -	set_max_threads(threads);
> +	max_threads = threads;
>
>  	return 0;
>  }
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: threads-max observe limits
  2019-09-22 21:24 ` Eric W. Biederman
@ 2019-09-23  8:08 ` Michal Hocko
  2019-09-23 21:23 ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-09-23 8:08 UTC (permalink / raw)
  To: Andrew Morton, Eric W. Biederman; +Cc: Heinrich Schuchardt, LKML

Andrew, do you want me to send the patch or can you grab it from here?

On Sun 22-09-19 16:24:10, Eric W. Biederman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
>
> > From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Sun, 22 Sep 2019 08:45:28 +0200
> > Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
> >  userspace
> >
> > Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> > because the patch is causing a regression to any workload which needs to
> > override the auto-tuning of the limit provided by the kernel.
> >
> > set_max_threads is implementing a boot time guesstimate to provide a
> > sensible limit of the concurrently running threads so that runaways will
> > not deplete all the memory. This is a good thing in general but there
> > are workloads which might need to increase this limit for an application
> > to run (reportedly WebSphere MQ is affected) and that is simply not
> > possible after the mentioned change. It is also very dubious to override
> > an admin decision by an estimation that doesn't have any direct relation
> > to the correctness of the kernel operation.
> >
> > Fix this by dropping set_max_threads from sysctl_max_threads so any
> > value is accepted as long as it fits into MAX_THREADS, which is important
> > to check because allowing more threads could break the internal robust
> > futex restriction. While at it, do not use MIN_THREADS as the lower
> > boundary because it is also only a heuristic for automatic estimation
> > and an admin might have a good reason to stop new threads from being
> > created even below this limit.
> >
> > Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> > Cc: stable
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks Eric.

> > ---
> >  kernel/fork.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 2852d0e76ea3..ef865be37e98 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
> >  	struct ctl_table t;
> >  	int ret;
> >  	int threads = max_threads;
> > -	int min = MIN_THREADS;
> > +	int min = 1;
> >  	int max = MAX_THREADS;
> >
> >  	t = *table;
> > @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
> >  	if (ret || !write)
> >  		return ret;
> >
> > -	set_max_threads(threads);
> > +	max_threads = threads;
> >
> >  	return 0;
> >  }
> > --
> > 2.20.1
--
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: threads-max observe limits
  2019-09-23  8:08 ` Michal Hocko
@ 2019-09-23 21:23 ` Eric W. Biederman
  2019-09-24  8:48 ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2019-09-23 21:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Heinrich Schuchardt, LKML

Michal,

Thinking about this I have a hunch about what changed. I think at some
point we changed from 4k to 8k kernel stacks. So I suspect if your
client is seeing a lower threads-max it is because the size of the
kernel data structures increased.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: threads-max observe limits
  2019-09-23 21:23 ` Eric W. Biederman
@ 2019-09-24  8:48 ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2019-09-24 8:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, Heinrich Schuchardt, LKML

On Mon 23-09-19 16:23:40, Eric W. Biederman wrote:
>
> Michal,
>
> Thinking about this I have a hunch about what changed. I think at some
> point we changed from 4k to 8k kernel stacks. So I suspect if your
> client is seeing a lower threads-max it is because the size of the
> kernel data structures increased.

This is indeed the case. Since 6538b8ea886e ("x86_64: expand kernel
stack to 16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the
auto-tuned value. In this particular case:

	3.12	kernel.threads-max = 515561
	4.4	kernel.threads-max = 200000

Neither of the two values is really insane on a 32GB machine. I am not
sure we want/need to tune the max_threads value further. If anything
the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.

Thanks
--
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread
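The halving Michal observes falls directly out of the set_max_threads() heuristic, which caps the memory spent on thread structures at 1/8 of RAM. A rough userspace re-derivation follows; it is a simplification (it uses total RAM and omits the kernel's MIN_THREADS/MAX_THREADS clamping, and the kernel counts available pages rather than raw RAM, which is why the real boot-time values come out somewhat lower than this model).

```c
#include <stdint.h>

/*
 * Rough model of kernel/fork.c:set_max_threads(): allow at most 1/8 of
 * memory to be consumed by per-thread kernel structures of THREAD_SIZE
 * bytes each. Simplified: fed total RAM instead of available pages,
 * and the MIN_THREADS/MAX_THREADS clamp is omitted.
 */
static uint64_t suggest_max_threads(uint64_t mem_bytes, uint64_t thread_size)
{
	return mem_bytes / (thread_size * 8);
}
```

On a 32 GiB machine this yields 524288 with 8 KiB stacks and 262144 with the 16 KiB stacks introduced in 3.16, i.e. doubling THREAD_SIZE halves the suggested limit, matching the drop between the reported 3.12 and 4.4 values.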
end of thread, other threads:[~2019-09-24  8:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-17 10:03 threads-max observe limits Michal Hocko
2019-09-17 15:28 ` Heinrich Schuchardt
2019-09-17 15:38 ` Michal Hocko
2019-09-17 17:26 ` Eric W. Biederman
2019-09-18  7:15 ` Michal Hocko
2019-09-19  7:59 ` Michal Hocko
2019-09-19 19:38 ` Andrew Morton
2019-09-19 19:33 ` Eric W. Biederman
2019-09-22  6:58 ` Michal Hocko
2019-09-22 15:31 ` Heinrich Schuchardt
2019-09-22 21:40 ` Eric W. Biederman
2019-09-22 21:24 ` Eric W. Biederman
2019-09-23  8:08 ` Michal Hocko
2019-09-23 21:23 ` Eric W. Biederman
2019-09-24  8:48 ` Michal Hocko