From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Topi Miettinen <toiwoton@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 00/18] Present useful limits to user
Date: Mon, 20 Jun 2016 13:37:22 -0400
Message-ID: <d026123c-76a2-95ad-8b32-ffec2f098e61@gmail.com>
In-Reply-To: <CALYGNiPTAkAAXRh1WK2A=NeD9b6Tjo1Ov4H=ih=qWeUeHuZ=+g@mail.gmail.com>

On 2016-06-18 10:45, Konstantin Khlebnikov wrote:
> On Wed, Jun 15, 2016 at 5:47 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-06-14 15:03, Konstantin Khlebnikov wrote:
>>>
>>> I don't like the idea of this patchset.
>>>
>>> All limitations are context dependent and that context changes rapidly.
>>> You'll never dump enough information for predicting future errors or
>>> investigating reason of errors in past. You could try to reproduce all
>>> kernel logic but model always will be approximate.
>>
>> It's still better than what we have now, and there is one particular use for
>> the cgroup stuff that I find intriguing, you can create a cgroup, populate
>> it, set no limits, and then run a simulated workload against it and see how
>> it reacts.  This in general will probably provide a better starting point
>> for what to actually set the limits to than just making an arbitrary guess.
>> Certain applications in particular come to mind which will just hang when
>> they can't start a new thread or process (Dropbox is particularly guilty of
>> this).  In such cases, setting the limit too low doesn't result in a crash,
>> it results in the program just not appearing to work yet still running
>> otherwise normally.
>>
>> In general, I could see the rlimit stuff being in the same situation, it's
>> not for figuring out why something failed (good software will tell you
>> somewhere), but figuring out limits so it doesn't fail but still is
>> reasonably contained.  A lot of things that seem at face value like they
>> shouldn't need specific exceptions to limits do.  Most normal users probably
>> wouldn't guess that acpid needs a RLIMIT_NPROC count of at least 4 or more
>> to work with the default rules.  Similarly, there's probably not many normal
>> users who know that the Dropbox daemon spawns an insanely large thread pool
>> and preallocates significant amounts of memory and will just hang if either
>> of these fail.  By having a way to get running max counts of resource usage,
>> it makes it easier for people to know what the minimum limit they need to
>> put on something is.
>
> Rlimits work only if resource usage could be estimated a priori.
> They allow an app to limit itself to prevent failures if something goes wrong.
And yet many apps allow the _user_ to specify rlimits.  Avahi has the 
option for the user to set every single rlimit, ntpd (the reference 
implementation) lets the user configure MEMLOCK, and quite a few other 
daemons I've seen that are very widely used allow similar manual 
configuration of limits.  Most of these are network service daemons, 
which _can't_ reasonably limit themselves, because they can't know what 
type of workload they'll run against.
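As an aside, whatever limits a daemon ends up with after such manual
configuration can be checked from /proc/<pid>/limits.  The following is
only a minimal sketch in Python, assuming the column layout that file has
on current kernels (name, soft limit, hard limit, units, separated by runs
of spaces); the helper names parse_limits and _to_int are made up for
illustration and are not part of the patch set.

```python
import re


def _to_int(value):
    """Convert a limit column to int, mapping 'unlimited' to None."""
    return None if value == "unlimited" else int(value)


def parse_limits(text):
    """Parse /proc/<pid>/limits text into {name: (soft, hard, units)}.

    Assumes columns are separated by at least two spaces, which holds
    for the fixed-width layout the kernel currently prints.
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue  # blank or unexpected line
        name, soft, hard = fields[:3]
        units = fields[3] if len(fields) > 3 else ""  # some rows have no units
        limits[name] = (_to_int(soft), _to_int(hard), units)
    return limits


def read_limits(pid="self"):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())
```

Calling read_limits() with the PID of the daemon in question and looking
up, say, "Max locked memory" shows the soft and hard values it is really
running with, which is not always what the config file suggests.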
>
> Rlimits are useless for controlling resource distribution: just use
> cgroups for that.
The only rlimit that has a cgroup specifically for managing it is NPROC.
There's a bunch of memory ones that can't be individually controlled
in the memcg.  MEMLOCK is actually pretty widely used from what I've 
seen, but there is no way to control it at all with cgroups right now. 
NOFILE, LOCKS, FSIZE, and CORE all deal with the filesystem and have no 
cgroup that controls such resources (the only two that might be useful 
this way are NOFILE and LOCKS, but I doubt that those will get in, 
because they technically tie in with the kernel memory accounting in 
memcg).  NICE and RTPRIO are nonsensical in a cgroup context, although I 
don't think I've ever talked to anyone who actually uses them.  CPU and
RTTIME have no equivalent in cgroups.  They could in theory be tacked onto
the cpu controller, but they haven't been, and until that happens, people
still have to use the rlimits instead of cgroups.
>
>>>
>>> If you want to track origin of failures in user space applications when it
>>> hits some limit you should track errors. For example rlimits and other
>>> limitation subsystems could provide reasonable amount of tracepoints which
>>> could tell what exactly happened before error. If you need highwater of
>>> some values you could track it in userspace, or maybe tracing subsystem
>>> could provide postprocessing for tracepoint parameters. Anyway, systemtap
>>> and other monsters can do this right now.
>>
>> Userspace tracking of some things just isn't practical.  Take RLIMIT_NPROC
>> for example.  There's not really any reliable way to track this from
>> userspace without modifying the process which is being tracked, which is not
>> a user friendly way of doing things, and in some cases is functionally
>> impossible for an end user to do.
>
> You cannot get reliable upper bound for nr-proc from black box observations.
> Highwater mark is very racy - tiny timing shifts can change it dramatically.
You can't get a perfectly reliable upper bound for any type of resource 
usage with just black box observations, period.  You also can't do so 
with tracing without some significant secondary work either for _exactly 
the same reason_.  The thing to remember though is that in a majority of 
cases, what most people need is simply a reasonable estimate which is 
guaranteed to not be below the actual usage.  They don't care exactly 
how many processes application Y uses at most, they just care that it 
uses fewer than some reasonable limit under normal usage.  To go back to 
the NPROC example, most people want to be able to set a limit that will 
catch things if they start to get out of hand, but absolutely have to 
estimate high because almost nothing handles a fork failure gracefully 
without completely shutting down.  In such a situation, it doesn't 
matter if it's a bit racy, as long as they have some reasonable lower 
bound to base the estimate off of and the specifics of it not being 100% 
reliable are properly documented.
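To make the "observe, then estimate high" idea concrete, here is a rough
Python sketch of doing it from userspace today by polling.  It is purely
an illustration under stated assumptions: the function names and the 2x
headroom factor are mine, and it reads the Threads: field of
/proc/<pid>/status, which is only a proxy because RLIMIT_NPROC is
accounted per real user ID rather than per process.  Polling can also miss
any spike shorter than the sampling interval, so the result is a lower
bound on the real peak, which is exactly why the headroom is needed.

```python
import math
import re
import time


def parse_thread_count(status_text):
    """Return the 'Threads:' value from /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Threads: field found")
    return int(match.group(1))


def observe_peak_threads(pid, duration_s=1.0, interval_s=0.01):
    """Poll a process and return the highest thread count seen.

    This is a lower bound on the true peak: anything that starts and
    exits between two samples is never observed.
    """
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        with open(f"/proc/{pid}/status") as f:
            peak = max(peak, parse_thread_count(f.read()))
        time.sleep(interval_s)
    return peak


def suggest_limit(observed_peak, headroom=2.0):
    """Turn an observed peak into a limit by estimating high.

    The headroom factor is an arbitrary illustrative default.
    """
    return math.ceil(observed_peak * headroom)
```

Something like suggest_limit(observe_peak_threads(pid, duration_s=60))
gives a starting value, but a kernel-maintained maximum would not have
the sampling gap that this loop does.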
>
>>
>>>
>>> On Mon, Jun 13, 2016 at 10:44 PM, Topi Miettinen <toiwoton@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> There are many basic ways to control processes, including capabilities,
>>>> cgroups and resource limits. However, there are far fewer ways to find
>>>> out useful values for the limits, except blind trial and error.
>>>>
>>>> This patch series attempts to fix that by giving at least a nice starting
>>>> point from the actual maximum values. I looked where each limit is
>>>> checked and added a call to limit bump nearby.
>>>>
>>>>
>>>> Capabilities
>>>> [RFC 01/18] capabilities: track actually used capabilities
>>>>
>>>> Currently, there is no way to know which capabilities are actually used.
>>>> Even the source code is only implicit, in-depth knowledge of each
>>>> capability must be used when analyzing a program to judge which
>>>> capabilities the program will exercise.
>>>>
>>>> Cgroups
>>>> [RFC 02/18] cgroup_pids: track maximum pids
>>>> [RFC 03/18] memcontrol: present maximum used memory also for cgroup-v2
>>>> [RFC 04/18] device_cgroup: track and present accessed devices
>>>>
>>>> For tasks and memory cgroup limits the situation is somewhat better as
>>>> the current tasks and memory status can be easily seen with ps(1).
>>>> However, any transient tasks or temporary higher memory use might slip
>>>> from the view. Device use may be seen with advanced MAC tools, like
>>>> TOMOYO, but there is no universal method. Program sources typically give
>>>> no useful indication about memory use or how many tasks there could be.
>>>>
>>>> Resource limits
>>>> [RFC 05/18] limits: track and present RLIMIT_NOFILE actual max
>>>> [RFC 06/18] limits: present RLIMIT_CPU and RLIMIT_RTTIMER current status
>>>> [RFC 07/18] limits: track RLIMIT_FSIZE actual max
>>>> [RFC 08/18] limits: track RLIMIT_DATA actual max
>>>> [RFC 09/18] limits: track RLIMIT_CORE actual max
>>>> [RFC 10/18] limits: track RLIMIT_STACK actual max
>>>> [RFC 11/18] limits: track and present RLIMIT_NPROC actual max
>>>> [RFC 12/18] limits: track RLIMIT_MEMLOCK actual max
>>>> [RFC 13/18] limits: track RLIMIT_AS actual max
>>>> [RFC 14/18] limits: track RLIMIT_SIGPENDING actual max
>>>> [RFC 15/18] limits: track RLIMIT_MSGQUEUE actual max
>>>> [RFC 16/18] limits: track RLIMIT_NICE actual max
>>>> [RFC 17/18] limits: track RLIMIT_RTPRIO actual max
>>>> [RFC 18/18] proc: present VM_LOCKED memory in /proc/self/maps
>>>>
>>>> Current number of files and current VM usage (data pages, address space
>>>> size) could be calculated from available /proc files. Again, any
>>>> temporarily higher values could be easily missed. For many limits, there
>>>> is no way to see what is the current situation and source code is mostly
>>>> useless.
>>>>
>>>> As a side note, the resource limits seem to be in bad shape. For example,
>>>> RLIMIT_MEMLOCK is used incoherently and I think VM statistics can miss
>>>> some changes. Adding RLIMIT_CODE could be useful.
>>>>
>>>> The current maximum values for the resource limits are now shown in
>>>> /proc/task/limits. If this is deemed too confusing for the existing
>>>> programs which rely on the exact format, I can change that to a new file.
>>>>
>>>>
>>>> Finally, the patches work in my testing but I have probably missed finer
>>>> lock/RCU details.
>>>>
>>>> -Topi
>>>>
>>
