xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Julien Grall <julien@xen.org>
To: Juergen Gross <jgross@suse.com>, xen-devel@lists.xenproject.org
Cc: "Ian Jackson" <iwj@xenproject.org>, "Wei Liu" <wl@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH] tools/xenstore: claim resources when running as daemon
Date: Mon, 17 May 2021 16:55:47 +0100	[thread overview]
Message-ID: <39860a0c-5ac5-2537-532f-6ce288cc7219@xen.org> (raw)
In-Reply-To: <fe5f1e6a-1a89-ea12-feb5-318f25d4281f@suse.com>

Hi Juergen,

On 17/05/2021 07:47, Juergen Gross wrote:
> On 14.05.21 22:19, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 14/05/2021 09:41, Juergen Gross wrote:
>>> Xenstored is absolutely mandatory for a Xen host and it can't be
>>> restarted, so being killed by OOM-killer in case of memory shortage is
>>> to be avoided.
>>>
>>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>>> xenstored to use large amounts of memory without being killed.
>>>
>>> In order to support large numbers of domains the limit for open file
>>> descriptors might need to be raised. Each domain needs 2 file
>>> descriptors (one for the event channel and one for the xl per-domain
>>> daemon to connect to xenstored).
>>
>> Hmmm... AFAICT there is only one file descriptor to handle all the 
>> event channels. Could you point out the code showing one event FD per 
>> domain?
> 
> I have let me fooled by just counting the file descriptors used with one
> or two domain active.
> 
> So you are right that all event channels only use one fd, but each xl
> daemon will use two (which should be fixed, IMO). And thinking more
> about it it is even worse: each qemu process will at least require one
> additional fd.
> 
>>
>>>
>>> Try to raise ulimit for open files to 65536. First the hard limit if
>>> needed, and then the soft limit.
>>
>> I am not sure this is right to impose this limit to everyone. For 
>> instance, one admin may know that there will be no more than 100 
>> domains on its system.
> 
> Is setting a higher limit really a problem?

I am quite unease to set a limit that nearly nobody will reach unless 
something went horribly wrong on the system.

> 
>> So the admin should be able to configure them. At this point, I think 
>> the two limit should be set my the initscript rather than xenstored 
>> itself.
> 
> But the admin would need to know the Xen internals for selecting the
> correct limits. In the end I'd be fine with moving this modification to
> the script starting Xenstore (which would be launch-xenstore), but the
> configuration item should be "max number of domains to support".

I would be fine with "max numer of domains to support". What I care the 
most here is the limits are actually applied most of (if not all) the time.

> 
>>
>> This would also avoid the problem where Xenstored is not allowed to 
>> modify its limit (see more below).
>>
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   tools/xenstore/xenstored_core.c   |  2 ++
>>>   tools/xenstore/xenstored_core.h   |  3 ++
>>>   tools/xenstore/xenstored_minios.c |  4 +++
>>>   tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
>>>   4 files changed, 55 insertions(+)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c 
>>> b/tools/xenstore/xenstored_core.c
>>> index b66d119a98..964e693450 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>>           xprintf = trace;
>>>   #endif
>>> +    claim_resources();
>>> +
>>>       signal(SIGHUP, trigger_reopen_log);
>>>       if (tracefile)
>>>           tracefile = talloc_strdup(NULL, tracefile);
>>> diff --git a/tools/xenstore/xenstored_core.h 
>>> b/tools/xenstore/xenstored_core.h
>>> index 1467270476..ac26973648 100644
>>> --- a/tools/xenstore/xenstored_core.h
>>> +++ b/tools/xenstore/xenstored_core.h
>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>>   /* Close stdin/stdout/stderr to complete daemonize */
>>>   void finish_daemonize(void);
>>> +/* Set OOM-killer score and raise ulimit. */
>>> +void claim_resources(void);
>>> +
>>>   /* Open a pipe for signal handling */
>>>   void init_pipe(int reopen_log_pipe[2]);
>>> diff --git a/tools/xenstore/xenstored_minios.c 
>>> b/tools/xenstore/xenstored_minios.c
>>> index c94493e52a..df8ff580b0 100644
>>> --- a/tools/xenstore/xenstored_minios.c
>>> +++ b/tools/xenstore/xenstored_minios.c
>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>>   {
>>>   }
>>> +void claim_resources(void)
>>> +{
>>> +}
>>> +
>>>   void init_pipe(int reopen_log_pipe[2])
>>>   {
>>>       reopen_log_pipe[0] = -1;
>>> diff --git a/tools/xenstore/xenstored_posix.c 
>>> b/tools/xenstore/xenstored_posix.c
>>> index 48c37ffe3e..0074fbd8b2 100644
>>> --- a/tools/xenstore/xenstored_posix.c
>>> +++ b/tools/xenstore/xenstored_posix.c
>>> @@ -22,6 +22,7 @@
>>>   #include <fcntl.h>
>>>   #include <stdlib.h>
>>>   #include <sys/mman.h>
>>> +#include <sys/resource.h>
>>>   #include "utils.h"
>>>   #include "xenstored_core.h"
>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>>       close(devnull);
>>>   }
>>> +static void avoid_oom_killer(void)
>>> +{
>>> +    char path[32];
>>> +    char val[] = "-500";
>>> +    int fd;
>>> +
>>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>>> (int)getpid());
>>
>> This looks Linux specific. How about other OSes?
> 
> I don't know whether other OSes have an OOM killer, and if they do, how
> to configure it. It is a best effort attempt, after all.

I have CCed Roger who should be able to help for FreeBSD.

> 
>>
>>> +
>>> +    fd = open(path, O_WRONLY);
>>> +    /* Do nothing if file doesn't exist. */
>>
>> Your commit message leads to think that we *must* configure the OOM. 
>> If not, then we should not continue. But here, this suggest this is 
>> optional. In fact...
> 
> I can modify the commit message by adding a "Try to".
> 
>>
>>> +    if (fd < 0)
>>> +        return;
>>> +    /* Ignore errors. */
>>> +    write(fd, val, sizeof(val));
>>
>> ... xenstored may not be allowed to modify its own parameters. So this 
>> would continue silently without the admin necessarily knowning the 
>> limit wasn't applied.
> 
> I can add a line in the Xenstore log in this regard.

This feels wrong to me. If a limit cannot be applied then it should fail 
early rather than possibly at the wrong moment a few days (months?) after.

Cheers,

-- 
Julien Grall


  reply	other threads:[~2021-05-17 15:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-14  8:41 Juergen Gross
2021-05-14 20:19 ` Julien Grall
2021-05-17  6:47   ` Juergen Gross
2021-05-17 15:55     ` Julien Grall [this message]
2021-05-18  6:43       ` Juergen Gross
2021-05-18 13:09         ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39860a0c-5ac5-2537-532f-6ce288cc7219@xen.org \
    --to=julien@xen.org \
    --cc=iwj@xenproject.org \
    --cc=jgross@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    --subject='Re: [PATCH] tools/xenstore: claim resources when running as daemon' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).