xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Julien Grall <julien@xen.org>, xen-devel@lists.xenproject.org
Cc: "Ian Jackson" <iwj@xenproject.org>, "Wei Liu" <wl@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH] tools/xenstore: claim resources when running as daemon
Date: Tue, 18 May 2021 08:43:13 +0200	[thread overview]
Message-ID: <e69f7d4c-a616-1265-e909-fd14feea7412@suse.com> (raw)
In-Reply-To: <39860a0c-5ac5-2537-532f-6ce288cc7219@xen.org>


[-- Attachment #1.1.1: Type: text/plain, Size: 7127 bytes --]

On 17.05.21 17:55, Julien Grall wrote:
> Hi Juergen,
> 
> On 17/05/2021 07:47, Juergen Gross wrote:
>> On 14.05.21 22:19, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 14/05/2021 09:41, Juergen Gross wrote:
>>>> Xenstored is absolutely mandatory for a Xen host and it can't be
>>>> restarted, so being killed by OOM-killer in case of memory shortage is
>>>> to be avoided.
>>>>
>>>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>>>> xenstored to use large amounts of memory without being killed.
>>>>
>>>> In order to support large numbers of domains the limit for open file
>>>> descriptors might need to be raised. Each domain needs 2 file
>>>> descriptors (one for the event channel and one for the xl per-domain
>>>> daemon to connect to xenstored).
>>>
>>> Hmmm... AFAICT there is only one file descriptor to handle all the 
>>> event channels. Could you point out the code showing one event FD per 

>>> domain?
>>
>> I have let me fooled by just counting the file descriptors used with one
>> or two domain active.
>>
>> So you are right that all event channels only use one fd, but each xl
>> daemon will use two (which should be fixed, IMO). And thinking more
>> about it it is even worse: each qemu process will at least require one
>> additional fd.
>>
>>>
>>>>
>>>> Try to raise ulimit for open files to 65536. First the hard limit if
>>>> needed, and then the soft limit.
>>>
>>> I am not sure this is right to impose this limit to everyone. For 
>>> instance, one admin may know that there will be no more than 100 
>>> domains on its system.
>>
>> Is setting a higher limit really a problem?
> 
> I am quite unease to set a limit that nearly nobody will reach unless 
> something went horribly wrong on the system.

Hmm, I really don't see the downside of having the possibility to let
xenstored open lots of files.

Anyway we can do it via prlimit if you like that better.

> 
>>
>>> So the admin should be able to configure them. At this point, I think 

>>> the two limit should be set my the initscript rather than xenstored 
>>> itself.
>>
>> But the admin would need to know the Xen internals for selecting the
>> correct limits. In the end I'd be fine with moving this modification to
>> the script starting Xenstore (which would be launch-xenstore), but the
>> configuration item should be "max number of domains to support".
> 
> I would be fine with "max numer of domains to support". What I care the 

> most here is the limits are actually applied most of (if not all) the time.

I did another test and found:

- the xl daemon for a guest will use 2 socket connections
- qemu for a HVM guest will use 3 socket connections
- qemu for a PV guest is using one socket connection
- 14 other files are used by xenstored

So we should set the limit to 5 * n_dom + 100 (some headroom will be
nice IMO).

> 
>>
>>>
>>> This would also avoid the problem where Xenstored is not allowed to 
>>> modify its limit (see more below).
>>>
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>   tools/xenstore/xenstored_core.c   |  2 ++
>>>>   tools/xenstore/xenstored_core.h   |  3 ++
>>>>   tools/xenstore/xenstored_minios.c |  4 +++
>>>>   tools/xenstore/xenstored_posix.c  | 46 
>>>> +++++++++++++++++++++++++++++++
>>>>   4 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>> b/tools/xenstore/xenstored_core.c
>>>> index b66d119a98..964e693450 100644
>>>> --- a/tools/xenstore/xenstored_core.c
>>>> +++ b/tools/xenstore/xenstored_core.c
>>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>>>           xprintf = trace;
>>>>   #endif
>>>> +    claim_resources();
>>>> +
>>>>       signal(SIGHUP, trigger_reopen_log);
>>>>       if (tracefile)
>>>>           tracefile = 
talloc_strdup(NULL, tracefile);
>>>> diff --git a/tools/xenstore/xenstored_core.h 
>>>> b/tools/xenstore/xenstored_core.h
>>>> index 1467270476..ac26973648 100644
>>>> --- a/tools/xenstore/xenstored_core.h
>>>> +++ b/tools/xenstore/xenstored_core.h
>>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>>>   /* Close stdin/stdout/stderr to complete daemonize */
>>>>   void finish_daemonize(void);
>>>> +/* Set OOM-killer score and raise ulimit. */
>>>> +void claim_resources(void);
>>>> +
>>>>   /* Open a pipe for signal handling */
>>>>   void init_pipe(int reopen_log_pipe[2]);
>>>> diff --git a/tools/xenstore/xenstored_minios.c 
>>>> b/tools/xenstore/xenstored_minios.c
>>>> index c94493e52a..df8ff580b0 100644
>>>> --- a/tools/xenstore/xenstored_minios.c
>>>> +++ b/tools/xenstore/xenstored_minios.c
>>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>>>   {
>>>>   }
>>>> +void claim_resources(void)
>>>> +{
>>>> +}
>>>> +
>>>>   void init_pipe(int reopen_log_pipe[2])
>>>>   {
>>>>       reopen_log_pipe[0] = -1;
>>>> diff --git a/tools/xenstore/xenstored_posix.c 
>>>> b/tools/xenstore/xenstored_posix.c
>>>> index 48c37ffe3e..0074fbd8b2 100644
>>>> --- a/tools/xenstore/xenstored_posix.c
>>>> +++ b/tools/xenstore/xenstored_posix.c
>>>> @@ -22,6 +22,7 @@
>>>>   #include <fcntl.h>
>>>>   #include <stdlib.h>
>>>>   #include <sys/mman.h>
>>>> +#include <sys/resource.h>
>>>>   #include "utils.h"
>>>>   #include "xenstored_core.h"
>>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>>>       close(devnull);
>>>>   }
>>>> +static void avoid_oom_killer(void)
>>>> +{
>>>> +    char path[32];
>>>> +    char val[] = "-500";
>>>> +    int fd;
>>>> +
>>>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>>>> (int)getpid());
>>>
>>> This looks Linux specific. How about other OSes?
>>
>> I don't know whether other OSes have an OOM killer, and if they do, how
>> to configure it. It is a best effort attempt, after all.
> 
> I have CCed Roger who should be able to help for FreeBSD.

It would be possible to set the OOM-score from the launch script, too.

> 
>>
>>>
>>>> +
>>>> +    fd = open(path, O_WRONLY);
>>>> +    /* Do nothing if file doesn't exist. */
>>>
>>> Your commit message leads to think that we *must* configure the OOM. 
>>> If not, then we should not continue. But here, this suggest this is 
>>> optional. In fact...
>>
>> I can modify the commit message by adding a "Try to".
>>
>>>
>>>> +    if (fd < 0)
>>>> +        return;
>>>> +    /* Ignore errors. */
>>>> +    write(fd, val, sizeof(val));
>>>
>>> ... xenstored may not be allowed to modify its own parameters. So 
>>> this would continue silently without the admin necessarily knowning 
>>> the limit wasn't applied.
>>
>> I can add a line in the Xenstore log in this regard.
> 
> This feels wrong to me. If a limit cannot be applied then it should fail 
> early rather than possibly at the wrong moment a few days (months?) after.

I think issuing a warning would be better here. We are running with
no OOM adjustments since years now.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

  reply	other threads:[~2021-05-18  6:43 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-14  8:41 Juergen Gross
2021-05-14 20:19 ` Julien Grall
2021-05-17  6:47   ` Juergen Gross
2021-05-17 15:55     ` Julien Grall
2021-05-18  6:43       ` Juergen Gross [this message]
2021-05-18 13:09         ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e69f7d4c-a616-1265-e909-fd14feea7412@suse.com \
    --to=jgross@suse.com \
    --cc=iwj@xenproject.org \
    --cc=julien@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    --subject='Re: [PATCH] tools/xenstore: claim resources when running as daemon' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).