From: Julien Grall <julien@xen.org>
To: Juergen Gross <jgross@suse.com>, xen-devel@lists.xenproject.org
Cc: "Ian Jackson" <iwj@xenproject.org>, "Wei Liu" <wl@xen.org>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH] tools/xenstore: claim resources when running as daemon
Date: Mon, 17 May 2021 16:55:47 +0100 [thread overview]
Message-ID: <39860a0c-5ac5-2537-532f-6ce288cc7219@xen.org> (raw)
In-Reply-To: <fe5f1e6a-1a89-ea12-feb5-318f25d4281f@suse.com>
Hi Juergen,
On 17/05/2021 07:47, Juergen Gross wrote:
> On 14.05.21 22:19, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 14/05/2021 09:41, Juergen Gross wrote:
>>> Xenstored is absolutely mandatory for a Xen host and it can't be
>>> restarted, so being killed by OOM-killer in case of memory shortage is
>>> to be avoided.
>>>
>>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>>> xenstored to use large amounts of memory without being killed.
>>>
>>> In order to support large numbers of domains the limit for open file
>>> descriptors might need to be raised. Each domain needs 2 file
>>> descriptors (one for the event channel and one for the xl per-domain
>>> daemon to connect to xenstored).
>>
>> Hmmm... AFAICT there is only one file descriptor to handle all the
>> event channels. Could you point out the code showing one event FD per
>> domain?
>
> I have let me fooled by just counting the file descriptors used with one
> or two domain active.
>
> So you are right that all event channels only use one fd, but each xl
> daemon will use two (which should be fixed, IMO). And thinking more
> about it it is even worse: each qemu process will at least require one
> additional fd.
>
>>
>>>
>>> Try to raise ulimit for open files to 65536. First the hard limit if
>>> needed, and then the soft limit.
>>
>> I am not sure this is right to impose this limit to everyone. For
>> instance, one admin may know that there will be no more than 100
>> domains on its system.
>
> Is setting a higher limit really a problem?
I am quite unease to set a limit that nearly nobody will reach unless
something went horribly wrong on the system.
>
>> So the admin should be able to configure them. At this point, I think
>> the two limit should be set my the initscript rather than xenstored
>> itself.
>
> But the admin would need to know the Xen internals for selecting the
> correct limits. In the end I'd be fine with moving this modification to
> the script starting Xenstore (which would be launch-xenstore), but the
> configuration item should be "max number of domains to support".
I would be fine with "max numer of domains to support". What I care the
most here is the limits are actually applied most of (if not all) the time.
>
>>
>> This would also avoid the problem where Xenstored is not allowed to
>> modify its limit (see more below).
>>
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> tools/xenstore/xenstored_core.c | 2 ++
>>> tools/xenstore/xenstored_core.h | 3 ++
>>> tools/xenstore/xenstored_minios.c | 4 +++
>>> tools/xenstore/xenstored_posix.c | 46 +++++++++++++++++++++++++++++++
>>> 4 files changed, 55 insertions(+)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c
>>> b/tools/xenstore/xenstored_core.c
>>> index b66d119a98..964e693450 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>> xprintf = trace;
>>> #endif
>>> + claim_resources();
>>> +
>>> signal(SIGHUP, trigger_reopen_log);
>>> if (tracefile)
>>> tracefile = talloc_strdup(NULL, tracefile);
>>> diff --git a/tools/xenstore/xenstored_core.h
>>> b/tools/xenstore/xenstored_core.h
>>> index 1467270476..ac26973648 100644
>>> --- a/tools/xenstore/xenstored_core.h
>>> +++ b/tools/xenstore/xenstored_core.h
>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>> /* Close stdin/stdout/stderr to complete daemonize */
>>> void finish_daemonize(void);
>>> +/* Set OOM-killer score and raise ulimit. */
>>> +void claim_resources(void);
>>> +
>>> /* Open a pipe for signal handling */
>>> void init_pipe(int reopen_log_pipe[2]);
>>> diff --git a/tools/xenstore/xenstored_minios.c
>>> b/tools/xenstore/xenstored_minios.c
>>> index c94493e52a..df8ff580b0 100644
>>> --- a/tools/xenstore/xenstored_minios.c
>>> +++ b/tools/xenstore/xenstored_minios.c
>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>> {
>>> }
>>> +void claim_resources(void)
>>> +{
>>> +}
>>> +
>>> void init_pipe(int reopen_log_pipe[2])
>>> {
>>> reopen_log_pipe[0] = -1;
>>> diff --git a/tools/xenstore/xenstored_posix.c
>>> b/tools/xenstore/xenstored_posix.c
>>> index 48c37ffe3e..0074fbd8b2 100644
>>> --- a/tools/xenstore/xenstored_posix.c
>>> +++ b/tools/xenstore/xenstored_posix.c
>>> @@ -22,6 +22,7 @@
>>> #include <fcntl.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>> +#include <sys/resource.h>
>>> #include "utils.h"
>>> #include "xenstored_core.h"
>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>> close(devnull);
>>> }
>>> +static void avoid_oom_killer(void)
>>> +{
>>> + char path[32];
>>> + char val[] = "-500";
>>> + int fd;
>>> +
>>> + snprintf(path, sizeof(path), "/proc/%d/oom_score_adj",
>>> (int)getpid());
>>
>> This looks Linux specific. How about other OSes?
>
> I don't know whether other OSes have an OOM killer, and if they do, how
> to configure it. It is a best effort attempt, after all.
I have CCed Roger who should be able to help for FreeBSD.
>
>>
>>> +
>>> + fd = open(path, O_WRONLY);
>>> + /* Do nothing if file doesn't exist. */
>>
>> Your commit message leads to think that we *must* configure the OOM.
>> If not, then we should not continue. But here, this suggest this is
>> optional. In fact...
>
> I can modify the commit message by adding a "Try to".
>
>>
>>> + if (fd < 0)
>>> + return;
>>> + /* Ignore errors. */
>>> + write(fd, val, sizeof(val));
>>
>> ... xenstored may not be allowed to modify its own parameters. So this
>> would continue silently without the admin necessarily knowning the
>> limit wasn't applied.
>
> I can add a line in the Xenstore log in this regard.
This feels wrong to me. If a limit cannot be applied then it should fail
early rather than possibly at the wrong moment a few days (months?) after.
Cheers,
--
Julien Grall
next prev parent reply other threads:[~2021-05-17 15:56 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-14 8:41 [PATCH] tools/xenstore: claim resources when running as daemon Juergen Gross
2021-05-14 20:19 ` Julien Grall
2021-05-17 6:47 ` Juergen Gross
2021-05-17 15:55 ` Julien Grall [this message]
2021-05-18 6:43 ` Juergen Gross
2021-05-18 13:09 ` Julien Grall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=39860a0c-5ac5-2537-532f-6ce288cc7219@xen.org \
--to=julien@xen.org \
--cc=iwj@xenproject.org \
--cc=jgross@suse.com \
--cc=roger.pau@citrix.com \
--cc=wl@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).