qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Felipe Franciosi <felipe@nutanix.com>
To: "Marc-André Lureau" <marcandre.lureau@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Daniel Berrange <berrange@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [PATCH] fence: introduce a file-based self-fence mechanism
Date: Thu, 5 Dec 2019 14:22:29 +0000	[thread overview]
Message-ID: <D503A2EF-70B6-4C9E-A0DB-889C915F1977@nutanix.com> (raw)
In-Reply-To: <CAJ+F1CJP_+bcAMgFFv4nS9wNZcAAsT7BTHcyiLFr0yfF0j8XHw@mail.gmail.com>

Heya,

> On Nov 26, 2019, at 12:18 PM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> 
> On Mon, Nov 25, 2019 at 8:14 PM Felipe Franciosi <felipe@nutanix.com> wrote:
>> 
>> This introduces a self-fence mechanism to Qemu, causing it to die if a
>> heartbeat condition is not met. Currently, a file-based heartbeat is
>> available and can be configured as follows:
>> 
>> -object file-fence,id=ff0,file=/foo,qtimeout=20,ktimeout=25,signal=kill
>> 
>> Qemu will watch 'file' for attribute changes. Touching the file works as
>> a heartbeat. This parameter is mandatory.
>> 
>> Fencing happens after 'qtimeout' or 'ktimeout' seconds elapse without a
>> heartbeat. At least one of these must be specified. Both may be used.
>> 
>> When using 'qtimeout', an internal Qemu timer is used. Fencing with this
>> method gives Qemu a chance to write a log message indicating which file
>> caused the event. If Qemu's main loop is hung for whatever reason, this
>> method won't successfully kill Qemu.
>> 
>> When using 'ktimeout', a kernel timer is used. In this case, 'signal'
>> can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using
>> SIGQUIT may be preferred for obtaining core dumps. If Qemu is hung
>> (eg. uninterruptable sleep), this method won't successfully kill Qemu.
>> 
>> It is worth noting that even successfully killing Qemu may not be
>> sufficient to completely fence a VM as certain operations like network
>> packets or block commands may be pending in the kernel. If that is a
>> concern, systems should consider using further fencing mechanisms like
>> hardware watchdogs either in addition or in conjunction with this for
>> additional protection.
>> 
>> Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
>> ---
>> Based-on: <20191125153619.39893-2-felipe@nutanix.com>
>> 
>> Makefile.objs       |   1 +
>> fence/Makefile.objs |   1 +
>> fence/file_fence.c  | 381 ++++++++++++++++++++++++++++++++++++++++++++
> 
> I think it could be under backends/

I thought about it and couldn't make up my mind. My decision was based on:
- Doesn't really feel like a "backend".
- I envision other types of self-fencing heartbeats (eg. network-based),
  in which case this would probably be split in a "common" file.

Arguably other objects in backends/ also fall within these categories,
so I'm happy to move if you think it's better. Let me know.


> And a slight preference for - seperated words in filenames over qemu codebase.

Sure, will change.

> 
>> qemu-options.hx     |  27 +++-
>> 4 files changed, 409 insertions(+), 1 deletion(-)
>> create mode 100644 fence/Makefile.objs
>> create mode 100644 fence/file_fence.c
>> 
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 11ba1a36bd..998eed4796 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -75,6 +75,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
>> 
>> common-obj-y += backends/
>> common-obj-y += chardev/
>> +common-obj-y += fence/
>> 
>> common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
>> qemu-seccomp.o-cflags := $(SECCOMP_CFLAGS)
>> diff --git a/fence/Makefile.objs b/fence/Makefile.objs
>> new file mode 100644
>> index 0000000000..2ed2092568
>> --- /dev/null
>> +++ b/fence/Makefile.objs
>> @@ -0,0 +1 @@
>> +common-obj-y += file_fence.o
>> diff --git a/fence/file_fence.c b/fence/file_fence.c
>> new file mode 100644
>> index 0000000000..5b743e69d2
>> --- /dev/null
>> +++ b/fence/file_fence.c
>> @@ -0,0 +1,381 @@
>> +/*
>> + * QEMU file-based self-fence mechanism
>> + *
>> + * Copyright (c) 2019 Nutanix Inc. All rights reserved.
>> + *
>> + * Authors:
>> + *    Felipe Franciosi <felipe@nutanix.com>
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gnu.org_licenses_&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=CCrJKVC5zGot8PrnI-iYe00MdX4pgdQfMRigp14Ptmk&m=edzXQ5P0GCmmzZ4-20uvsmgqIreV3BKYC8JYikgHVH4&s=q3GMprakVoRJw8zkbbjXPqAbExv93DzJ-HgI2z00408&e= >.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/filemonitor.h"
>> +#include "qemu/timer.h"
>> +
>> +#include <time.h>
>> +
>> +#define TYPE_FILE_FENCE "file-fence"
>> +
>> +typedef struct FileFence {
>> +    Object parent_obj;
>> +
>> +    gchar *dir;
>> +    gchar *file;
>> +    uint32_t qtimeout;
>> +    uint32_t ktimeout;
>> +    int signal;
>> +
>> +    timer_t ktimer;
>> +    QEMUTimer *qtimer;
>> +
>> +    QFileMonitor *fm;
>> +    uint64_t id;
>> +} FileFence;
>> +
>> +#define FILE_FENCE(obj) \
>> +    OBJECT_CHECK(FileFence, (obj), TYPE_FILE_FENCE)
>> +
>> +static void
>> +timer_update(FileFence *ff)
>> +{
>> +    struct itimerspec its = {
>> +        .it_value.tv_sec = ff->ktimeout,
>> +    };
>> +    int err;
>> +
>> +    if (ff->qtimeout) {
>> +        timer_mod(ff->qtimer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>> +                              ff->qtimeout * 1000);
>> +    }
>> +
>> +    if (ff->ktimeout) {
>> +        err = timer_settime(ff->ktimer, 0, &its, NULL);
>> +        g_assert(err == 0);
>> +    }
>> +}
>> +
>> +static void
>> +file_fence_abort_cb(void *opaque)
>> +{
>> +    FileFence *ff = opaque;
>> +    printf("Fencing after %u seconds on '%s'\n",
>> +           ff->qtimeout, g_strconcat(ff->dir, "/", ff->file, NULL));
> 
> May be error_printf() instead.

Makes sense.

> 
>> +    abort();
>> +}
>> +
>> +static void
>> +file_fence_watch_cb(int64_t id, QFileMonitorEvent ev, const char *file,
>> +                    void *opaque)
>> +{
>> +    FileFence *ff = opaque;
>> +
>> +    if (ev != QFILE_MONITOR_EVENT_ATTRIBUTES) {
>> +        return;
>> +    }
>> +
>> +    if (g_strcmp0(file, ff->file) != 0) {
> 
> Does it actually happen? Apparently the code in
> util/filemonitor-inotify.c already checks for g_str_equal(filename)

You are right, it does. I think I saw it, but for some reason decided
to be over protective. I will remove this check.

> 
>> +        return;
>> +    }
>> +
>> +    timer_update(ff);
>> +}
>> +
>> +static void
>> +ktimer_tear(FileFence *ff)
>> +{
>> +    int err;
>> +
>> +    if (ff->ktimer) {
>> +        err = timer_delete(ff->ktimer);
>> +        g_assert(err == 0);
>> +        ff->ktimer = NULL;
>> +    }
>> +}
>> +
>> +static void
>> +ktimer_setup(FileFence *ff, Error **errp)
>> +{
>> +    int err;
>> +
>> +    struct sigevent sev = {
>> +        .sigev_notify = SIGEV_SIGNAL,
>> +        .sigev_signo = ff->signal ? ff->signal : SIGKILL,
>> +    };
>> +
>> +    if (ff->ktimeout == 0) {
>> +        return;
>> +    }
>> +
>> +    err = timer_create(CLOCK_MONOTONIC, &sev, &ff->ktimer);
>> +    if (err == -1) {
>> +        error_setg(errp, "Error creating kernel timer: %m");
>> +        return;
>> +    }
>> +}
>> +
>> +static void
>> +qtimer_tear(FileFence *ff)
>> +{
>> +    if (ff->qtimer) {
>> +        timer_del(ff->qtimer);
>> +        timer_free(ff->qtimer);
>> +    }
>> +    ff->qtimer = NULL;
>> +}
>> +
>> +static void
>> +qtimer_setup(FileFence *ff, Error **errp)
>> +{
>> +    QEMUTimer *qtimer;
>> +
>> +    if (ff->qtimeout == 0) {
>> +        return;
>> +    }
>> +
>> +    qtimer = timer_new_ms(QEMU_CLOCK_REALTIME, file_fence_abort_cb, ff);
>> +    if (qtimer == NULL) {
>> +        error_setg(errp, "Error creating Qemu timer");
>> +        return;
>> +    }
>> +
>> +    ff->qtimer = qtimer;
>> +}
>> +
>> +static void
>> +watch_tear(FileFence *ff)
>> +{
>> +    if (ff->fm) {
>> +        qemu_file_monitor_remove_watch(ff->fm, ff->dir, ff->id);
>> +        qemu_file_monitor_free(ff->fm);
>> +        ff->fm = NULL;
>> +        ff->id = 0;
>> +    }
>> +}
>> +
>> +static void
>> +watch_setup(FileFence *ff, Error **errp)
>> +{
>> +    QFileMonitor *fm;
>> +    int64_t id;
>> +
>> +    fm = qemu_file_monitor_new(errp);
>> +    if (!fm) {
>> +        return;
>> +    }
>> +
>> +    id = qemu_file_monitor_add_watch(fm, ff->dir, ff->file,
>> +                                     file_fence_watch_cb, ff, errp);
>> +    if (id < 0) {
>> +        qemu_file_monitor_free(fm);
>> +        return;
>> +    }
>> +
>> +    ff->fm = fm;
>> +    ff->id = id;
>> +}
>> +
>> +static void
>> +file_fence_complete(UserCreatable *obj, Error **errp)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +
>> +    if (ff->dir == NULL) {
>> +        error_setg(errp, "A 'file' must be set");
>> +        return;
>> +    }
>> +
>> +    if (ff->signal != 0 && ff->ktimeout == 0) {
>> +        error_setg(errp, "Using 'signal' requires 'ktimeout' to be set");
>> +        return;
>> +    }
>> +
>> +    if (ff->ktimeout == 0 && ff->qtimeout == 0) {
>> +        error_setg(errp, "One or both of 'ktimeout' or 'qtimeout' must be set");
>> +        return;
>> +    }
>> +
>> +    if (ff->qtimeout >= ff->ktimeout) {
>> +        error_setg(errp, "Using 'qtimeout' >= 'ktimeout' doesn't make sense");
>> +        return;
>> +    }
>> +
>> +    watch_setup(ff, errp);
>> +    if (*errp != NULL) {
>> +        return;
>> +    }
>> +
>> +    qtimer_setup(ff, errp);
>> +    if (*errp != NULL) {
>> +        goto err_watch;
>> +    }
>> +
>> +    ktimer_setup(ff, errp);
>> +    if (*errp != NULL) {
>> +        goto err_qtimer;
>> +    }
> 
> 
> I would rather return success on the setup functions and write:
> 
>  if (!watch_setup(ff, errp) ||
>      !qtimer_setup(...) ||
>      !...) {
>      return; (no cleanup)
>   }
> 
> Object creation will fail, and finalize function will be called.
> 
>> +
>> +    timer_update(ff);
>> +
>> +    return;
>> +
>> +err_qtimer:
>> +    qtimer_tear(ff);
>> +err_watch:
>> +    watch_tear(ff);
>> +}
>> +
>> +static void
>> +file_fence_set_signal(Object *obj, const char *value, Error **errp)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +    gchar *gsig;
>> +
>> +    if (ff->signal) {
>> +        error_setg(errp, "Signal property already set");
>> +        return;
>> +    }
>> +
>> +    gsig = g_ascii_strup(value, -1);
>> +
>> +    if (g_strcmp0(gsig, "QUIT") == 0) {
>> +        ff->signal = SIGQUIT;
>> +    } else
>> +    if (g_strcmp0(gsig, "KILL") == 0) {
>> +        ff->signal = SIGKILL;
>> +    } else {
>> +        error_setg(errp, "Invalid signal. Must be 'quit' or 'kill'");
>> +    }
> 
> Or bail out early for NULL case and use g_ascii_strcasecmp()

Sounds better. Let me look into it.

> 
>> +
>> +    g_free(gsig);
>> +}
>> +
>> +static char *
>> +file_fence_get_signal(Object *obj, Error **errp)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +
>> +    switch (ff->signal) {
>> +    case SIGKILL:
>> +        return g_strdup("kill");
>> +    case SIGQUIT:
>> +        return g_strdup("quit");
>> +    }
>> +
>> +    /* Unreachable */
>> +    abort();
>> +}
>> +
>> +static void
>> +file_fence_set_file(Object *obj, const char *value, Error **errp)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +    gchar *dir, *file;
> 
> g_autofree

Sweet.

> 
>> +
>> +    if (ff->dir) {
>> +        error_setg(errp, "File property already set");
>> +        return;
>> +    }
>> +
>> +    dir = g_path_get_dirname(value);
>> +    if (g_str_equal(dir, ".")) {
>> +        error_setg(errp, "Path for file-fence must be absolute");
>> +        return;
>> +    }
>> +
>> +    file = g_path_get_basename(value);
>> +    if (g_str_equal(file, ".")) {
>> +        error_setg(errp, "Path for file-fence must be a file");
>> +        g_free(dir);
>> +        return;
>> +    }
>> +
>> +    ff->dir = dir;
>> +    ff->file = file;
> 
> g_steal_pointer()

Cool! Clearly I could spend more time in the glib manual. :)

> 
>> +}
>> +
>> +static char *
>> +file_fence_get_file(Object *obj, Error **errp)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +
>> +    if (ff->file) {
>> +        return g_strconcat(ff->dir, "/", ff->file, NULL);
> 
> Or g_build_filename()

Makes sense.

> 
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static void
>> +file_fence_instance_finalize(Object *obj)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +
>> +    ktimer_tear(ff);
>> +    qtimer_tear(ff);
>> +    watch_tear(ff);
>> +
>> +    g_free(ff->file);
>> +    g_free(ff->dir);
>> +}
>> +
>> +static void
>> +file_fence_instance_init(Object *obj)
>> +{
>> +    FileFence *ff = FILE_FENCE(obj);
>> +
>> +    object_property_add_str(obj, "file",
>> +                            file_fence_get_file,
>> +                            file_fence_set_file,
>> +                            &error_abort);
>> +    object_property_add_str(obj, "signal",
>> +                            file_fence_get_signal,
>> +                            file_fence_set_signal,
>> +                            &error_abort);
>> +    object_property_add_uint32_ptr(obj, "qtimeout", &ff->qtimeout,
>> +                                   false, &error_abort);
>> +    object_property_add_uint32_ptr(obj, "ktimeout", &ff->ktimeout,
>> +                                   false, &error_abort);
> 
> Make them class properties (this will help with -object
> file-fence,help and such). (fwiw, I have pending patches to support
> class property description & default values..)

I tried to fit some of these in the class, as well as justify a split
of the file-based fencing with a more generic self-fencer right off
the bat. But it didn't make sense in the end. I envisioned scenarios
where you may have different heartbeats for one Qemu with different
timeouts. In that case, it wouldn't work as a class property, right?

> 
>> +}
>> +
>> +static void
>> +file_fence_class_init(ObjectClass *klass, void *class_data)
>> +{
>> +    UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
>> +    ucc->complete = file_fence_complete;
>> +}
>> +
>> +static const TypeInfo file_fence_info = {
>> +    .name = TYPE_FILE_FENCE,
>> +    .parent = TYPE_OBJECT,
>> +    .class_init = file_fence_class_init,
>> +    .instance_size = sizeof(FileFence),
>> +    .instance_init = file_fence_instance_init,
>> +    .instance_finalize = file_fence_instance_finalize,
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { TYPE_USER_CREATABLE },
>> +        { }
>> +    }
>> +};
>> +
>> +static void
>> +register_types(void)
>> +{
>> +    type_register_static(&file_fence_info);
>> +}
>> +
>> +type_init(register_types);
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 65c9473b73..995d3d6abf 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4929,8 +4929,33 @@ CN=laptop.example.com,O=Example Home,L=London,ST=London,C=GB
>> 
>> @end table
>> 
>> -ETEXI
>> +@item -object file-fence,id=@var{id},file=@var{file},qtimeout=@var{qtimeout},ktimeout=@var{ktimeout},signal=@{signal}
>> +
>> +Self-fence Qemu if @var{file} is not modified within a given timeout.
>> +
>> +Qemu will watch @var{file} for attribute changes. Touching the file works as a
>> +heartbeat. This parameter is mandatory.
>> +
>> +Fencing happens after @var{qtimeout} or @var{ktimeout} seconds elapse
>> +without a heartbeat. At least one of these must be specified. Both may be used.
>> 
>> +When using @var{qtimeout}, an internal Qemu timer is used. Fencing with
>> +this method gives Qemu a chance to write a log message indicating which file
>> +caused the event. If Qemu's main loop is hung for whatever reason, this method
>> +won't successfully kill Qemu.
>> +
>> +When using @var{ktimeout}, a kernel timer is used. In this case, @var{signal}
>> +can be 'kill' (for SIGKILL, default) or 'quit' (for SIGQUIT). Using SIGQUIT may
>> +be preferred for obtaining core dumps. If Qemu is hung (eg. uninterruptable
>> +sleep), this method won't successfully kill Qemu.
>> +
>> +It is worth noting that even successfully killing Qemu may not be sufficient to
>> +completely fence a VM as certain operations like network packets or block
>> +commands may be pending in the kernel. If that is a concern, systems should
>> +consider using further fencing mechanisms like hardware watchdogs either in
>> +addition or in conjunction with this feature for additional protection.
>> +
>> +ETEXI
>> 
>> HXCOMM This is the last statement. Insert new options before this line!
>> STEXI
>> --
>> 2.20.1
>> 
> 
> Code looks fine to me otherwise

Let me work on a v2 next week.

F.

> 
> -- 
> Marc-André Lureau


  reply	other threads:[~2019-12-05 14:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-25 16:14 [PATCH] fence: introduce a file-based self-fence mechanism Felipe Franciosi
2019-11-26 11:18 ` Marc-André Lureau
2019-12-05 14:22   ` Felipe Franciosi [this message]
2020-02-05  9:43     ` Felipe Franciosi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D503A2EF-70B6-4C9E-A0DB-889C915F1977@nutanix.com \
    --to=felipe@nutanix.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=marcandre.lureau@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).