qemu-devel.nongnu.org archive mirror
* [PATCH V4 0/5] Introduce Advanced Watch Dog module
@ 2019-12-17 12:45 Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 1/5] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side. It can
be used to detect network failures (VMM to guest, VMM to VMM, VMM to a remote
server) and to perform a previously configured operation. The current AWD patch
accepts any input as the signal to refresh the watchdog timer; a more specific
interactive protocol could be defined here later. For the output, the user can
write commands or messages in advance in the AWD opt-script. We noticed that
there is no way for VMMs to communicate directly. Some people may think such a
thing is not needed (upper-layer software like OpenStack can handle it), but
after engaging with real customers we found that they need a lightweight and
efficient mechanism to solve some practical problems.

For example, in Edge Computing cases they consider high-level software too heavy
to use at the edge, or too hard to manage and combine with VM instances.
AWD gives users basic VM/host network monitoring tools and a basic fault
tolerance and recovery solution.
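
To make the refresh-on-any-input behaviour concrete, here is a minimal sketch
in plain C. It is only an illustration under assumptions: the Watchdog struct
and the watchdog_init/watchdog_feed/watchdog_poll names are hypothetical and do
not appear in this series, time is passed in explicitly instead of using QEMU
timers, and the deadline is armed at creation for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the watchdog state; not the AwdState of the series. */
typedef struct {
    uint64_t deadline_ms;   /* absolute time at which the watchdog fires */
    uint32_t timeout_ms;    /* allowed silence before the action runs */
    bool fired;             /* set once the configured action has run */
} Watchdog;

static void watchdog_init(Watchdog *w, uint32_t timeout_ms, uint64_t now_ms)
{
    assert(w != NULL && timeout_ms > 0);
    w->timeout_ms = timeout_ms;
    w->deadline_ms = now_ms + timeout_ms;
    w->fired = false;
}

/* Any input at all counts as a sign of life and pushes the deadline out. */
static void watchdog_feed(Watchdog *w, uint64_t now_ms)
{
    assert(w != NULL);
    w->deadline_ms = now_ms + w->timeout_ms;
}

/* Returns true exactly once, when the deadline has passed without input. */
static bool watchdog_poll(Watchdog *w, uint64_t now_ms)
{
    assert(w != NULL);
    if (!w->fired && now_ms >= w->deadline_ms) {
        w->fired = true;
        return true;    /* a caller would send the opt-script contents here */
    }
    return false;
}
```

In this model a read handler would call watchdog_feed() for every chunk of
input, and a periodic timer would call watchdog_poll() and run the configured
operation when it returns true.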

Please see the detailed documentation in the last patch.

V4:
 - Add more introduction in qemu-options.hx
 - Addressed Paolo's comments: add docs/awd.txt with the AWD module details.

V3:
 - Rebased on QEMU 4.2.0-rc1 code.
 - Fix commit message issue.

V2:
 - Addressed Philippe's comments: add a configure selector for AWD.

Initial:
 - Initial version.


Zhang Chen (5):
  net/awd.c: Introduce Advanced Watch Dog module framework
  net/awd.c: Initialize input/output chardev
  net/awd.c: Load advanced watch dog worker thread job
  vl.c: Make Advanced Watch Dog delayed initialization
  docs/awd.txt: Add doc to introduce Advanced WatchDog(AWD) module

 configure         |   9 +
 docs/awd.txt      |  88 +++++++++
 net/Makefile.objs |   1 +
 net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |  20 ++
 vl.c              |   7 +
 6 files changed, 616 insertions(+)
 create mode 100644 docs/awd.txt
 create mode 100644 net/awd.c

-- 
2.17.1




* [PATCH V4 1/5] net/awd.c: Introduce Advanced Watch Dog module framework
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
@ 2019-12-17 12:45 ` Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 2/5] net/awd.c: Initialize input/output chardev Zhang Chen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

This patch introduces a new module named Advanced Watch Dog
and defines its input and output parameters. AWD uses standard chardevs
to communicate with the outside world.
To use it, pass "--enable-awd" when configuring QEMU.

Demo command:
-object advanced-watchdog,id=awd1,server=on,awd_node=d_node,notification_node=remote_server,opt_script=opt_script_path,iothread=awd_iothread,pulse_interval=1000,timeout=5000

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
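Note for reviewers (not part of the commit message): a rough sketch of how the
two optional heartbeat properties behave, assuming the rules used in this
series: the setters reject zero, and values left unset fall back to the
defaults. AwdParams and the two helper names are hypothetical; only the default
values are taken from net/awd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AWD_PULSE_INTERVAL_DEFAULT 5000   /* ms, value used in net/awd.c */
#define AWD_TIMEOUT_DEFAULT 2000          /* ms, value used in net/awd.c */

/* Hypothetical stand-in for the two heartbeat fields of AwdState. */
typedef struct {
    uint32_t pulse_interval;   /* 0 means "not set by the user" */
    uint32_t timeout;          /* 0 means "not set by the user" */
} AwdParams;

/* Mirrors the property setters: zero is rejected, anything else is stored. */
static int awd_params_set(uint32_t *field, uint32_t value)
{
    assert(field != NULL);
    if (value == 0) {
        return -1;   /* corresponds to "requires a positive value" */
    }
    *field = value;
    return 0;
}

/* Fills in the defaults for whatever the user left unset. */
static void awd_params_apply_defaults(AwdParams *p)
{
    assert(p != NULL);
    if (p->pulse_interval == 0) {
        p->pulse_interval = AWD_PULSE_INTERVAL_DEFAULT;
    }
    if (p->timeout == 0) {
        p->timeout = AWD_TIMEOUT_DEFAULT;
    }
}
```

With the demo command above (pulse_interval=1000,timeout=5000) both values are
set explicitly, so neither default would be applied.
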
 configure         |   9 ++
 net/Makefile.objs |   1 +
 net/awd.c         | 261 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |  20 ++++
 4 files changed, 291 insertions(+)
 create mode 100644 net/awd.c

diff --git a/configure b/configure
index 6099be1d84..49d1830de4 100755
--- a/configure
+++ b/configure
@@ -383,6 +383,7 @@ vhost_scsi=""
 vhost_vsock=""
 vhost_user=""
 vhost_user_fs=""
+awd="no"
 kvm="no"
 hax="no"
 hvf="no"
@@ -1304,6 +1305,10 @@ for opt do
   ;;
   --enable-vhost-user-fs) vhost_user_fs="yes"
   ;;
+  --disable-awd) awd="no"
+  ;;
+  --enable-awd) awd="yes"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
@@ -1780,6 +1785,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vhost-crypto    vhost-user-crypto backend support
   vhost-kernel    vhost kernel backend support
   vhost-user      vhost-user backend support
+  awd             Advanced Watch Dog support
   spice           spice
   rbd             rados block device (rbd)
   libiscsi        iscsi support
@@ -7043,6 +7049,9 @@ fi
 if test "$vhost_user" = "yes" ; then
   echo "CONFIG_VHOST_USER=y" >> $config_host_mak
 fi
+if test "$awd" = "yes" ; then
+  echo "CONFIG_AWD=y" >> $config_host_mak
+fi
 if test "$vhost_user_fs" = "yes" ; then
   echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
 fi
diff --git a/net/Makefile.objs b/net/Makefile.objs
index c5d076d19c..187e655443 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -19,6 +19,7 @@ common-obj-y += colo-compare.o
 common-obj-y += colo.o
 common-obj-y += filter-rewriter.o
 common-obj-y += filter-replay.o
+common-obj-$(CONFIG_AWD) += awd.o
 
 tap-obj-$(CONFIG_LINUX) = tap-linux.o
 tap-obj-$(CONFIG_BSD) = tap-bsd.o
diff --git a/net/awd.c b/net/awd.c
new file mode 100644
index 0000000000..d42b4a7372
--- /dev/null
+++ b/net/awd.c
@@ -0,0 +1,261 @@
+/*
+ * Advanced Watch Dog
+ *
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2019 Intel Corporation
+ *
+ * Author: Zhang Chen <chen.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "net/net.h"
+#include "qom/object_interfaces.h"
+#include "qom/object.h"
+#include "chardev/char-fe.h"
+#include "qemu/sockets.h"
+#include "sysemu/iothread.h"
+
+#define TYPE_AWD  "advanced-watchdog"
+#define AWD(obj)  OBJECT_CHECK(AwdState, (obj), TYPE_AWD)
+
+#define AWD_READ_LEN_MAX NET_BUFSIZE
+/* Default advanced watchdog pulse interval */
+#define AWD_PULSE_INTERVAL_DEFAULT 5000
+/* Default advanced watchdog timeout */
+#define AWD_TIMEOUT_DEFAULT 2000
+
+typedef struct AwdState {
+    Object parent;
+
+    bool server;
+    char *awd_node;
+    char *notification_node;
+    char *opt_script;
+    uint32_t pulse_interval;
+    uint32_t timeout;
+    IOThread *iothread;
+} AwdState;
+
+typedef struct AwdClass {
+    ObjectClass parent_class;
+} AwdClass;
+
+static char *awd_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->awd_node);
+}
+
+static void awd_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->awd_node);
+    s->awd_node = g_strdup(value);
+}
+
+static char *noti_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->notification_node);
+}
+
+static void noti_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->notification_node);
+    s->notification_node = g_strdup(value);
+}
+
+static char *opt_script_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->opt_script);
+}
+
+static void opt_script_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->opt_script);
+    s->opt_script = g_strdup(value);
+}
+
+static bool awd_get_server(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return s->server;
+}
+
+static void awd_set_server(Object *obj, bool value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    s->server = value;
+}
+
+static void awd_get_interval(Object *obj, Visitor *v,
+                                   const char *name, void *opaque,
+                                   Error **errp)
+{
+    AwdState *s = AWD(obj);
+    uint32_t value = s->pulse_interval;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void awd_set_interval(Object *obj, Visitor *v,
+                                   const char *name, void *opaque,
+                                   Error **errp)
+{
+    AwdState *s = AWD(obj);
+    Error *local_err = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    if (!value) {
+        error_setg(&local_err, "Property '%s.%s' requires a positive value",
+                   object_get_typename(obj), name);
+        goto out;
+    }
+    s->pulse_interval = value;
+
+out:
+    error_propagate(errp, local_err);
+}
+
+static void awd_get_timeout(Object *obj, Visitor *v,
+                            const char *name, void *opaque,
+                            Error **errp)
+{
+    AwdState *s = AWD(obj);
+    uint32_t value = s->timeout;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void awd_set_timeout(Object *obj, Visitor *v,
+                            const char *name, void *opaque,
+                            Error **errp)
+{
+    AwdState *s = AWD(obj);
+    Error *local_err = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (!value) {
+        error_setg(&local_err, "Property '%s.%s' requires a positive value",
+                   object_get_typename(obj), name);
+        goto out;
+    }
+    s->timeout = value;
+
+out:
+    error_propagate(errp, local_err);
+}
+
+static void awd_complete(UserCreatable *uc, Error **errp)
+{
+    AwdState *s = AWD(uc);
+
+    if (!s->awd_node || !s->iothread ||
+        !s->notification_node || !s->opt_script) {
+        error_setg(errp, "advanced-watchdog needs 'awd_node', "
+                   "'notification_node', 'opt_script' "
+                   "and 'iothread' property set");
+        return;
+    }
+
+    return;
+}
+
+static void awd_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = awd_complete;
+}
+
+static void awd_init(Object *obj)
+{
+    AwdState *s = AWD(obj);
+
+    object_property_add_str(obj, "awd_node",
+                            awd_get_node, awd_set_node,
+                            NULL);
+
+    object_property_add_str(obj, "notification_node",
+                            noti_get_node, noti_set_node,
+                            NULL);
+
+    object_property_add_str(obj, "opt_script",
+                            opt_script_get_node, opt_script_set_node,
+                            NULL);
+
+    object_property_add_bool(obj, "server",
+                             awd_get_server,
+                             awd_set_server, NULL);
+
+    object_property_add(obj, "pulse_interval", "uint32",
+                        awd_get_interval,
+                        awd_set_interval, NULL, NULL, NULL);
+
+    object_property_add(obj, "timeout", "uint32",
+                        awd_get_timeout,
+                        awd_set_timeout, NULL, NULL, NULL);
+
+    object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
+                            (Object **)&s->iothread,
+                            object_property_allow_set_link,
+                            OBJ_PROP_LINK_STRONG, NULL);
+}
+
+static void awd_finalize(Object *obj)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->awd_node);
+    g_free(s->notification_node);
+}
+
+static const TypeInfo awd_info = {
+    .name = TYPE_AWD,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(AwdState),
+    .instance_init = awd_init,
+    .instance_finalize = awd_finalize,
+    .class_size = sizeof(AwdClass),
+    .class_init = awd_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&awd_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 65c9473b73..40417afab5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4589,6 +4589,26 @@ Dump the network traffic on netdev @var{dev} to the file specified by
 The file format is libpcap, so it can be analyzed with tools such as tcpdump
 or Wireshark.
 
+@item -object advanced-watchdog,id=@var{id},awd_node=@var{chardevid},notification_node=@var{chardevid},server=@var{server},iothread=@var{id},opt_script=@var{path}[,pulse_interval=@var{time_ms},timeout=@var{time_ms}]
+
+Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side. It can
+be used to detect network failures (VMM to guest, VMM to VMM, VMM to a remote
+server) and to perform a previously configured operation. AWD uses the awd_node
+@var{chardevid} parameter to connect to a -chardev node for the heartbeat service;
+the server @var{server} parameter selects the server side or the client side.
+The iothread @var{id} parameter attaches AWD to an iothread so that it runs
+independently of the main loop. The pulse_interval @var{time_ms} and timeout
+@var{time_ms} parameters are heartbeat service properties; the defaults are
+pulse_interval=5000 and timeout=2000. AWD uses the notification_node @var{chardevid}
+parameter to attach another -chardev socket node for the configured operation.
+The user sets up the operation (user commands) in the opt_script file; AWD reads
+this script and sends its contents to notification_node. AWD gives users basic
+VM/host network monitoring tools and a basic fault tolerance and recovery solution.
+
+Use cases:
+Send a message to the admin, notify another VMM, send a QMP command to QEMU to
+perform an operation such as restarting the VM, build a VMM heartbeat system, etc.
+
 @item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid},iothread=@var{id}[,vnet_hdr_support][,notify_dev=@var{id}]
 
 Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with
-- 
2.17.1




* [PATCH V4 2/5] net/awd.c: Initialize input/output chardev
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 1/5] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
@ 2019-12-17 12:45 ` Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 3/5] net/awd.c: Load advanced watch dog worker thread job Zhang Chen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Find and check the chardevs awd_node and notification_node.
The awd_node is used to keep the connection with the outside (such as a VM
client, another host, or a remote server), and the notification_node is used
to perform an operation when a disconnect event occurs.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
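Note for reviewers (not part of the commit message): the lookup-then-check
pattern described above, modelled in standalone C. This is a sketch under
assumptions, not the QEMU chardev API: FakeChardev, NodeCheck and
node_find_and_check are hypothetical names, and a plain array stands in for
the chardev registry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a chardev: just an id and one feature flag. */
typedef struct {
    const char *id;
    bool reconnectable;
} FakeChardev;

typedef enum {
    NODE_OK = 0,
    NODE_NOT_FOUND,
    NODE_NOT_RECONNECTABLE,
} NodeCheck;

/* Looks up id in table, then verifies that the entry is reconnectable. */
static NodeCheck node_find_and_check(const FakeChardev *table, size_t n,
                                     const char *id, const FakeChardev **out)
{
    assert(id != NULL && out != NULL);
    *out = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].id, id) == 0) {
            if (!table[i].reconnectable) {
                return NODE_NOT_RECONNECTABLE;
            }
            *out = &table[i];
            return NODE_OK;
        }
    }
    return NODE_NOT_FOUND;
}
```

Returning a distinct code per failure lets the caller report exactly one error
for each failed node.
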
 net/awd.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/net/awd.c b/net/awd.c
index d42b4a7372..ad3d39c982 100644
--- a/net/awd.c
+++ b/net/awd.c
@@ -42,6 +42,8 @@ typedef struct AwdState {
     char *opt_script;
     uint32_t pulse_interval;
     uint32_t timeout;
+    CharBackend chr_awd_node;
+    CharBackend chr_notification_node;
     IOThread *iothread;
 } AwdState;
 
@@ -175,9 +177,30 @@ out:
     error_propagate(errp, local_err);
 }
 
+static int find_and_check_chardev(Chardev **chr,
+                                  char *chr_name,
+                                  Error **errp)
+{
+    *chr = qemu_chr_find(chr_name);
+    if (*chr == NULL) {
+        error_setg(errp, "Device '%s' not found",
+                   chr_name);
+        return 1;
+    }
+
+    if (!qemu_chr_has_feature(*chr, QEMU_CHAR_FEATURE_RECONNECTABLE)) {
+        error_setg(errp, "chardev \"%s\" is not reconnectable",
+                   chr_name);
+        return 1;
+    }
+
+    return 0;
+}
+
 static void awd_complete(UserCreatable *uc, Error **errp)
 {
     AwdState *s = AWD(uc);
+    Chardev *chr;
 
     if (!s->awd_node || !s->iothread ||
         !s->notification_node || !s->opt_script) {
@@ -187,6 +210,20 @@ static void awd_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
+    if (find_and_check_chardev(&chr, s->awd_node, errp) ||
+        !qemu_chr_fe_init(&s->chr_awd_node, chr, errp)) {
+        error_setg(errp, "advanced-watchdog can't find chardev awd_node: %s",
+                   s->awd_node);
+        return;
+    }
+
+    if (find_and_check_chardev(&chr, s->notification_node, errp) ||
+        !qemu_chr_fe_init(&s->chr_notification_node, chr, errp)) {
+        error_setg(errp, "advanced-watchdog can't find "
+                   "chardev notification_node: %s", s->notification_node);
+        return;
+    }
+
     return;
 }
 
-- 
2.17.1




* [PATCH V4 3/5] net/awd.c: Load advanced watch dog worker thread job
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 1/5] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 2/5] net/awd.c: Initialize input/output chardev Zhang Chen
@ 2019-12-17 12:45 ` Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 4/5] vl.c: Make Advanced Watch Dog delayed initialization Zhang Chen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

This patch loads pulse_timer and timeout_timer in the new iothread.
The pulse timer sends pulse messages to awd_node, and the timeout timer
checks for the reply pulse from awd_node. If a timeout occurs, AWD sends
the opt_script data to the notification_node.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
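Note for reviewers (not part of the commit message): awd_chr_send() in this
patch writes a 4-byte length in network byte order followed by the payload,
and the receive side feeds incoming bytes to net_fill_rstate(). The sketch
below shows that framing in standalone C, for example for a guest-side tool
that has to speak it. frame_encode and frame_decode are hypothetical helpers,
not QEMU functions, and they work on memory buffers instead of a chardev.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes a 4-byte big-endian length followed by the payload into out.
 * Returns the total number of bytes written, or 0 if out is too small.
 */
static size_t frame_encode(uint8_t *out, size_t out_cap,
                           const void *payload, uint32_t len)
{
    assert(out != NULL && payload != NULL);
    if (out_cap < 4 || out_cap - 4 < len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return (size_t)len + 4;
}

/*
 * Parses one frame from in. On success stores a pointer to the payload
 * and returns its length; returns -1 if the frame is still incomplete.
 */
static int64_t frame_decode(const uint8_t *in, size_t in_len,
                            const uint8_t **payload)
{
    assert(in != NULL && payload != NULL);
    if (in_len < 4) {
        return -1;
    }
    uint32_t len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                   ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (in_len - 4 < len) {
        return -1;
    }
    *payload = in + 4;
    return (int64_t)len;
}
```

A peer that only needs to refresh the watchdog could send any payload framed
this way, since the current patch treats every complete message as a pulse.
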
 net/awd.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)

diff --git a/net/awd.c b/net/awd.c
index ad3d39c982..04f40e7cc8 100644
--- a/net/awd.c
+++ b/net/awd.c
@@ -40,17 +40,137 @@ typedef struct AwdState {
     char *awd_node;
     char *notification_node;
     char *opt_script;
+    char *opt_script_data;
     uint32_t pulse_interval;
     uint32_t timeout;
     CharBackend chr_awd_node;
     CharBackend chr_notification_node;
+    SocketReadState awd_rs;
+
+    QEMUTimer *pulse_timer;
+    QEMUTimer *timeout_timer;
     IOThread *iothread;
+    GMainContext *worker_context;
 } AwdState;
 
 typedef struct AwdClass {
     ObjectClass parent_class;
 } AwdClass;
 
+static int awd_chr_send(AwdState *s,
+                        const uint8_t *buf,
+                        uint32_t size)
+{
+    int ret = 0;
+    uint32_t len = htonl(size);
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = qemu_chr_fe_write_all(&s->chr_awd_node, (uint8_t *)&len,
+                                sizeof(len));
+    if (ret != sizeof(len)) {
+        goto err;
+    }
+
+    ret = qemu_chr_fe_write_all(&s->chr_awd_node, (uint8_t *)buf,
+                                size);
+    if (ret != size) {
+        goto err;
+    }
+
+    return 0;
+
+err:
+    return ret < 0 ? ret : -EIO;
+}
+
+static int awd_chr_can_read(void *opaque)
+{
+    return AWD_READ_LEN_MAX;
+}
+
+static void awd_node_in(void *opaque, const uint8_t *buf, int size)
+{
+    AwdState *s = AWD(opaque);
+    int ret;
+
+    ret = net_fill_rstate(&s->awd_rs, buf, size);
+    if (ret == -1) {
+        qemu_chr_fe_set_handlers(&s->chr_awd_node, NULL, NULL, NULL, NULL,
+                                 NULL, NULL, true);
+        error_report("advanced-watchdog get pulse error");
+    }
+}
+
+static void awd_send_pulse(void *opaque)
+{
+    AwdState *s = opaque;
+    char buf[] = "advanced-watchdog pulse";
+
+    awd_chr_send(s, (uint8_t *)buf, sizeof(buf));
+}
+
+static void awd_regular_pulse(void *opaque)
+{
+    AwdState *s = opaque;
+
+    awd_send_pulse(s);
+    timer_mod(s->pulse_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->pulse_interval);
+}
+
+static void awd_timeout(void *opaque)
+{
+    AwdState *s = opaque;
+    int ret = 0;
+
+    ret = qemu_chr_fe_write_all(&s->chr_notification_node,
+                                (uint8_t *)s->opt_script_data,
+                                strlen(s->opt_script_data));
+    if (ret != (int)strlen(s->opt_script_data)) {
+        error_report("advanced-watchdog notification failure");
+    }
+}
+
+static void awd_timer_init(AwdState *s)
+{
+    AioContext *ctx = iothread_get_aio_context(s->iothread);
+
+    s->timeout_timer = aio_timer_new(ctx, QEMU_CLOCK_VIRTUAL, SCALE_MS,
+                                     awd_timeout, s);
+
+    s->pulse_timer = aio_timer_new(ctx, QEMU_CLOCK_VIRTUAL, SCALE_MS,
+                                      awd_regular_pulse, s);
+
+    if (!s->pulse_interval) {
+        s->pulse_interval = AWD_PULSE_INTERVAL_DEFAULT;
+    }
+
+    if (!s->timeout) {
+        s->timeout = AWD_TIMEOUT_DEFAULT;
+    }
+
+    timer_mod(s->pulse_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->pulse_interval);
+}
+
+static void awd_timer_del(AwdState *s)
+{
+    if (s->pulse_timer) {
+        timer_del(s->pulse_timer);
+        timer_free(s->pulse_timer);
+        s->pulse_timer = NULL;
+    }
+
+    if (s->timeout_timer) {
+        timer_del(s->timeout_timer);
+        timer_free(s->timeout_timer);
+        s->timeout_timer = NULL;
+    }
+}
+
 static char *awd_get_node(Object *obj, Error **errp)
 {
     AwdState *s = AWD(obj);
@@ -177,6 +297,22 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void awd_rs_finalize(SocketReadState *awd_rs)
+{
+    AwdState *s = container_of(awd_rs, AwdState, awd_rs);
+
+    if (!s->server) {
+        char buf[] = "advanced-watchdog reply pulse";
+
+        awd_chr_send(s, (uint8_t *)buf, sizeof(buf));
+    }
+
+    timer_mod(s->timeout_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->timeout);
+
+    error_report("advanced-watchdog got message : %s", awd_rs->buf);
+}
+
 static int find_and_check_chardev(Chardev **chr,
                                   char *chr_name,
                                   Error **errp)
@@ -197,6 +333,46 @@ static int find_and_check_chardev(Chardev **chr,
     return 0;
 }
 
+static void awd_iothread(AwdState *s)
+{
+    object_ref(OBJECT(s->iothread));
+    s->worker_context = iothread_get_g_main_context(s->iothread);
+
+    qemu_chr_fe_set_handlers(&s->chr_awd_node, awd_chr_can_read,
+                             awd_node_in, NULL, NULL,
+                             s, s->worker_context, true);
+
+    awd_timer_init(s);
+}
+
+static int get_opt_script_data(AwdState *s)
+{
+    FILE *opt_fd;
+    long fsize;
+
+    opt_fd = fopen(s->opt_script, "r");
+    if (opt_fd == NULL) {
+        error_report("advanced-watchdog can't open "
+                     "opt_script: %s", s->opt_script);
+        return -1;
+    }
+
+    fseek(opt_fd, 0, SEEK_END);
+    fsize = ftell(opt_fd);
+    fseek(opt_fd, 0, SEEK_SET);
+    s->opt_script_data = g_malloc0(fsize + 1);
+
+    if (!fread(s->opt_script_data, 1, fsize, opt_fd)) {
+        error_report("advanced-watchdog can't read "
+                     "opt_script: %s", s->opt_script);
+        return -1;
+    }
+
+    fclose(opt_fd);
+
+    return 0;
+}
+
 static void awd_complete(UserCreatable *uc, Error **errp)
 {
     AwdState *s = AWD(uc);
@@ -224,6 +400,16 @@ static void awd_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
+    if (get_opt_script_data(s)) {
+        error_setg(errp, "advanced-watchdog can't get "
+                   "opt script data: %s", s->opt_script);
+        return;
+    }
+
+    net_socket_rs_init(&s->awd_rs, awd_rs_finalize, false);
+
+    awd_iothread(s);
+
     return;
 }
 
@@ -272,6 +458,13 @@ static void awd_finalize(Object *obj)
 {
     AwdState *s = AWD(obj);
 
+    qemu_chr_fe_deinit(&s->chr_awd_node, false);
+    qemu_chr_fe_deinit(&s->chr_notification_node, false);
+
+    if (s->iothread) {
+        awd_timer_del(s);
+    }
+
     g_free(s->awd_node);
     g_free(s->notification_node);
 }
-- 
2.17.1




* [PATCH V4 4/5] vl.c: Make Advanced Watch Dog delayed initialization
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
                   ` (2 preceding siblings ...)
  2019-12-17 12:45 ` [PATCH V4 3/5] net/awd.c: Load advanced watch dog worker thread job Zhang Chen
@ 2019-12-17 12:45 ` Zhang Chen
  2019-12-17 12:45 ` [PATCH V4 5/5] docs/awd.txt: Add doc to introduce Advanced WatchDog(AWD) module Zhang Chen
  2020-01-07  4:32 ` [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang, Chen
  5 siblings, 0 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

The Advanced Watch Dog module needs its chardev sockets to be initialized
before it runs, so its object creation is delayed.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
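Note for reviewers (not part of the commit message): the idea of the change,
reduced to standalone C. This is a simplified model, not the vl.c code:
delayed_types and create_in_early_pass are hypothetical names, and the list
holds only the type added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Object types whose creation must wait until the chardevs exist. */
static const char *const delayed_types[] = {
    "advanced-watchdog",
};

/* Returns true if an object of this type may be created in the early pass. */
static bool create_in_early_pass(const char *type)
{
    assert(type != NULL);
    for (size_t i = 0;
         i < sizeof(delayed_types) / sizeof(delayed_types[0]); i++) {
        if (strcmp(type, delayed_types[i]) == 0) {
            return false;   /* defer to the later creation pass */
        }
    }
    return true;
}
```

In this model every type that is not listed is created early, and a later pass
picks up the deferred ones once their chardevs are available.
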
 vl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/vl.c b/vl.c
index 6a65a64bfd..048fe458b9 100644
--- a/vl.c
+++ b/vl.c
@@ -2689,6 +2689,13 @@ static bool object_create_initial(const char *type, QemuOpts *opts)
         return false;
     }
 
+    /*
+     * Reason: Advanced Watch Dog property "chardev".
+     */
+    if (g_str_equal(type, "advanced-watchdog")) {
+        return false;
+    }
+
     /* Memory allocation by backends needs to be done
      * after configure_accelerator() (due to the tcg_enabled()
      * checks at memory_region_init_*()).
-- 
2.17.1




* [PATCH V4 5/5] docs/awd.txt: Add doc to introduce Advanced WatchDog(AWD) module
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
                   ` (3 preceding siblings ...)
  2019-12-17 12:45 ` [PATCH V4 4/5] vl.c: Make Advanced Watch Dog delayed initialization Zhang Chen
@ 2019-12-17 12:45 ` Zhang Chen
  2020-01-07  4:32 ` [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang, Chen
  5 siblings, 0 replies; 16+ messages in thread
From: Zhang Chen @ 2019-12-17 12:45 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Add docs describing the Advanced WatchDog in detail, including its usage.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 docs/awd.txt | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 docs/awd.txt

diff --git a/docs/awd.txt b/docs/awd.txt
new file mode 100644
index 0000000000..0ce513be5a
--- /dev/null
+++ b/docs/awd.txt
@@ -0,0 +1,88 @@
+Advanced Watch Dog (AWD)
+========================
+Copyright (c) 2019 Intel Corporation.
+Author: Zhang Chen <chen.zhang@intel.com>
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Introduction
+------------
+
+Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side. It can
+be used to detect network issues (VMM to guest, VMM to VMM, VMM to a remote
+server) and to perform a previously configured operation. Currently AWD accepts
+any input as the signal to refresh the watchdog timer; a more specific
+interactive protocol could be defined here later. Users can write commands or
+messages in advance in the AWD opt-script as the notification output. We noticed
+that there is no way for VMMs to communicate directly, and after engaging with
+real customers we found that they need a lightweight and efficient mechanism to
+solve some practical problems. For example, in Edge Computing cases they consider
+high-level software too heavy to use at the edge, or too hard to manage and
+combine with VM instances. AWD gives users basic VM/host network monitoring
+tools and a basic fault tolerance and recovery solution.
+
+Use case
+--------
+
+1. Monitor local guest status.
+Run a simple application in the guest to send signals to the local AWD module.
+If a timeout occurs, AWD notifies a high-level admin or performs a configured
+operation, e.g. sending an exit command to the local QMP interface or QEMU monitor.
+
+2. Monitor other VMM.
+AWD modules can be connected to each other to build a heartbeat service.
+
+3. Monitor other remote service.
+In some cases, a remote service has a certain relationship with the current VM.
+If the network connection has an issue, AWD can perform an urgent operation such
+as rebooting the local VM.
+
+AWD usage
+---------
+
+Users must pass "--enable-awd" when configuring QEMU.
+
+1. Monitor local guest status.
+
+-chardev socket,id=detection,host=0.0.0.0,port=9009,server,nowait
+-chardev socket,id=notification,host=127.0.0.1,port=4445
+-object iothread,id=iothread1
+-object advanced-watchdog,id=awd1,server=on,awd_node=detection,notification_node=notification,opt_script=colo_opt_script,iothread=iothread1,pulse_interval=1000,timeout=5000
+-monitor tcp::4445,server,nowait
+
+qemu_opt_script:
+quit
+
+The guest service needs to connect to the detection node; the admin can check the
+notification node to get the message when a timeout occurs.
+
+2. Monitor other VMM.
+
+Demo usage(for COLO heartbeat service):
+
+In primary node:
+
+-chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
+-chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
+-object iothread,id=iothread1
+-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_primary_opt_script,iothread=iothread1,pulse_interval=1000,timeout=5000
+
+colo_primary_opt_script:
+x_colo_lost_heartbeat
+
+In secondary node:
+
+-monitor tcp::4445,server,nowait
+-chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
+-chardev socket,id=heart1,host=3.3.3.8,port=4445
+-object iothread,id=iothread1
+-object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000
+
+colo_secondary_opt_script:
+nbd_server_stop
+x_colo_lost_heartbeat
+
+3. Monitor other remote service.
+
+Same as monitoring the local guest, except for the detection and notification nodes.
-- 
2.17.1




* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2019-12-17 12:45 [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang Chen
                   ` (4 preceding siblings ...)
  2019-12-17 12:45 ` [PATCH V4 5/5] docs/awd.txt: Add doc to introduce Advanced WatchDog(AWD) module Zhang Chen
@ 2020-01-07  4:32 ` Zhang, Chen
  2020-01-19  9:10   ` Zhang, Chen
  5 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-01-07  4:32 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen

Hi All,

No news for a while about this series.

This version already adds new docs to address Paolo's comments.

Please give me more comments.


Thanks

Zhang Chen



* RE: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-01-07  4:32 ` [PATCH V4 0/5] Introduce Advanced Watch Dog module Zhang, Chen
@ 2020-01-19  9:10   ` Zhang, Chen
  2020-01-20  2:56     ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-01-19  9:10 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen

Hi~

Does anyone have comments on this module?
Some of our clients are already trying to use this module with COLO. Please review this part.
If no one wants to maintain this module, I can maintain it myself.

Thanks
Zhang Chen


* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-01-19  9:10   ` Zhang, Chen
@ 2020-01-20  2:56     ` Jason Wang
  2020-02-11  8:58       ` Zhang, Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2020-01-20  2:56 UTC (permalink / raw)
  To: Zhang, Chen, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen


On 2020/1/19 5:10 PM, Zhang, Chen wrote:
> Hi~
>
> Anyone have comments about this module?


Hi Chen:

I will take a look at this series.

Two general questions:

- if it can detect more than network stalls, it should not belong in /net
- we need to convince the libvirt developers of this proposal, since this is
usually the duty of the upper layer rather than QEMU itself

Thanks



* RE: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-01-20  2:56     ` Jason Wang
@ 2020-02-11  8:58       ` Zhang, Chen
  2020-02-12  2:56         ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-02-11  8:58 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen



> -----Original Message-----
> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, January 20, 2020 10:57 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>
> Subject: Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
> 
> 
> On 2020/1/19 5:10 PM, Zhang, Chen wrote:
> > Hi~
> >
> > Anyone have comments about this module?
> 
> 
> Hi Chen:
> 
> I will take a look at this series.

Sorry for the slow reply; I was away for Chinese New Year (CNY) and an extended leave.
OK, I look forward to your comments. Thanks.

> 
> Two general questions:
> 
> - if it can detect more than network stalls, it should not belong in /net

This module uses network connection status to detect all of these issues (host to guest, host to host, host to admin, ...).
The targets go beyond the network, but they are all detected over the network, so where the module belongs is a tricky question.

> - we need to convince the libvirt developers of this proposal, since this is
> usually the duty of the upper layer rather than QEMU itself
> 

Yes, it looks like an upper-layer responsibility, but in the cover letter I explained why we need this in QEMU.
We try to make this module as simple as possible. It gives upper-layer software a new way to connect to and monitor QEMU.
And because all the COLO code is implemented on the QEMU side, many customers want to use this FT solution without other dependencies,
which makes it very easy to integrate into a real product.

Thanks
Zhang Chen


* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-02-11  8:58       ` Zhang, Chen
@ 2020-02-12  2:56         ` Jason Wang
  2020-02-20  3:36           ` Zhang, Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2020-02-12  2:56 UTC (permalink / raw)
  To: Zhang, Chen, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen


On 2020/2/11 4:58 PM, Zhang, Chen wrote:
>> Two general questions:
>>
>> - if it can detect more than network stall, it should not belong to /net
> This module uses network connection status to detect all of these issues (host to guest, host to host, host to admin, ...).
> The targets go beyond the network, but they are all detected over the network, so where the module belongs is a tricky question.


Ok.


>
>> - need to convince libvirt guys for this proposal, since usually it's the duty of
>> upper layer instead of qemu itself
>>
> Yes, it looks like an upper-layer responsibility, but in the cover letter I explained why we need this in QEMU.
> We try to make this module as simple as possible. It gives upper-layer software a new way to connect to and monitor QEMU.
> And because all the COLO code is implemented on the QEMU side, many customers want to use this FT solution without other dependencies,
> which makes it very easy to integrate into a real product.
>
> Thanks
> Zhang Chen


I would like to hear from libvirt about such a design.

Thanks


* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-02-12  2:56         ` Jason Wang
@ 2020-02-20  3:36           ` Zhang, Chen
  2020-03-04  8:06             ` Zhang, Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-02-20  3:36 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-dev, Eric Blake, libvir-list
  Cc: Zhang Chen


On 2/12/2020 10:56 AM, Jason Wang wrote:
> I would like to hear from libvirt about such a design.


Hi Jason,

OK. I have added the libvirt mailing list to this thread.

The full mail discussion and patches:

https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg02611.html


By the way, I noticed Eric is a libvirt maintainer.

Hi Eric and Paolo, Can you give some comments about this series?


Thanks

Zhang Chen



* RE: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-02-20  3:36           ` Zhang, Chen
@ 2020-03-04  8:06             ` Zhang, Chen
  2020-03-04 13:37               ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-03-04  8:06 UTC (permalink / raw)
  To: Zhang, Chen, Jason Wang, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	qemu-dev, Eric Blake, libvir-list
  Cc: Zhang Chen

> Hi Eric and Paolo, Can you give some comments about this series?
> 
> 

No news for a while...
Some users (a cloud service provider) are already trying to use this module in their product,
but they also need to follow the QEMU upstream code.

Thanks
Zhang Chen



* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-03-04  8:06             ` Zhang, Chen
@ 2020-03-04 13:37               ` Paolo Bonzini
  2020-03-09  9:32                 ` Zhang, Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2020-03-04 13:37 UTC (permalink / raw)
  To: Zhang, Chen, Jason Wang, Philippe Mathieu-Daudé,
	qemu-dev, Eric Blake, libvir-list
  Cc: Zhang Chen

On 04/03/20 09:06, Zhang, Chen wrote:
>> Hi Eric and Paolo, Can you give some comments about this series?
>>
>>
> No news for a while...
> Some users (a cloud service provider) are already trying to use this module in their product,
> but they also need to follow the QEMU upstream code.

My main comment about this series is that it's not clear why it is
needed and how to use it.  The documentation includes a demo, but no
description of what is an awd_node, a notification_node and an
opt_script.  I can more or less understand the notification_node and
opt_script role from the documentation, but not entirely because, for
example, the two-host demo has hardcoded IP addresses without saying
which host is which IP address.

The documentation does not describe the protocol, which is absolutely
necessary, and does not describe _why_ the protocol was designed like
that.  Without such documentation it's not clear if, for example, the
watchdog protocol could be implemented as QMP commands (e.g.
start-watchdog, stop-watchdog, notify-watchdog).  Another possibility
could be to use the systemd watchdog protocol, which consists of
essentially three commands (WATCHDOG=1, WATCHDOG=trigger,
WATCHDOG_USEC=...) which are transmitted as datagrams.  Documentation is
important for reviewers to judge the merits of the protocol without (or
before) diving into the code.
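
For readers unfamiliar with the systemd watchdog protocol mentioned above,
here is a minimal Python sketch of a client sending the three messages as
datagrams. The helper names (watchdog_message, notify, demo) are invented for
this illustration, and a local socket pair stands in for the real notification
socket (systemd passes its path in NOTIFY_SOCKET). Nothing here is part of the
AWD series.

```python
import socket


def watchdog_message(kind, usec=None):
    """Return the datagram payload for one systemd-style watchdog message."""
    if kind == "keepalive":
        return b"WATCHDOG=1"
    if kind == "trigger":
        return b"WATCHDOG=trigger"
    if kind == "set-timeout":
        if usec is None or usec <= 0:
            raise ValueError("set-timeout needs a positive usec value")
        return b"WATCHDOG_USEC=%d" % usec
    raise ValueError("unknown message kind: %r" % (kind,))


def notify(sock, kind, usec=None):
    """Send one watchdog message as a single datagram on a connected socket."""
    sock.send(watchdog_message(kind, usec))


def demo():
    # Hypothetical stand-in for the supervisor's notification socket.
    client, supervisor = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        notify(client, "set-timeout", usec=5_000_000)
        notify(client, "keepalive")
        notify(client, "trigger")
        # Datagram sockets preserve message boundaries, one recv per message.
        return [supervisor.recv(64) for _ in range(3)]
    finally:
        client.close()
        supervisor.close()
```

Because each message is a self-contained datagram, the receiver needs no
framing or parsing state, which is part of what makes this protocol easy to
review.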

In the demo, the opt_script mechanism is currently using the "human"
monitor as opposed to QMP.  The human monitor interface is not stable
and not meant for consumption by management interface.  It is not clear
if this is just a sample usage, and in practice the notification_node
would be outside of QEMU, or not.  In general I would prefer to have the
script as an optional feature, and report the triggering of the watchdog
via QMP events.

Paolo




* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-03-04 13:37               ` Paolo Bonzini
@ 2020-03-09  9:32                 ` Zhang, Chen
  2020-03-12 15:52                   ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Chen @ 2020-03-09  9:32 UTC (permalink / raw)
  To: Paolo Bonzini, Jason Wang, Philippe Mathieu-Daudé,
	qemu-dev, Eric Blake, libvir-list
  Cc: Zhang Chen


On 3/4/2020 9:37 PM, Paolo Bonzini wrote:
> On 04/03/20 09:06, Zhang, Chen wrote:
>>> Hi Eric and Paolo, Can you give some comments about this series?
>>>
>>>
>> No news for a while...
>> Some users (a cloud service provider) are already trying to use this module in their product,
>> but they also need to follow the QEMU upstream code.
> My main comment about this series is that it's not clear why it is
> needed and how to use it.  The documentation includes a demo, but no
> description of what is an awd_node, a notification_node and an
> opt_script.  I can more or less understand the notification_node and
> opt_script role from the documentation, but not entirely because, for
> example, the two-host demo has hardcoded IP addresses without saying
> which host is which IP address.

Hi Paolo,

Sorry for the slow reply, and thank you for your comments.

Let me summarize your main points and my responses:

1. Why AWD is needed.

Advanced Watch Dog is a universal monitoring module on the VMM side. It
can be used to detect a network outage (VMM to guest, VMM to VMM, VMM to
another remote server) and perform a previously configured operation. The
current AWD patch accepts any input as the signal to refresh the watchdog
timer, and we could also define a specific interactive protocol here. For
the outputs, the user can pre-write commands or messages in the AWD
opt-script. We noticed that there is no way for VMMs to communicate
directly. Some people may think we don't need such a thing (upper-layer
software like OpenStack can handle it), but when we engaged with real
customers we found that they need a lightweight and efficient mechanism
to solve some practical problems.

One example is Edge Computing, where customers think high-level software
is too heavy to use at the edge, or too hard to manage and combine with
VM instances. AWD gives users basic VM/host network monitoring tools and a
basic fault tolerance and recovery solution.

For the COLO FT/HA solution, some CSPs are already trying to use AWD with COLO.

2. Documentation issues, including how to use it.

I will address all your comments and complete the details in the documentation.

3. Communication protocol issue.

The current AWD has no protocol: any data it receives is treated as a
heartbeat signal.

Using the QMP format is fine with me.

4. Implementation issue.

Making the AWD script an optional feature is OK with me.

Reporting the triggering of the watchdog via QMP events is enough for
the current usage.

However, it seems limited when it comes to notifying parties outside QEMU.
I don't know which is the better choice.

If the QMP events solution is better, I will switch to it in the next version.
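
To make item 3 above concrete, the following Python sketch models a watchdog
in which any received data counts as a heartbeat. It is not the code in
net/awd.c: the class AnyDataWatchdog and its methods are hypothetical, time is
passed in explicitly, and the value 5000 only echoes the timeout=5000 from the
demo options, assuming that option is in milliseconds.

```python
class AnyDataWatchdog:
    """Watchdog model: any non-empty input refreshes the deadline."""

    def __init__(self, timeout_ms, now_ms=0):
        self.timeout_ms = timeout_ms
        self.deadline_ms = now_ms + timeout_ms

    def on_data(self, data, now_ms):
        """Treat any received bytes as a heartbeat; content is not inspected."""
        if data:
            self.deadline_ms = now_ms + self.timeout_ms

    def expired(self, now_ms):
        """True once no data has arrived for at least timeout_ms."""
        return now_ms >= self.deadline_ms


wd = AnyDataWatchdog(timeout_ms=5000)
wd.on_data(b"anything at all", now_ms=1000)  # deadline moves to 6000
expired_at_5500 = wd.expired(5500)           # still inside the window
expired_at_6000 = wd.expired(6000)           # 5000 ms of silence have passed
```

The point of the sketch is that the payload is never examined, so stray or
malformed traffic on the channel keeps the watchdog alive just as well as a
deliberate heartbeat does.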


I am not sure I have understood your meaning correctly.

Please give me more guidance on this series.  :-)

Thanks

Zhang Chen



* Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module
  2020-03-09  9:32                 ` Zhang, Chen
@ 2020-03-12 15:52                   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2020-03-12 15:52 UTC (permalink / raw)
  To: Zhang, Chen, Jason Wang, Philippe Mathieu-Daudé,
	qemu-dev, Eric Blake, libvir-list
  Cc: Zhang Chen

On 09/03/20 10:32, Zhang, Chen wrote:
> 4. Implementation issue.
> 
> Making the AWD script an optional feature is OK with me.
> 
> Reporting the triggering of the watchdog via QMP events is enough for
> the current usage.
> 
> However, it seems limited when it comes to notifying parties outside QEMU.
> I don't know which is the better choice.
> 
> If the QMP events solution is better, I will switch to it in the next version.

Good, thanks.

Naming-wise, it's ugly that we already have a WATCHDOG event for guest
watchdog devices.  The following design, however, should allow setting up
multiple watchdogs:

- Creating a watchdog from the command line:

-object watchdog,id=STR,timeout=NNN,chardev=CHR

and object_add/object-add can also be used for HMP and QMP.

- Reporting a watchdog timeout via QMP:

{ 'event': 'WATCHDOG_TIMEOUT',
  'data': { 'id': 'str' } }

- Protocol: the data sent on the chardev to QEMU must be

WATCHDOG=1

optionally followed by exactly one \n character.  All other data is ignored.
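
As an illustration only, the proposal above could be modelled in Python as
follows. The function names are invented, and the sketch assumes each chunk
read from the chardev is judged as a whole message, which the proposal does
not specify. Real QMP events also carry a timestamp field, omitted here.

```python
import json


def is_heartbeat(chunk):
    """True if chunk is WATCHDOG=1, optionally followed by exactly one newline."""
    return chunk in (b"WATCHDOG=1", b"WATCHDOG=1\n")


def timeout_event(watchdog_id):
    """Build the proposed WATCHDOG_TIMEOUT event for the given object id."""
    return {"event": "WATCHDOG_TIMEOUT", "data": {"id": watchdog_id}}


# Both accepted spellings refresh the timer; everything else is ignored.
accepted = [is_heartbeat(c) for c in (b"WATCHDOG=1", b"WATCHDOG=1\n")]
ignored = [is_heartbeat(c) for c in (b"WATCHDOG=1\n\n", b"hello", b"")]

# The id is whatever was given to -object watchdog,id=...; "heart1" is an example.
wire = json.dumps(timeout_event("heart1"))
```

Carrying the object id in the event is what lets a management application
tell several watchdogs apart, which the existing WATCHDOG event for guest
devices does not do.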

Paolo


