qemu-devel.nongnu.org archive mirror
* [PATCH V2 0/4] Introduce Advanced Watch Dog module
@ 2019-11-01  2:48 Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 1/4] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Zhang Chen @ 2019-11-01  2:48 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side.
It can be used to detect a network outage (VMM to guest, VMM to VMM, or
VMM to another remote server) and then perform a previously configured
operation. The current AWD patch simply accepts any input as the signal
to refresh the watchdog timer; a specific interactive protocol could be
defined here later. For the output, the user can pre-write commands or
messages in the AWD opt_script. We noticed that there is currently no way
for VMMs to communicate directly. Some people may think such a mechanism
is unnecessary (upper-layer software like OpenStack can handle it), but
in our engagement with real customers we found that in some cases they
need a lightweight and efficient mechanism to solve practical problems
(OpenStack is too heavy).
For example: when AWD detects a lost connection with the paired node, it
can send a message to the admin, notify another VMM, or send a QMP command
to QEMU to perform an operation such as restarting the VM. It can also be
used to build a VMM heartbeat system.
It gives users basic VM/host network monitoring tools and a basic fault
tolerance and recovery solution.
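
To make the refresh semantics described above concrete, here is a minimal
standalone sketch in C. It is not code from this series: the Watchdog type
and the watchdog_* helpers are hypothetical names, time is passed in
explicitly instead of being read from a QEMU clock, and the deadline is
armed at init time purely for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; this is not the AwdState from the series. */
typedef struct {
    uint32_t timeout_ms;  /* how long silence from the peer is tolerated */
    int64_t deadline_ms;  /* absolute time at which the watchdog fires */
    bool fired;           /* true once the current deadline was reported */
} Watchdog;

/* Assumption: the watchdog is armed immediately at init time. */
static void watchdog_init(Watchdog *w, uint32_t timeout_ms, int64_t now_ms)
{
    assert(timeout_ms > 0);
    w->timeout_ms = timeout_ms;
    w->deadline_ms = now_ms + timeout_ms;
    w->fired = false;
}

/* Any input counts as a sign of life: push the deadline out and re-arm. */
static void watchdog_feed(Watchdog *w, int64_t now_ms)
{
    w->deadline_ms = now_ms + w->timeout_ms;
    w->fired = false;
}

/* Returns true once per missed deadline, false otherwise. */
static bool watchdog_poll(Watchdog *w, int64_t now_ms)
{
    if (!w->fired && now_ms >= w->deadline_ms) {
        w->fired = true;
        return true;
    }
    return false;
}
```

A caller would invoke watchdog_feed() whenever any bytes arrive from the
peer and watchdog_poll() from a periodic tick; when the poll returns true,
it would send the pre-written opt_script contents to the notification node.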

Demo usage(for COLO heartbeat service):

In primary node:

-chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
-chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
-object iothread,id=iothread1
-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

In secondary node:

-monitor tcp::4445,server,nowait
-chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
-chardev socket,id=heart1,host=3.3.3.8,port=4445
-object iothread,id=iothread1
-object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000


V2:
 - Addressed Philippe's comments: added a configure selector for AWD.

Initial:
 - Initial version.

Zhang Chen (4):
  net/awd.c: Introduce Advanced Watch Dog module framework
  net/awd.c: Initailize input/output chardev
  net/awd.c: Load advanced watch dog worker thread job
  vl.c: Make Advanced Watch Dog delayed initialization

 configure         |   9 +
 net/Makefile.objs |   1 +
 net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |   6 +
 vl.c              |   7 +
 5 files changed, 514 insertions(+)
 create mode 100644 net/awd.c

-- 
2.17.1




* [PATCH V2 1/4] net/awd.c: Introduce Advanced Watch Dog module framework
  2019-11-01  2:48 [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang Chen
@ 2019-11-01  2:48 ` Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 2/4] net/awd.c: Initailize input/output chardev Zhang Chen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Zhang Chen @ 2019-11-01  2:48 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

This patch introduces a new module named Advanced Watch Dog
and defines its input and output parameters. AWD uses a standard chardev
to communicate with the outside world.
To use it, pass "--enable-awd" when configuring QEMU.

Demo command:
-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
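Note (not part of the commit message): a rough standalone sketch of the
parameter checks this patch performs, assuming a plain C struct in place
of the QOM object. AwdConfig, awd_config_set_u32() and
awd_config_missing() are hypothetical names used only for illustration;
the patch itself uses QOM property setters and awd_complete().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C mirror of the user-visible properties. */
typedef struct {
    const char *awd_node;          /* chardev id used for the heartbeat */
    const char *notification_node; /* chardev id that receives opt_script */
    const char *opt_script;        /* path of the pre-written script */
    const char *iothread;          /* id of the iothread running the timers */
    uint32_t pulse_interval;       /* milliseconds */
    uint32_t timeout;              /* milliseconds */
} AwdConfig;

/*
 * Like the pulse_interval/timeout setters in the patch, reject zero.
 * Returns 0 on success and -1 if the value was rejected.
 */
static int awd_config_set_u32(uint32_t *field, uint32_t value)
{
    assert(field != NULL);
    if (value == 0) {
        return -1;
    }
    *field = value;
    return 0;
}

/* Returns the name of the first missing mandatory property, or NULL. */
static const char *awd_config_missing(const AwdConfig *c)
{
    assert(c != NULL);
    if (!c->awd_node) {
        return "awd_node";
    }
    if (!c->notification_node) {
        return "notification_node";
    }
    if (!c->opt_script) {
        return "opt_script";
    }
    if (!c->iothread) {
        return "iothread";
    }
    return NULL;
}
```

The sketch reports which property is missing, whereas the patch emits one
combined error message; that difference is a choice made here for
readability, not something the series does.
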
 configure         |   9 ++
 net/Makefile.objs |   1 +
 net/awd.c         | 261 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |   6 ++
 4 files changed, 277 insertions(+)
 create mode 100644 net/awd.c

diff --git a/configure b/configure
index 72553f98ea..a857a8c2d7 100755
--- a/configure
+++ b/configure
@@ -383,6 +383,7 @@ vhost_scsi=""
 vhost_vsock=""
 vhost_user=""
 vhost_user_fs=""
+awd="no"
 kvm="no"
 hax="no"
 hvf="no"
@@ -1303,6 +1304,10 @@ for opt do
   ;;
   --enable-vhost-user-fs) vhost_user_fs="yes"
   ;;
+  --disable-awd) awd="no"
+  ;;
+  --enable-awd) awd="yes"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
@@ -1779,6 +1784,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vhost-crypto    vhost-user-crypto backend support
   vhost-kernel    vhost kernel backend support
   vhost-user      vhost-user backend support
+  awd             Advanced Watch Dog support
   spice           spice
   rbd             rados block device (rbd)
   libiscsi        iscsi support
@@ -7002,6 +7008,9 @@ fi
 if test "$vhost_user" = "yes" ; then
   echo "CONFIG_VHOST_USER=y" >> $config_host_mak
 fi
+if test "$awd" = "yes" ; then
+  echo "CONFIG_AWD=y" >> $config_host_mak
+fi
 if test "$vhost_user_fs" = "yes" ; then
   echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
 fi
diff --git a/net/Makefile.objs b/net/Makefile.objs
index c5d076d19c..187e655443 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -19,6 +19,7 @@ common-obj-y += colo-compare.o
 common-obj-y += colo.o
 common-obj-y += filter-rewriter.o
 common-obj-y += filter-replay.o
+common-obj-$(CONFIG_AWD) += awd.o
 
 tap-obj-$(CONFIG_LINUX) = tap-linux.o
 tap-obj-$(CONFIG_BSD) = tap-bsd.o
diff --git a/net/awd.c b/net/awd.c
new file mode 100644
index 0000000000..d42b4a7372
--- /dev/null
+++ b/net/awd.c
@@ -0,0 +1,261 @@
+/*
+ * Advanced Watch Dog
+ *
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2019 Intel Corporation
+ *
+ * Author: Zhang Chen <chen.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "net/net.h"
+#include "qom/object_interfaces.h"
+#include "qom/object.h"
+#include "chardev/char-fe.h"
+#include "qemu/sockets.h"
+#include "sysemu/iothread.h"
+
+#define TYPE_AWD  "advanced-watchdog"
+#define AWD(obj)  OBJECT_CHECK(AwdState, (obj), TYPE_AWD)
+
+#define AWD_READ_LEN_MAX NET_BUFSIZE
+/* Default advanced watchdog pulse interval */
+#define AWD_PULSE_INTERVAL_DEFAULT 5000
+/* Default advanced watchdog timeout */
+#define AWD_TIMEOUT_DEFAULT 2000
+
+typedef struct AwdState {
+    Object parent;
+
+    bool server;
+    char *awd_node;
+    char *notification_node;
+    char *opt_script;
+    uint32_t pulse_interval;
+    uint32_t timeout;
+    IOThread *iothread;
+} AwdState;
+
+typedef struct AwdClass {
+    ObjectClass parent_class;
+} AwdClass;
+
+static char *awd_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->awd_node);
+}
+
+static void awd_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->awd_node);
+    s->awd_node = g_strdup(value);
+}
+
+static char *noti_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->notification_node);
+}
+
+static void noti_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->notification_node);
+    s->notification_node = g_strdup(value);
+}
+
+static char *opt_script_get_node(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return g_strdup(s->opt_script);
+}
+
+static void opt_script_set_node(Object *obj, const char *value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->opt_script);
+    s->opt_script = g_strdup(value);
+}
+
+static bool awd_get_server(Object *obj, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    return s->server;
+}
+
+static void awd_set_server(Object *obj, bool value, Error **errp)
+{
+    AwdState *s = AWD(obj);
+
+    s->server = value;
+}
+
+static void awd_get_interval(Object *obj, Visitor *v,
+                                   const char *name, void *opaque,
+                                   Error **errp)
+{
+    AwdState *s = AWD(obj);
+    uint32_t value = s->pulse_interval;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void awd_set_interval(Object *obj, Visitor *v,
+                                   const char *name, void *opaque,
+                                   Error **errp)
+{
+    AwdState *s = AWD(obj);
+    Error *local_err = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    if (!value) {
+        error_setg(&local_err, "Property '%s.%s' requires a positive value",
+                   object_get_typename(obj), name);
+        goto out;
+    }
+    s->pulse_interval = value;
+
+out:
+    error_propagate(errp, local_err);
+}
+
+static void awd_get_timeout(Object *obj, Visitor *v,
+                            const char *name, void *opaque,
+                            Error **errp)
+{
+    AwdState *s = AWD(obj);
+    uint32_t value = s->timeout;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void awd_set_timeout(Object *obj, Visitor *v,
+                            const char *name, void *opaque,
+                            Error **errp)
+{
+    AwdState *s = AWD(obj);
+    Error *local_err = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (!value) {
+        error_setg(&local_err, "Property '%s.%s' requires a positive value",
+                   object_get_typename(obj), name);
+        goto out;
+    }
+    s->timeout = value;
+
+out:
+    error_propagate(errp, local_err);
+}
+
+static void awd_complete(UserCreatable *uc, Error **errp)
+{
+    AwdState *s = AWD(uc);
+
+    if (!s->awd_node || !s->iothread ||
+        !s->notification_node || !s->opt_script) {
+        error_setg(errp, "advanced-watchdog needs 'awd_node', "
+                   "'notification_node', 'opt_script' "
+                   "and 'iothread' property set");
+        return;
+    }
+
+    return;
+}
+
+static void awd_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = awd_complete;
+}
+
+static void awd_init(Object *obj)
+{
+    AwdState *s = AWD(obj);
+
+    object_property_add_str(obj, "awd_node",
+                            awd_get_node, awd_set_node,
+                            NULL);
+
+    object_property_add_str(obj, "notification_node",
+                            noti_get_node, noti_set_node,
+                            NULL);
+
+    object_property_add_str(obj, "opt_script",
+                            opt_script_get_node, opt_script_set_node,
+                            NULL);
+
+    object_property_add_bool(obj, "server",
+                             awd_get_server,
+                             awd_set_server, NULL);
+
+    object_property_add(obj, "pulse_interval", "uint32",
+                        awd_get_interval,
+                        awd_set_interval, NULL, NULL, NULL);
+
+    object_property_add(obj, "timeout", "uint32",
+                        awd_get_timeout,
+                        awd_set_timeout, NULL, NULL, NULL);
+
+    object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
+                            (Object **)&s->iothread,
+                            object_property_allow_set_link,
+                            OBJ_PROP_LINK_STRONG, NULL);
+}
+
+static void awd_finalize(Object *obj)
+{
+    AwdState *s = AWD(obj);
+
+    g_free(s->awd_node);
+    g_free(s->notification_node);
+}
+
+static const TypeInfo awd_info = {
+    .name = TYPE_AWD,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(AwdState),
+    .instance_init = awd_init,
+    .instance_finalize = awd_finalize,
+    .class_size = sizeof(AwdClass),
+    .class_init = awd_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&awd_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 1fc2470e2f..032be8372d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4572,6 +4572,12 @@ Dump the network traffic on netdev @var{dev} to the file specified by
 The file format is libpcap, so it can be analyzed with tools such as tcpdump
 or Wireshark.
 
+@item -object advanced-watchdog,id=@var{id},awd_node=@var{chardevid},notification_node=@var{chardevid},opt_script=@var{file},server=@var{server},iothread=@var{id}[,pulse_interval=@var{time_ms},timeout=@var{time_ms}]
+
+Advanced Watch Dog is a universal monitoring module on the VMM side. It can be used to detect a network outage (VMM to guest, VMM to VMM, or VMM to another remote server) and then perform a previously configured operation.
+For example: send a message to the admin, notify another VMM, send a QMP command to QEMU to perform an operation such as restarting the VM, or build a VMM heartbeat system.
+It gives users basic VM/host network monitoring tools and a basic fault tolerance and recovery solution.
+
 @item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid},iothread=@var{id}[,vnet_hdr_support][,notify_dev=@var{id}]
 
 Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with
-- 
2.17.1




* [PATCH V2 2/4] net/awd.c: Initailize input/output chardev
  2019-11-01  2:48 [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 1/4] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
@ 2019-11-01  2:48 ` Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 3/4] net/awd.c: Load advanced watch dog worker thread job Zhang Chen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Zhang Chen @ 2019-11-01  2:48 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

Find and check the chardevs awd_node and notification_node.
The awd_node is used to keep a connection with the outside (such as a VM
client, another host, or a remote server), and the notification_node is
used to perform an operation when a disconnect event occurs.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
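Note (not part of the commit message): the lookup-then-check pattern used
by find_and_check_chardev() can be sketched in standalone C as below. The
FakeChardev table and the ChrCheck result codes are hypothetical stand-ins
for QEMU's chardev registry and Error reporting; only the control flow is
meant to match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered character device. */
typedef struct {
    const char *id;
    bool reconnectable;
} FakeChardev;

typedef enum {
    CHR_OK = 0,
    CHR_NOT_FOUND,
    CHR_NOT_RECONNECTABLE
} ChrCheck;

/*
 * Look up a chardev by id and require the reconnectable feature.
 * On success *out points at the table entry; otherwise it is NULL.
 */
static ChrCheck find_and_check(const FakeChardev *table, size_t n,
                               const char *id, const FakeChardev **out)
{
    assert(id != NULL && out != NULL);
    *out = NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].id, id) == 0) {
            if (!table[i].reconnectable) {
                return CHR_NOT_RECONNECTABLE;
            }
            *out = &table[i];
            return CHR_OK;
        }
    }
    return CHR_NOT_FOUND;
}
```

Keeping "not found" and "not reconnectable" as distinct results mirrors the
two separate error messages in the patch.
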
 net/awd.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/net/awd.c b/net/awd.c
index d42b4a7372..ad3d39c982 100644
--- a/net/awd.c
+++ b/net/awd.c
@@ -42,6 +42,8 @@ typedef struct AwdState {
     char *opt_script;
     uint32_t pulse_interval;
     uint32_t timeout;
+    CharBackend chr_awd_node;
+    CharBackend chr_notification_node;
     IOThread *iothread;
 } AwdState;
 
@@ -175,9 +177,30 @@ out:
     error_propagate(errp, local_err);
 }
 
+static int find_and_check_chardev(Chardev **chr,
+                                  char *chr_name,
+                                  Error **errp)
+{
+    *chr = qemu_chr_find(chr_name);
+    if (*chr == NULL) {
+        error_setg(errp, "Device '%s' not found",
+                   chr_name);
+        return 1;
+    }
+
+    if (!qemu_chr_has_feature(*chr, QEMU_CHAR_FEATURE_RECONNECTABLE)) {
+        error_setg(errp, "chardev \"%s\" is not reconnectable",
+                   chr_name);
+        return 1;
+    }
+
+    return 0;
+}
+
 static void awd_complete(UserCreatable *uc, Error **errp)
 {
     AwdState *s = AWD(uc);
+    Chardev *chr;
 
     if (!s->awd_node || !s->iothread ||
         !s->notification_node || !s->opt_script) {
@@ -187,6 +210,20 @@ static void awd_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
+    if (find_and_check_chardev(&chr, s->awd_node, errp) ||
+        !qemu_chr_fe_init(&s->chr_awd_node, chr, errp)) {
+        error_prepend(errp, "advanced-watchdog can't use chardev "
+                      "awd_node '%s': ", s->awd_node);
+        return;
+    }
+
+    if (find_and_check_chardev(&chr, s->notification_node, errp) ||
+        !qemu_chr_fe_init(&s->chr_notification_node, chr, errp)) {
+        error_prepend(errp, "advanced-watchdog can't use chardev "
+                      "notification_node '%s': ", s->notification_node);
+        return;
+    }
+
     return;
 }
 
-- 
2.17.1




* [PATCH V2 3/4] net/awd.c: Load advanced watch dog worker thread job
  2019-11-01  2:48 [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 1/4] net/awd.c: Introduce Advanced Watch Dog module framework Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 2/4] net/awd.c: Initailize input/output chardev Zhang Chen
@ 2019-11-01  2:48 ` Zhang Chen
  2019-11-01  2:48 ` [PATCH V2 4/4] vl.c: Make Advanced Watch Dog delayed initialization Zhang Chen
  2019-11-08  3:03 ` [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang, Chen
  4 siblings, 0 replies; 8+ messages in thread
From: Zhang Chen @ 2019-11-01  2:48 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

This patch sets up pulse_timer and timeout_timer in the new iothread.
The pulse timer sends pulse messages to awd_node, and the timeout timer
checks for the reply pulse from awd_node. If a timeout occurs, AWD sends
the opt_script's data to the notification_node.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
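Note (not part of the commit message): awd_chr_send() writes a 4-byte
length in network byte order followed by the payload. The standalone
sketch below shows that framing and a matching decoder. frame_encode() and
frame_decode() are hypothetical helpers written for illustration; the
patch itself relies on htonl() for sending and on net_fill_rstate() for
receiving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a 4-byte big-endian length followed by the payload into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t frame_encode(const uint8_t *payload, uint32_t size,
                           uint8_t *out, size_t out_cap)
{
    assert(out != NULL);
    if (out_cap < 4 || out_cap - 4 < size) {
        return 0;
    }
    out[0] = (uint8_t)(size >> 24);
    out[1] = (uint8_t)(size >> 16);
    out[2] = (uint8_t)(size >> 8);
    out[3] = (uint8_t)size;
    if (size) {
        memcpy(out + 4, payload, size);
    }
    return 4 + (size_t)size;
}

/*
 * If in holds one complete frame, point *payload at its body, store the
 * body length in *size and return the bytes consumed. Returns 0 when
 * more input is needed.
 */
static size_t frame_decode(const uint8_t *in, size_t in_len,
                           const uint8_t **payload, uint32_t *size)
{
    uint32_t len;

    assert(in != NULL && payload != NULL && size != NULL);
    if (in_len < 4) {
        return 0;
    }
    len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
          ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (in_len - 4 < len) {
        return 0;
    }
    *payload = in + 4;
    *size = len;
    return 4 + (size_t)len;
}
```

Unlike awd_chr_send(), which returns early for a zero-length payload, this
sketch will encode an empty frame; that is a simplification here.
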
 net/awd.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)

diff --git a/net/awd.c b/net/awd.c
index ad3d39c982..04f40e7cc8 100644
--- a/net/awd.c
+++ b/net/awd.c
@@ -40,17 +40,137 @@ typedef struct AwdState {
     char *awd_node;
     char *notification_node;
     char *opt_script;
+    char *opt_script_data;
     uint32_t pulse_interval;
     uint32_t timeout;
     CharBackend chr_awd_node;
     CharBackend chr_notification_node;
+    SocketReadState awd_rs;
+
+    QEMUTimer *pulse_timer;
+    QEMUTimer *timeout_timer;
     IOThread *iothread;
+    GMainContext *worker_context;
 } AwdState;
 
 typedef struct AwdClass {
     ObjectClass parent_class;
 } AwdClass;
 
+static int awd_chr_send(AwdState *s,
+                        const uint8_t *buf,
+                        uint32_t size)
+{
+    int ret = 0;
+    uint32_t len = htonl(size);
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = qemu_chr_fe_write_all(&s->chr_awd_node, (uint8_t *)&len,
+                                sizeof(len));
+    if (ret != sizeof(len)) {
+        goto err;
+    }
+
+    ret = qemu_chr_fe_write_all(&s->chr_awd_node, (uint8_t *)buf,
+                                size);
+    if (ret != size) {
+        goto err;
+    }
+
+    return 0;
+
+err:
+    return ret < 0 ? ret : -EIO;
+}
+
+static int awd_chr_can_read(void *opaque)
+{
+    return AWD_READ_LEN_MAX;
+}
+
+static void awd_node_in(void *opaque, const uint8_t *buf, int size)
+{
+    AwdState *s = AWD(opaque);
+    int ret;
+
+    ret = net_fill_rstate(&s->awd_rs, buf, size);
+    if (ret == -1) {
+        qemu_chr_fe_set_handlers(&s->chr_awd_node, NULL, NULL, NULL, NULL,
+                                 NULL, NULL, true);
+        error_report("advanced-watchdog get pulse error");
+    }
+}
+
+static void awd_send_pulse(void *opaque)
+{
+    AwdState *s = opaque;
+    char buf[] = "advanced-watchdog pulse";
+
+    awd_chr_send(s, (uint8_t *)buf, sizeof(buf));
+}
+
+static void awd_regular_pulse(void *opaque)
+{
+    AwdState *s = opaque;
+
+    awd_send_pulse(s);
+    timer_mod(s->pulse_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->pulse_interval);
+}
+
+static void awd_timeout(void *opaque)
+{
+    AwdState *s = opaque;
+    int ret = 0;
+
+    ret = qemu_chr_fe_write_all(&s->chr_notification_node,
+                                (uint8_t *)s->opt_script_data,
+                                strlen(s->opt_script_data));
+    if (ret != (int)strlen(s->opt_script_data)) {
+        error_report("advanced-watchdog notification failure");
+    }
+}
+
+static void awd_timer_init(AwdState *s)
+{
+    AioContext *ctx = iothread_get_aio_context(s->iothread);
+
+    s->timeout_timer = aio_timer_new(ctx, QEMU_CLOCK_VIRTUAL, SCALE_MS,
+                                     awd_timeout, s);
+
+    s->pulse_timer = aio_timer_new(ctx, QEMU_CLOCK_VIRTUAL, SCALE_MS,
+                                      awd_regular_pulse, s);
+
+    if (!s->pulse_interval) {
+        s->pulse_interval = AWD_PULSE_INTERVAL_DEFAULT;
+    }
+
+    if (!s->timeout) {
+        s->timeout = AWD_TIMEOUT_DEFAULT;
+    }
+
+    timer_mod(s->pulse_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->pulse_interval);
+}
+
+static void awd_timer_del(AwdState *s)
+{
+    if (s->pulse_timer) {
+        timer_del(s->pulse_timer);
+        timer_free(s->pulse_timer);
+        s->pulse_timer = NULL;
+    }
+
+    if (s->timeout_timer) {
+        timer_del(s->timeout_timer);
+        timer_free(s->timeout_timer);
+        s->timeout_timer = NULL;
+    }
+}
+
 static char *awd_get_node(Object *obj, Error **errp)
 {
     AwdState *s = AWD(obj);
@@ -177,6 +297,22 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void awd_rs_finalize(SocketReadState *awd_rs)
+{
+    AwdState *s = container_of(awd_rs, AwdState, awd_rs);
+
+    if (!s->server) {
+        char buf[] = "advanced-watchdog reply pulse";
+
+        awd_chr_send(s, (uint8_t *)buf, sizeof(buf));
+    }
+
+    timer_mod(s->timeout_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->timeout);
+
+    error_report("advanced-watchdog got message : %s", awd_rs->buf);
+}
+
 static int find_and_check_chardev(Chardev **chr,
                                   char *chr_name,
                                   Error **errp)
@@ -197,6 +333,46 @@ static int find_and_check_chardev(Chardev **chr,
     return 0;
 }
 
+static void awd_iothread(AwdState *s)
+{
+    object_ref(OBJECT(s->iothread));
+    s->worker_context = iothread_get_g_main_context(s->iothread);
+
+    qemu_chr_fe_set_handlers(&s->chr_awd_node, awd_chr_can_read,
+                             awd_node_in, NULL, NULL,
+                             s, s->worker_context, true);
+
+    awd_timer_init(s);
+}
+
+static int get_opt_script_data(AwdState *s)
+{
+    FILE *opt_fd;
+    long fsize;
+
+    opt_fd = fopen(s->opt_script, "r");
+    if (opt_fd == NULL) {
+        error_report("advanced-watchdog can't open "
+                     "opt_script: %s", s->opt_script);
+        return -1;
+    }
+
+    fseek(opt_fd, 0, SEEK_END);
+    fsize = ftell(opt_fd);
+    fseek(opt_fd, 0, SEEK_SET);
+    s->opt_script_data = g_malloc0(fsize + 1);
+
+    if (!fread(s->opt_script_data, 1, fsize, opt_fd)) {
+        error_report("advanced-watchdog can't read "
+                     "opt_script: %s", s->opt_script);
+        return -1;
+    }
+
+    fclose(opt_fd);
+
+    return 0;
+}
+
 static void awd_complete(UserCreatable *uc, Error **errp)
 {
     AwdState *s = AWD(uc);
@@ -224,6 +400,16 @@ static void awd_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
+    if (get_opt_script_data(s)) {
+        error_setg(errp, "advanced-watchdog can't get "
+                   "opt script data: %s", s->opt_script);
+        return;
+    }
+
+    net_socket_rs_init(&s->awd_rs, awd_rs_finalize, false);
+
+    awd_iothread(s);
+
     return;
 }
 
@@ -272,6 +458,13 @@ static void awd_finalize(Object *obj)
 {
     AwdState *s = AWD(obj);
 
+    qemu_chr_fe_deinit(&s->chr_awd_node, false);
+    qemu_chr_fe_deinit(&s->chr_notification_node, false);
+
+    if (s->iothread) {
+        awd_timer_del(s);
+    }
+
     g_free(s->awd_node);
     g_free(s->notification_node);
 }
-- 
2.17.1




* [PATCH V2 4/4] vl.c: Make Advanced Watch Dog delayed initialization
  2019-11-01  2:48 [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang Chen
                   ` (2 preceding siblings ...)
  2019-11-01  2:48 ` [PATCH V2 3/4] net/awd.c: Load advanced watch dog worker thread job Zhang Chen
@ 2019-11-01  2:48 ` Zhang Chen
  2019-11-08  3:03 ` [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang, Chen
  4 siblings, 0 replies; 8+ messages in thread
From: Zhang Chen @ 2019-11-01  2:48 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen, Zhang Chen

From: Zhang Chen <chen.zhang@intel.com>

The Advanced Watch Dog module needs its chardev sockets to be initialized
first, so delay its creation until after the chardevs are set up.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
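Note (not part of the commit message): the hunk below makes
object_create_initial() return false for "advanced-watchdog", which defers
the object to a later creation pass. A standalone sketch of that two-pass
predicate follows. The delayed_types list and both function names are
hypothetical, and the assumption that chardevs are set up between the two
passes is my reading of vl.c rather than something this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of object types that must wait for chardevs. */
static const char *const delayed_types[] = {
    "advanced-watchdog",
};

/* True if an object of this type may be created in the early pass. */
static bool create_in_initial_pass(const char *type)
{
    size_t n = sizeof(delayed_types) / sizeof(delayed_types[0]);

    assert(type != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(type, delayed_types[i]) == 0) {
            return false;
        }
    }
    return true;
}

/* The later pass handles exactly the types the early pass skipped. */
static bool create_in_delayed_pass(const char *type)
{
    return !create_in_initial_pass(type);
}
```

Defining the delayed predicate as the negation of the initial one ensures
that every object type is created in exactly one of the two passes.
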
 vl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/vl.c b/vl.c
index 6a65a64bfd..048fe458b9 100644
--- a/vl.c
+++ b/vl.c
@@ -2689,6 +2689,13 @@ static bool object_create_initial(const char *type, QemuOpts *opts)
         return false;
     }
 
+    /*
+     * Reason: Advanced Watch Dog property "chardev".
+     */
+    if (g_str_equal(type, "advanced-watchdog")) {
+        return false;
+    }
+
     /* Memory allocation by backends needs to be done
      * after configure_accelerator() (due to the tcg_enabled()
      * checks at memory_region_init_*()).
-- 
2.17.1




* RE: [PATCH V2 0/4] Introduce Advanced Watch Dog module
  2019-11-01  2:48 [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang Chen
                   ` (3 preceding siblings ...)
  2019-11-01  2:48 ` [PATCH V2 4/4] vl.c: Make Advanced Watch Dog delayed initialization Zhang Chen
@ 2019-11-08  3:03 ` Zhang, Chen
  2019-11-27 15:48   ` Markus Armbruster
  4 siblings, 1 reply; 8+ messages in thread
From: Zhang, Chen @ 2019-11-08  3:03 UTC
  To: Jason Wang, Paolo Bonzini, Philippe Mathieu-Daudé, qemu-dev
  Cc: Zhang Chen

Hi~ All~ 

Ping.... Anyone have time to review this series? I need more comments~

Thanks
Zhang Chen





* Re: [PATCH V2 0/4] Introduce Advanced Watch Dog module
  2019-11-08  3:03 ` [PATCH V2 0/4] Introduce Advanced Watch Dog module Zhang, Chen
@ 2019-11-27 15:48   ` Markus Armbruster
  2019-11-28  3:15     ` Zhang, Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2019-11-27 15:48 UTC
  To: Zhang, Chen
  Cc: Paolo Bonzini, Jason Wang, Philippe Mathieu-Daudé,
	qemu-dev, Zhang Chen

"Zhang, Chen" <chen.zhang@intel.com> writes:

> Hi~ All~ 
>
> Ping.... Anyone have time to review this series? I need more comments~

Any takers?




* RE: [PATCH V2 0/4] Introduce Advanced Watch Dog module
  2019-11-27 15:48   ` Markus Armbruster
@ 2019-11-28  3:15     ` Zhang, Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang, Chen @ 2019-11-28  3:15 UTC
  To: Markus Armbruster
  Cc: Paolo Bonzini, Jason Wang, Philippe Mathieu-Daudé,
	qemu-dev, Zhang Chen



> -----Original Message-----
> From: Markus Armbruster <armbru@redhat.com>
> Sent: Wednesday, November 27, 2019 11:49 PM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Jason Wang <jasowang@redhat.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>; Zhang Chen <zhangckid@gmail.com>
> Subject: Re: [PATCH V2 0/4] Introduce Advanced Watch Dog module
> 
> "Zhang, Chen" <chen.zhang@intel.com> writes:
> 
> > Hi~ All~
> >
> > Ping.... Anyone have time to review this series? I need more comments~
> 
> Any takers?

Hi Markus,

Thank you for your attention.
This is a very simple module for error detection and automatic handling.
I have written up the detailed reasons why we need it in real environments in the commit log.
Here is the latest patch:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02872.html

Thanks
Zhang Chen


