All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update
@ 2021-06-16 14:43 Julien Grall
  2021-06-16 14:43 ` [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore Julien Grall
                   ` (10 more replies)
  0 siblings, 11 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Juergen Gross

From: Julien Grall <jgrall@amazon.com>

Hi all,

At the moment, Live-Update will, by default, not proceed if there are
in-flight transactions. It is possible force it by passing -F but this
will break any connection with in-flight transactions.

There are PV drivers out that may never terminate some transaction. On
host running such guest, we would need to use -F. Unfortunately, this
also risks to break well-behaving guests (and even dom0) because
Live-Update will happen as soon as the timeout is hit.

This series aims to allow to Live-Update more safely even when the option
-F is used.

The first part of the series contains a few fixes for bug found while
testing Live-Update.

Cheers,

Julien Grall (10):
  MAINTAINERS: Add myself as reviewers for tools/xenstore
  tools/xenstored: Introduce lu_get_connection() and use it
  tools/xenstore: Don't assume conn->in points to the LU request
  tools/xenstored: Limit the number of requests a connection can delay
  tools/xenstored: xenstored_core.h should include fcntl.h
  tools/xenstored: Introduce a wrapper for conn->funcs->can_{read,
    write}
  tools/xenstored: delay_request: don't assume conn->in == in
  tools/xenstored: Extend restore code to handle multiple input buffer
  tools/xenstored: Dump delayed requests
  tools/xenstored: Delay new transaction while Live-Update is pending

 MAINTAINERS                        |   1 +
 tools/xenstore/xenstored_control.c |  66 +++++++++-
 tools/xenstore/xenstored_control.h |   7 ++
 tools/xenstore/xenstored_core.c    | 196 +++++++++++++++++++++++------
 tools/xenstore/xenstored_core.h    |   8 +-
 tools/xenstore/xenstored_domain.c  |  46 +++----
 tools/xenstore/xenstored_domain.h  |   6 +-
 7 files changed, 255 insertions(+), 75 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-16 15:15   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it Julien Grall
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu

I would like to help reviewing Xenstored patches. It is more convenient
to find them if I am CCed.

Signed-off-by: Julien Grall <julien@xen.org>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 39750bb75db5..dd8c011456cd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -628,6 +628,7 @@ XENSTORE
 M:	Ian Jackson <iwj@xenproject.org>
 M:	Wei Liu <wl@xen.org>
 M:	Juergen Gross <jgross@suse.com>
+R:	Julien Grall <julien@xen.org>
 S:	Supported
 F:	tools/xenstore/
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
  2021-06-16 14:43 ` [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  8:21   ` Luca Fancellu
  2021-06-24  7:21   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request Julien Grall
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

At the moment, dump_state_buffered_data() is taking two connections
in parameters (one is the connection to dump, the other is the
connection used to request LU). The naming doesn't help to
distinguish (c vs conn) them and this already lead to several mistake
while modifying the function.

To remove the confusion, introduce an help lu_get_connection() that
will return the connection used to request LU and use it
in place of the existing parameter.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_control.c | 13 ++++++++++++-
 tools/xenstore/xenstored_control.h |  2 ++
 tools/xenstore/xenstored_core.c    |  7 +++----
 tools/xenstore/xenstored_core.h    |  1 -
 tools/xenstore/xenstored_domain.c  |  6 +++---
 tools/xenstore/xenstored_domain.h  |  2 +-
 6 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index 0d57f9f9400d..d08a2b961432 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -104,6 +104,17 @@ static const char *lu_begin(struct connection *conn)
 
 	return NULL;
 }
+
+struct connection *lu_get_connection(void)
+{
+	return lu_status ? lu_status->conn : NULL;
+}
+
+#else
+struct connection *lu_get_connection(void)
+{
+	return NULL;
+}
 #endif
 
 struct cmd_s {
@@ -516,7 +527,7 @@ static const char *lu_dump_state(const void *ctx, struct connection *conn)
 	ret = dump_state_global(fp);
 	if (ret)
 		goto out;
-	ret = dump_state_connections(fp, conn);
+	ret = dump_state_connections(fp);
 	if (ret)
 		goto out;
 	ret = dump_state_special_nodes(fp);
diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
index aac61f05908f..6842b8d88760 100644
--- a/tools/xenstore/xenstored_control.h
+++ b/tools/xenstore/xenstored_control.h
@@ -18,3 +18,5 @@
 
 int do_control(struct connection *conn, struct buffered_data *in);
 void lu_read_state(void);
+
+struct connection *lu_get_connection(void);
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 883a1a582a60..607187361d84 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2369,14 +2369,13 @@ const char *dump_state_global(FILE *fp)
 
 /* Called twice: first with fp == NULL to get length, then for writing data. */
 const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
-				     const struct connection *conn,
 				     struct xs_state_connection *sc)
 {
 	unsigned int len = 0, used;
 	struct buffered_data *out, *in = c->in;
 	bool partial = true;
 
-	if (in && c != conn) {
+	if (in && c != lu_get_connection()) {
 		len = in->inhdr ? in->used : sizeof(in->hdr);
 		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
 			return "Dump read data error";
@@ -2416,8 +2415,8 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 	}
 
 	/* Add "OK" for live-update command. */
-	if (c == conn) {
-		struct xsd_sockmsg msg = conn->in->hdr.msg;
+	if (c == lu_get_connection()) {
+		struct xsd_sockmsg msg = c->in->hdr.msg;
 
 		msg.len = sizeof("OK");
 		if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index bb36111ecc56..89ce155e755b 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -269,7 +269,6 @@ void set_tdb_key(const char *name, TDB_DATA *key);
 
 const char *dump_state_global(FILE *fp);
 const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
-				     const struct connection *conn,
 				     struct xs_state_connection *sc);
 const char *dump_state_nodes(FILE *fp, const void *ctx);
 const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 322b0dbca449..6d8d29cbe41c 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -1183,7 +1183,7 @@ void wrl_apply_debit_trans_commit(struct connection *conn)
 	wrl_apply_debit_actual(conn->domain);
 }
 
-const char *dump_state_connections(FILE *fp, struct connection *conn)
+const char *dump_state_connections(FILE *fp)
 {
 	const char *ret = NULL;
 	unsigned int conn_id = 1;
@@ -1209,7 +1209,7 @@ const char *dump_state_connections(FILE *fp, struct connection *conn)
 			sc.spec.socket_fd = c->fd;
 		}
 
-		ret = dump_state_buffered_data(NULL, c, conn, &sc);
+		ret = dump_state_buffered_data(NULL, c, &sc);
 		if (ret)
 			return ret;
 		head.length += sc.data_in_len + sc.data_out_len;
@@ -1219,7 +1219,7 @@ const char *dump_state_connections(FILE *fp, struct connection *conn)
 		if (fwrite(&sc, offsetof(struct xs_state_connection, data),
 			   1, fp) != 1)
 			return "Dump connection state error";
-		ret = dump_state_buffered_data(fp, c, conn, NULL);
+		ret = dump_state_buffered_data(fp, c, NULL);
 		if (ret)
 			return ret;
 		ret = dump_state_align(fp);
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index cc5147d7e747..62ee471ea6aa 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -101,7 +101,7 @@ void wrl_log_periodic(struct wrl_timestampt now);
 void wrl_apply_debit_direct(struct connection *conn);
 void wrl_apply_debit_trans_commit(struct connection *conn);
 
-const char *dump_state_connections(FILE *fp, struct connection *conn);
+const char *dump_state_connections(FILE *fp);
 const char *dump_state_special_nodes(FILE *fp);
 
 void read_state_connection(const void *ctx, const void *state);
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
  2021-06-16 14:43 ` [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore Julien Grall
  2021-06-16 14:43 ` [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  8:55   ` Luca Fancellu
  2021-06-24  7:32   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay Julien Grall
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

call_delayed() is currently assuming that conn->in is NULL when
handling delayed request. However, the connection is not paused.
Therefore new request can be processed and conn->in may be non-NULL
if we have only received a partial request.

Furthermore, as we overwrite conn->in, the current partial request
will not be transferred. This will result to corrupt the connection.

Rather than updating conn->in, stash the LU request in lu_status and
let each callback for delayed request to update conn->in when
necessary.

To keep a sane interface, the code to write the "OK" response the
LU request is moved in xenstored_core.c.

Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")
Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

This is fixing bugs from two separate commits. I couldn't figure out
how to split in two patches without breaking bisection.
---
 tools/xenstore/xenstored_control.c | 41 ++++++++++++++++++++++++++++--
 tools/xenstore/xenstored_control.h |  3 +++
 tools/xenstore/xenstored_core.c    | 17 +++----------
 3 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index d08a2b961432..7acc2d134f9f 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -50,6 +50,9 @@ struct live_update {
 	/* For verification the correct connection is acting. */
 	struct connection *conn;
 
+	/* Pointer to the command used to request LU */
+	struct buffered_data *in;
+
 #ifdef __MINIOS__
 	void *kernel;
 	unsigned int kernel_size;
@@ -100,6 +103,7 @@ static const char *lu_begin(struct connection *conn)
 	if (!lu_status)
 		return "Allocation failure.";
 	lu_status->conn = conn;
+	lu_status->in = conn->in;
 	talloc_set_destructor(lu_status, lu_destroy);
 
 	return NULL;
@@ -110,11 +114,34 @@ struct connection *lu_get_connection(void)
 	return lu_status ? lu_status->conn : NULL;
 }
 
+unsigned int lu_write_response(FILE *fp)
+{
+	struct xsd_sockmsg msg;
+
+	assert(lu_status);
+
+	msg = lu_status->in->hdr.msg;
+
+	msg.len = sizeof("OK");
+	if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
+		return 0;
+	if (fp && fwrite("OK", msg.len, 1, fp) != 1)
+		return 0;
+
+	return sizeof(msg) + msg.len;
+}
+
 #else
 struct connection *lu_get_connection(void)
 {
 	return NULL;
 }
+
+unsigned int lu_write_response(FILE *fp)
+{
+	/* Unsupported */
+	return 0;
+}
 #endif
 
 struct cmd_s {
@@ -658,6 +685,8 @@ static bool do_lu_start(struct delayed_request *req)
 {
 	time_t now = time(NULL);
 	const char *ret;
+	struct buffered_data *saved_in;
+	struct connection *conn = lu_status->conn;
 
 	if (!lu_check_lu_allowed()) {
 		if (now < lu_status->started_at + lu_status->timeout)
@@ -668,8 +697,9 @@ static bool do_lu_start(struct delayed_request *req)
 		}
 	}
 
+	assert(req->in == lu_status->in);
 	/* Dump out internal state, including "OK" for live update. */
-	ret = lu_dump_state(req->in, lu_status->conn);
+	ret = lu_dump_state(req->in, conn);
 	if (!ret) {
 		/* Perform the activation of new binary. */
 		ret = lu_activate_binary(req->in);
@@ -677,7 +707,14 @@ static bool do_lu_start(struct delayed_request *req)
 
 	/* We will reach this point only in case of failure. */
  out:
-	send_reply(lu_status->conn, XS_CONTROL, ret, strlen(ret) + 1);
+	/*
+	 * send_reply() will send the response for conn->in. Save the current
+	 * conn->in and restore it afterwards.
+	 */
+	saved_in = conn->in;
+	conn->in = req->in;
+	send_reply(conn, XS_CONTROL, ret, strlen(ret) + 1);
+	conn->in = saved_in;
 	talloc_free(lu_status);
 
 	return true;
diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
index 6842b8d88760..27d7f19e4b7f 100644
--- a/tools/xenstore/xenstored_control.h
+++ b/tools/xenstore/xenstored_control.h
@@ -20,3 +20,6 @@ int do_control(struct connection *conn, struct buffered_data *in);
 void lu_read_state(void);
 
 struct connection *lu_get_connection(void);
+
+/* Write the "OK" response for the live-update command */
+unsigned int lu_write_response(FILE *fp);
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 607187361d84..41b26d7094c8 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -272,15 +272,10 @@ static int undelay_request(void *_req)
 
 static void call_delayed(struct connection *conn, struct delayed_request *req)
 {
-	assert(conn->in == NULL);
-	conn->in = req->in;
-
 	if (req->func(req)) {
 		undelay_request(req);
 		talloc_set_destructor(req, NULL);
 	}
-
-	conn->in = NULL;
 }
 
 int delay_request(struct connection *conn, struct buffered_data *in,
@@ -2375,7 +2370,7 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 	struct buffered_data *out, *in = c->in;
 	bool partial = true;
 
-	if (in && c != lu_get_connection()) {
+	if (in) {
 		len = in->inhdr ? in->used : sizeof(in->hdr);
 		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
 			return "Dump read data error";
@@ -2416,16 +2411,12 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 
 	/* Add "OK" for live-update command. */
 	if (c == lu_get_connection()) {
-		struct xsd_sockmsg msg = c->in->hdr.msg;
+		unsigned int rc = lu_write_response(fp);
 
-		msg.len = sizeof("OK");
-		if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
+		if (!rc)
 			return "Dump buffered data error";
-		len += sizeof(msg);
-		if (fp && fwrite("OK", msg.len, 1, fp) != 1)
 
-			return "Dump buffered data error";
-		len += msg.len;
+		len += rc;
 	}
 
 	if (sc)
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (2 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:02   ` Luca Fancellu
  2021-06-24  7:35   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h Julien Grall
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

Currently, only liveupdate request can be delayed. The request can only
be performed by a privileged connection (e.g. dom0). So it is fine to
have no limits.

In a follow-up patch we will want to delay request for unprivileged
connection as well. So it is best to apply a limit.

For now and for simplicity, only a single request can be delayed
for a given unprivileged connection.

Take the opportunity to tweak the prototype and provide a way to
bypass the quota check. This would be useful when the function
is called from the restore code.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_control.c |  2 +-
 tools/xenstore/xenstored_core.c    | 11 ++++++++++-
 tools/xenstore/xenstored_core.h    |  3 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index 7acc2d134f9f..1c24d4869eab 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -737,7 +737,7 @@ static const char *lu_start(const void *ctx, struct connection *conn,
 	lu_status->timeout = to;
 	lu_status->started_at = time(NULL);
 
-	errno = delay_request(conn, conn->in, do_lu_start, NULL);
+	errno = delay_request(conn, conn->in, do_lu_start, NULL, false);
 
 	return NULL;
 }
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 41b26d7094c8..51d210828922 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -279,10 +279,19 @@ static void call_delayed(struct connection *conn, struct delayed_request *req)
 }
 
 int delay_request(struct connection *conn, struct buffered_data *in,
-		  bool (*func)(struct delayed_request *), void *data)
+		  bool (*func)(struct delayed_request *), void *data,
+		  bool no_quota_check)
 {
 	struct delayed_request *req;
 
+	/*
+	 * Only allow one request can be delayed for an unprivileged
+	 * connection.
+	 */
+	if (!no_quota_check && domain_is_unprivileged(conn) &&
+	    !list_empty(&conn->delayed))
+		return ENOSPC;
+
 	req = talloc(in, struct delayed_request);
 	if (!req)
 		return ENOMEM;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 89ce155e755b..34839b34f6e9 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -213,7 +213,8 @@ char *get_parent(const void *ctx, const char *node);
 
 /* Delay a request. */
 int delay_request(struct connection *conn, struct buffered_data *in,
-		  bool (*func)(struct delayed_request *), void *data);
+		  bool (*func)(struct delayed_request *), void *data,
+		  bool no_quota_check);
 
 /* Tracing infrastructure. */
 void trace_create(const void *data, const char *type);
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (3 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:03   ` Luca Fancellu
  2021-06-24  7:36   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write} Julien Grall
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

xenstored_core.h will consider live-udpate is not supported if
O_CLOEXEC doesn't exist. However, the header doesn't include the one
defining O_CLOEXEC (i.e. fcntl.h). This means that depending on
the header included, some source file will think Live-Update is not
supported.

I am not aware of any issue with the existing. Therefore this is just
a latent bug so far.

Prevent any potential issue by including fcntl.h in xenstored_core.h

Fixes: cd831ee438 ("tools/xenstore: handle CLOEXEC flag for local files and pipes")
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 34839b34f6e9..dac517156993 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -24,6 +24,7 @@
 
 #include <sys/types.h>
 #include <dirent.h>
+#include <fcntl.h>
 #include <stdbool.h>
 #include <stdint.h>
 #include <errno.h>
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write}
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (4 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:10   ` Luca Fancellu
  2021-06-24  7:39   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in Julien Grall
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

Currently, the callbacks can_read and can_write are called directly. This
doesn't allow us to add generic check and therefore requires duplication.

At the moment, one check that could benefit to be common is whether the
connection should ignored. The position is slightly different between
domain and socket because for the latter we want to check the state of
the file descriptor first.

In follow-up patches, there will be more potential generic checks.

This patch provides wrappers to read/write a connection and move
the check ->is_ignored after the callback for everyone.

This also requires to replace the direct call to domain_can_read()
and domain_can_write() with the new wrapper. At the same time,
both functions can now be static. Note that the implementations need
to be moved earlier in the file xenstored_domain.c to avoid
declaring the prototype.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.c   | 18 ++++++++++----
 tools/xenstore/xenstored_domain.c | 40 +++++++++++++------------------
 tools/xenstore/xenstored_domain.h |  4 ----
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 51d210828922..2e5760fe4599 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -334,6 +334,16 @@ static int destroy_conn(void *_conn)
 	return 0;
 }
 
+static bool conn_can_read(struct connection *conn)
+{
+	return conn->funcs->can_read(conn) && !conn->is_ignored;
+}
+
+static bool conn_can_write(struct connection *conn)
+{
+	return conn->funcs->can_write(conn) && !conn->is_ignored;
+}
+
 /* This function returns index inside the array if succeed, -1 if fail */
 static int set_fd(int fd, short events)
 {
@@ -396,8 +406,8 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
 	list_for_each_entry(conn, &connections, list) {
 		if (conn->domain) {
 			wrl_check_timeout(conn->domain, now, ptimeout);
-			if (domain_can_read(conn) ||
-			    (domain_can_write(conn) &&
+			if (conn_can_read(conn) ||
+			    (conn_can_write(conn) &&
 			     !list_empty(&conn->out_list)))
 				*ptimeout = 0;
 		} else {
@@ -2325,14 +2335,14 @@ int main(int argc, char *argv[])
 			if (&next->list != &connections)
 				talloc_increase_ref_count(next);
 
-			if (conn->funcs->can_read(conn))
+			if (conn_can_read(conn))
 				handle_input(conn);
 			if (talloc_free(conn) == 0)
 				continue;
 
 			talloc_increase_ref_count(conn);
 
-			if (conn->funcs->can_write(conn))
+			if (conn_can_write(conn))
 				handle_output(conn);
 			if (talloc_free(conn) == 0)
 				continue;
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 6d8d29cbe41c..47e9107c144e 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -172,6 +172,23 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
 	return len;
 }
 
+static bool domain_can_write(struct connection *conn)
+{
+	struct xenstore_domain_interface *intf = conn->domain->interface;
+
+	return ((intf->rsp_prod - intf->rsp_cons) != XENSTORE_RING_SIZE);
+}
+
+static bool domain_can_read(struct connection *conn)
+{
+	struct xenstore_domain_interface *intf = conn->domain->interface;
+
+	if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+		return false;
+
+	return (intf->req_cons != intf->req_prod);
+}
+
 static const struct interface_funcs domain_funcs = {
 	.write = writechn,
 	.read = readchn,
@@ -290,19 +307,6 @@ void handle_event(void)
 		barf_perror("Failed to write to event fd");
 }
 
-bool domain_can_read(struct connection *conn)
-{
-	struct xenstore_domain_interface *intf = conn->domain->interface;
-
-	if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
-		return false;
-
-	if (conn->is_ignored)
-		return false;
-
-	return (intf->req_cons != intf->req_prod);
-}
-
 static bool domid_is_unprivileged(unsigned int domid)
 {
 	return domid != 0 && domid != priv_domid;
@@ -314,16 +318,6 @@ bool domain_is_unprivileged(struct connection *conn)
 	       domid_is_unprivileged(conn->domain->domid);
 }
 
-bool domain_can_write(struct connection *conn)
-{
-	struct xenstore_domain_interface *intf = conn->domain->interface;
-
-	if (conn->is_ignored)
-		return false;
-
-	return ((intf->rsp_prod - intf->rsp_cons) != XENSTORE_RING_SIZE);
-}
-
 static char *talloc_domain_path(void *context, unsigned int domid)
 {
 	return talloc_asprintf(context, "/local/domain/%u", domid);
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 62ee471ea6aa..1e929b8f8c6f 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -51,10 +51,6 @@ void domain_deinit(void);
 /* Returns the implicit path of a connection (only domains have this) */
 const char *get_implicit_path(const struct connection *conn);
 
-/* Can connection attached to domain read/write. */
-bool domain_can_read(struct connection *conn);
-bool domain_can_write(struct connection *conn);
-
 bool domain_is_unprivileged(struct connection *conn);
 
 /* Remove node permissions for no longer existing domains. */
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (5 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write} Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:12   ` Luca Fancellu
  2021-06-24  7:44   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer Julien Grall
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

delay_request() is currently assuming that the request delayed is
always conn->in. This is currently correct, but it is a call for
a latent bug as the function allows the caller to specify any request.

To prevent any future surprise, check if the request delayed is the
current one.

Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 2e5760fe4599..a5084a5b173d 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -306,7 +306,9 @@ int delay_request(struct connection *conn, struct buffered_data *in,
 	delayed_requests++;
 	list_add(&req->list, &conn->delayed);
 
-	conn->in = NULL;
+	/* Unlink the request from conn if this is the current one */
+	if (conn->in == in)
+		conn->in = NULL;
 
 	return 0;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (6 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:21   ` Luca Fancellu
  2021-06-24  8:30   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 09/10] tools/xenstored: Dump delayed requests Julien Grall
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

Currently, the restore code is considering the stream will contain at
most one in-flight request per connection. In a follow-up changes, we
will want to transfer multiple in-flight requests.

The function read_state_buffered() is now extended to restore multiple
in-flight request. Complete requests will be queued as delayed
requests, if there a partial request (only the last one can) then it
will used as the current in-flight request.

Note that we want to bypass the quota check for delayed requests as
the new Xenstore may have a lower limit.

Lastly, there is no need to change the specification as there was
no restriction on the number of in-flight requests preserved.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.c | 56 ++++++++++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index a5084a5b173d..5b7ab7f74013 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1486,6 +1486,10 @@ static void process_message(struct connection *conn, struct buffered_data *in)
 	enum xsd_sockmsg_type type = in->hdr.msg.type;
 	int ret;
 
+	/* At least send_error() and send_reply() expects conn->in == in */
+	assert(conn->in == in);
+	trace_io(conn, in, 0);
+
 	if ((unsigned int)type >= XS_TYPE_COUNT || !wire_funcs[type].func) {
 		eprintf("Client unknown operation %i", type);
 		send_error(conn, ENOSYS);
@@ -1515,6 +1519,23 @@ static void process_message(struct connection *conn, struct buffered_data *in)
 	conn->transaction = NULL;
 }
 
+static bool process_delayed_message(struct delayed_request *req)
+{
+	struct connection *conn = req->data;
+	struct buffered_data *saved_in = conn->in;
+
+	/*
+	 * Part of process_message() expects conn->in to contains the
+	 * processed response. So save the current conn->in and restore it
+	 * afterwards.
+	 */
+	conn->in = req->in;
+	process_message(req->data, req->in);
+	conn->in = saved_in;
+
+	return true;
+}
+
 static void consider_message(struct connection *conn)
 {
 	if (verbose)
@@ -1582,7 +1603,6 @@ static void handle_input(struct connection *conn)
 	if (in->used != in->hdr.msg.len)
 		return;
 
-	trace_io(conn, in, 0);
 	consider_message(conn);
 	return;
 
@@ -2611,14 +2631,20 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
 	unsigned int len;
 	bool partial = sc->data_resp_len;
 
-	if (sc->data_in_len) {
+	for (data = sc->data; data < sc->data + sc->data_in_len; data += len) {
 		bdata = new_buffer(conn);
 		if (!bdata)
 			barf("error restoring read data");
-		if (sc->data_in_len < sizeof(bdata->hdr)) {
+
+		/*
+		 * We don't know yet if there is more than one message
+		 * to process. So the len is the size of the leftover data.
+		 */
+		len = sc->data_in_len - (data - sc->data);
+		if (len < sizeof(bdata->hdr)) {
 			bdata->inhdr = true;
-			memcpy(&bdata->hdr, sc->data, sc->data_in_len);
-			bdata->used = sc->data_in_len;
+			memcpy(&bdata->hdr, sc->data, len);
+			bdata->used = len;
 		} else {
 			bdata->inhdr = false;
 			memcpy(&bdata->hdr, sc->data, sizeof(bdata->hdr));
@@ -2629,12 +2655,26 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
 							bdata->hdr.msg.len);
 			if (!bdata->buffer)
 				barf("Error allocating in buffer");
-			bdata->used = sc->data_in_len - sizeof(bdata->hdr);
-			memcpy(bdata->buffer, sc->data + sizeof(bdata->hdr),
+			bdata->used = min_t(unsigned int,
+					    len - sizeof(bdata->hdr),
+					    bdata->hdr.msg.len);
+			memcpy(bdata->buffer, data + sizeof(bdata->hdr),
 			       bdata->used);
+			/* Update len to match the size of the message. */
+			len = bdata->used + sizeof(bdata->hdr);
 		}
 
-		conn->in = bdata;
+		/*
+		 * If the message is not complete, then it means this was
+		 * the current processed message. All the other messages
+		 * will be queued to be handled after restoring.
+		 */
+		if (bdata->inhdr || bdata->used != bdata->hdr.msg.len) {
+			assert(conn->in == NULL);
+			conn->in = bdata;
+		} else if (delay_request(conn, bdata, process_delayed_message,
+					 conn, true))
+			barf("Unable to delay the request");
 	}
 
 	for (data = sc->data + sc->data_in_len;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (7 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:27   ` Luca Fancellu
  2021-06-24  8:41   ` Juergen Gross
  2021-06-16 14:43 ` [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending Julien Grall
  2021-06-24 10:43 ` [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

Currently, only Live-Update request can be delayed. In a follow-up,
we will want to delay more requests (e.g. transaction start).
Therefore we want to preserve delayed requests across Live-Update.

Delayed requests are just complete "in" buffer. So the code is
refactored to allow sharing the code to dump "in" buffer.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.c | 42 +++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 5b7ab7f74013..9eca58682b51 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
 	return NULL;
 }
 
+static const char *dump_input_buffered_data(FILE *fp,
+					    const struct buffered_data *in,
+					    unsigned int *total_len)
+{
+	unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
+
+	*total_len += hlen;
+	if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
+		return "Dump read data error";
+	if (!in->inhdr && in->used) {
+		*total_len += in->used;
+		if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
+			return "Dump read data error";
+	}
+
+	return NULL;
+}
+
 /* Called twice: first with fp == NULL to get length, then for writing data. */
 const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 				     struct xs_state_connection *sc)
 {
 	unsigned int len = 0, used;
-	struct buffered_data *out, *in = c->in;
+	struct buffered_data *out;
 	bool partial = true;
+	struct delayed_request *req;
+	const char *ret;
 
-	if (in) {
-		len = in->inhdr ? in->used : sizeof(in->hdr);
-		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
-			return "Dump read data error";
-		if (!in->inhdr && in->used) {
-			len += in->used;
-			if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
-				return "Dump read data error";
-		}
+	/* Dump any command that was delayed */
+	list_for_each_entry(req, &c->delayed, list) {
+		if (req->func != process_delayed_message)
+			continue;
+
+		assert(!req->in->inhdr);
+		if ((ret = dump_input_buffered_data(fp, req->in, &len)))
+			return ret;
 	}
 
+	if (c->in && (ret = dump_input_buffered_data(fp, c->in, &len)))
+		return ret;
+
 	if (sc) {
 		sc->data_in_len = len;
 		sc->data_resp_len = 0;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (8 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 09/10] tools/xenstored: Dump delayed requests Julien Grall
@ 2021-06-16 14:43 ` Julien Grall
  2021-06-21  9:30   ` Luca Fancellu
  2021-06-24  9:23   ` Juergen Gross
  2021-06-24 10:43 ` [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
  10 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-16 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross, Julien Grall

From: Julien Grall <jgrall@amazon.com>

At the moment, Live-Update will, by default, not proceed if there are
in-flight transactions. It is possible force it by passing -F but this
will break any connection with in-flight transactions.

There are PV drivers out that may never terminate some transaction. On
host running such guest, we would need to use -F. Unfortunately, this
also risks to break well-behaving guests (and even dom0) because
Live-Update will happen as soon as the timeout is hit.

Ideally, we would want to preserve transactions but this requires
some work and a lot of testing to be able to use it in production.

As a stop gap, we want to limit the damage of -F. This patch will delay
any transactions that are started after Live-Update has been requested.

If the request cannot be delayed, the connection will be stalled to
avoid loosing requests.

If the connection has already a pending transaction before Live-Update,
then new transaction will not be delayed. This is to avoid the connection
to stall.

With this stop gap in place, domains with long running transactions will
still break when using -F, but other domains which starts a transaction
in the middle of Live-Update will continue to work.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_control.c | 10 ++++++
 tools/xenstore/xenstored_control.h |  2 ++
 tools/xenstore/xenstored_core.c    | 49 +++++++++++++++++++++++++++++-
 tools/xenstore/xenstored_core.h    |  3 ++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index 1c24d4869eab..a045f102a420 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -131,6 +131,11 @@ unsigned int lu_write_response(FILE *fp)
 	return sizeof(msg) + msg.len;
 }
 
+bool lu_is_pending(void)
+{
+	return lu_status != NULL;
+}
+
 #else
 struct connection *lu_get_connection(void)
 {
@@ -142,6 +147,11 @@ unsigned int lu_write_response(FILE *fp)
 	/* Unsupported */
 	return 0;
 }
+
+bool lu_is_pending(void)
+{
+	return false;
+}
 #endif
 
 struct cmd_s {
diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
index 27d7f19e4b7f..98b6fbcea2b1 100644
--- a/tools/xenstore/xenstored_control.h
+++ b/tools/xenstore/xenstored_control.h
@@ -23,3 +23,5 @@ struct connection *lu_get_connection(void);
 
 /* Write the "OK" response for the live-update command */
 unsigned int lu_write_response(FILE *fp);
+
+bool lu_is_pending(void);
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 9eca58682b51..10b53af76ac5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -338,7 +338,20 @@ static int destroy_conn(void *_conn)
 
 static bool conn_can_read(struct connection *conn)
 {
-	return conn->funcs->can_read(conn) && !conn->is_ignored;
+	if (!conn->funcs->can_read(conn))
+		return false;
+
+	if (conn->is_ignored)
+		return false;
+
+	/*
+	 * For stalled connection, we want to process the pending
+	 * command as soon as live-update has aborted.
+	 */
+	if (conn->is_stalled)
+		return !lu_is_pending();
+
+	return true;
 }
 
 static bool conn_can_write(struct connection *conn)
@@ -417,6 +430,12 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
 			if (!list_empty(&conn->out_list))
 				events |= POLLOUT;
 			conn->pollfd_idx = set_fd(conn->fd, events);
+			/*
+			 * For stalled connection, we want to process the
+			 * pending command as soon as live-update has aborted.
+			 */
+			if (conn->is_stalled && !lu_is_pending())
+				*ptimeout = 0;
 		}
 	}
 }
@@ -1524,6 +1543,9 @@ static bool process_delayed_message(struct delayed_request *req)
 	struct connection *conn = req->data;
 	struct buffered_data *saved_in = conn->in;
 
+	if (lu_is_pending())
+		return false;
+
 	/*
 	 * Part of process_message() expects conn->in to contains the
 	 * processed response. So save the current conn->in and restore it
@@ -1543,6 +1565,30 @@ static void consider_message(struct connection *conn)
 			sockmsg_string(conn->in->hdr.msg.type),
 			conn->in->hdr.msg.len, conn);
 
+	conn->is_stalled = false;
+	/*
+	 * Currently, Live-Update is not supported if there is active
+	 * transactions. In order to reduce the number of retry, delay
+	 * any new request to start a transaction if Live-Update is pending
+	 * and there are no transactions in-flight.
+	 *
+	 * If we can't delay the request, then mark the connection as
+	 * stalled. This will ignore new requests until Live-Update happened
+	 * or it was aborted.
+	 */
+	if (lu_is_pending() && conn->transaction_started == 0 &&
+	    conn->in->hdr.msg.type == XS_TRANSACTION_START) {
+		trace("Delaying transaction start for connection %p req_id %u\n",
+		      conn, conn->in->hdr.msg.req_id);
+
+		if (delay_request(conn, conn->in, process_delayed_message,
+				  conn, false) != 0) {
+			trace("Stalling connection %p\n", conn);
+			conn->is_stalled = true;
+		}
+		return;
+	}
+
 	process_message(conn, conn->in);
 
 	assert(conn->in == NULL);
@@ -1629,6 +1675,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
 	new->pollfd_idx = -1;
 	new->funcs = funcs;
 	new->is_ignored = false;
+	new->is_stalled = false;
 	new->transaction_started = 0;
 	INIT_LIST_HEAD(&new->out_list);
 	INIT_LIST_HEAD(&new->watches);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index dac517156993..258f6ff38279 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -110,6 +110,9 @@ struct connection
 	/* Is this connection ignored? */
 	bool is_ignored;
 
+	/* Is the connection stalled? */
+	bool is_stalled;
+
 	/* Buffered incoming data. */
 	struct buffered_data *in;
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore
  2021-06-16 14:43 ` [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore Julien Grall
@ 2021-06-16 15:15   ` Juergen Gross
  0 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-16 15:15 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Stefano Stabellini, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 255 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> I would like to help reviewing Xenstored patches. It is more convenient
> to find them if I am CCed.
> 
> Signed-off-by: Julien Grall <julien@xen.org>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it
  2021-06-16 14:43 ` [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it Julien Grall
@ 2021-06-21  8:21   ` Luca Fancellu
  2021-06-24  7:21   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  8:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, dump_state_buffered_data() is taking two connections
> in parameters (one is the connection to dump, the other is the
> connection used to request LU). The naming doesn't help to
> distinguish (c vs conn) them and this already lead to several mistake
> while modifying the function.
> 
> To remove the confusion, introduce an help lu_get_connection() that
> will return the connection used to request LU and use it
> in place of the existing parameter.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_control.c | 13 ++++++++++++-
> tools/xenstore/xenstored_control.h |  2 ++
> tools/xenstore/xenstored_core.c    |  7 +++----
> tools/xenstore/xenstored_core.h    |  1 -
> tools/xenstore/xenstored_domain.c  |  6 +++---
> tools/xenstore/xenstored_domain.h  |  2 +-
> 6 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
> index 0d57f9f9400d..d08a2b961432 100644
> --- a/tools/xenstore/xenstored_control.c
> +++ b/tools/xenstore/xenstored_control.c
> @@ -104,6 +104,17 @@ static const char *lu_begin(struct connection *conn)
> 
> 	return NULL;
> }
> +
> +struct connection *lu_get_connection(void)
> +{
> +	return lu_status ? lu_status->conn : NULL;
> +}
> +
> +#else
> +struct connection *lu_get_connection(void)
> +{
> +	return NULL;
> +}
> #endif
> 
> struct cmd_s {
> @@ -516,7 +527,7 @@ static const char *lu_dump_state(const void *ctx, struct connection *conn)
> 	ret = dump_state_global(fp);
> 	if (ret)
> 		goto out;
> -	ret = dump_state_connections(fp, conn);
> +	ret = dump_state_connections(fp);
> 	if (ret)
> 		goto out;
> 	ret = dump_state_special_nodes(fp);
> diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
> index aac61f05908f..6842b8d88760 100644
> --- a/tools/xenstore/xenstored_control.h
> +++ b/tools/xenstore/xenstored_control.h
> @@ -18,3 +18,5 @@
> 
> int do_control(struct connection *conn, struct buffered_data *in);
> void lu_read_state(void);
> +
> +struct connection *lu_get_connection(void);
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 883a1a582a60..607187361d84 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2369,14 +2369,13 @@ const char *dump_state_global(FILE *fp)
> 
> /* Called twice: first with fp == NULL to get length, then for writing data. */
> const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> -				     const struct connection *conn,
> 				     struct xs_state_connection *sc)
> {
> 	unsigned int len = 0, used;
> 	struct buffered_data *out, *in = c->in;
> 	bool partial = true;
> 
> -	if (in && c != conn) {
> +	if (in && c != lu_get_connection()) {
> 		len = in->inhdr ? in->used : sizeof(in->hdr);
> 		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
> 			return "Dump read data error";
> @@ -2416,8 +2415,8 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> 	}
> 
> 	/* Add "OK" for live-update command. */
> -	if (c == conn) {
> -		struct xsd_sockmsg msg = conn->in->hdr.msg;
> +	if (c == lu_get_connection()) {
> +		struct xsd_sockmsg msg = c->in->hdr.msg;
> 
> 		msg.len = sizeof("OK");
> 		if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index bb36111ecc56..89ce155e755b 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -269,7 +269,6 @@ void set_tdb_key(const char *name, TDB_DATA *key);
> 
> const char *dump_state_global(FILE *fp);
> const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> -				     const struct connection *conn,
> 				     struct xs_state_connection *sc);
> const char *dump_state_nodes(FILE *fp, const void *ctx);
> const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index 322b0dbca449..6d8d29cbe41c 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -1183,7 +1183,7 @@ void wrl_apply_debit_trans_commit(struct connection *conn)
> 	wrl_apply_debit_actual(conn->domain);
> }
> 
> -const char *dump_state_connections(FILE *fp, struct connection *conn)
> +const char *dump_state_connections(FILE *fp)
> {
> 	const char *ret = NULL;
> 	unsigned int conn_id = 1;
> @@ -1209,7 +1209,7 @@ const char *dump_state_connections(FILE *fp, struct connection *conn)
> 			sc.spec.socket_fd = c->fd;
> 		}
> 
> -		ret = dump_state_buffered_data(NULL, c, conn, &sc);
> +		ret = dump_state_buffered_data(NULL, c, &sc);
> 		if (ret)
> 			return ret;
> 		head.length += sc.data_in_len + sc.data_out_len;
> @@ -1219,7 +1219,7 @@ const char *dump_state_connections(FILE *fp, struct connection *conn)
> 		if (fwrite(&sc, offsetof(struct xs_state_connection, data),
> 			   1, fp) != 1)
> 			return "Dump connection state error";
> -		ret = dump_state_buffered_data(fp, c, conn, NULL);
> +		ret = dump_state_buffered_data(fp, c, NULL);
> 		if (ret)
> 			return ret;
> 		ret = dump_state_align(fp);
> diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
> index cc5147d7e747..62ee471ea6aa 100644
> --- a/tools/xenstore/xenstored_domain.h
> +++ b/tools/xenstore/xenstored_domain.h
> @@ -101,7 +101,7 @@ void wrl_log_periodic(struct wrl_timestampt now);
> void wrl_apply_debit_direct(struct connection *conn);
> void wrl_apply_debit_trans_commit(struct connection *conn);
> 
> -const char *dump_state_connections(FILE *fp, struct connection *conn);
> +const char *dump_state_connections(FILE *fp);
> const char *dump_state_special_nodes(FILE *fp);
> 
> void read_state_connection(const void *ctx, const void *state);
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-16 14:43 ` [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request Julien Grall
@ 2021-06-21  8:55   ` Luca Fancellu
  2021-06-24  8:06     ` Julien Grall
  2021-06-24  7:32   ` Juergen Gross
  1 sibling, 1 reply; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  8:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> call_delayed() is currently assuming that conn->in is NULL when
> handling delayed request. However, the connection is not paused.
> Therefore new request can be processed and conn->in may be non-NULL
> if we have only received a partial request.
> 
> Furthermore, as we overwrite conn->in, the current partial request
> will not be transferred. This will result to corrupt the connection.
> 
> Rather than updating conn->in, stash the LU request in lu_status and
> let each callback for delayed request to update conn->in when
> necessary.
> 
> To keep a sane interface, the code to write the "OK" response the
> LU request is moved in xenstored_core.c.
> 
> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")
> Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live update")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
> 
> This is fixing bugs from two separate commits. I couldn't figure out
> how to split in two patches without breaking bisection.
> ---
> tools/xenstore/xenstored_control.c | 41 ++++++++++++++++++++++++++++--
> tools/xenstore/xenstored_control.h |  3 +++
> tools/xenstore/xenstored_core.c    | 17 +++----------
> 3 files changed, 46 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
> index d08a2b961432..7acc2d134f9f 100644
> --- a/tools/xenstore/xenstored_control.c
> +++ b/tools/xenstore/xenstored_control.c
> @@ -50,6 +50,9 @@ struct live_update {
> 	/* For verification the correct connection is acting. */
> 	struct connection *conn;
> 
> +	/* Pointer to the command used to request LU */
> +	struct buffered_data *in;
> +
> #ifdef __MINIOS__
> 	void *kernel;
> 	unsigned int kernel_size;
> @@ -100,6 +103,7 @@ static const char *lu_begin(struct connection *conn)
> 	if (!lu_status)
> 		return "Allocation failure.";
> 	lu_status->conn = conn;
> +	lu_status->in = conn->in;
> 	talloc_set_destructor(lu_status, lu_destroy);
> 
> 	return NULL;
> @@ -110,11 +114,34 @@ struct connection *lu_get_connection(void)
> 	return lu_status ? lu_status->conn : NULL;
> }
> 
> +unsigned int lu_write_response(FILE *fp)
> +{
> +	struct xsd_sockmsg msg;
> +
> +	assert(lu_status);
> +
> +	msg = lu_status->in->hdr.msg;
> +
> +	msg.len = sizeof("OK");
> +	if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
> +		return 0;
> +	if (fp && fwrite("OK", msg.len, 1, fp) != 1)
> +		return 0;
> +
> +	return sizeof(msg) + msg.len;
> +}
> +
> #else
> struct connection *lu_get_connection(void)
> {
> 	return NULL;
> }
> +
> +unsigned int lu_write_response(FILE *fp)
> +{
> +	/* Unsupported */
> +	return 0;
> +}
> #endif
> 
> struct cmd_s {
> @@ -658,6 +685,8 @@ static bool do_lu_start(struct delayed_request *req)
> {
> 	time_t now = time(NULL);
> 	const char *ret;
> +	struct buffered_data *saved_in;
> +	struct connection *conn = lu_status->conn;
> 
> 	if (!lu_check_lu_allowed()) {
> 		if (now < lu_status->started_at + lu_status->timeout)
> @@ -668,8 +697,9 @@ static bool do_lu_start(struct delayed_request *req)
> 		}
> 	}
> 
> +	assert(req->in == lu_status->in);
> 	/* Dump out internal state, including "OK" for live update. */
> -	ret = lu_dump_state(req->in, lu_status->conn);
> +	ret = lu_dump_state(req->in, conn);
> 	if (!ret) {
> 		/* Perform the activation of new binary. */
> 		ret = lu_activate_binary(req->in);
> @@ -677,7 +707,14 @@ static bool do_lu_start(struct delayed_request *req)
> 
> 	/* We will reach this point only in case of failure. */
>  out:
> -	send_reply(lu_status->conn, XS_CONTROL, ret, strlen(ret) + 1);
> +	/*
> +	 * send_reply() will send the response for conn->in. Save the current
> +	 * conn->in and restore it afterwards.
> +	 */
> +	saved_in = conn->in;
> +	conn->in = req->in;
> +	send_reply(conn, XS_CONTROL, ret, strlen(ret) + 1);
> +	conn->in = saved_in;
> 	talloc_free(lu_status);
> 
> 	return true;
> diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
> index 6842b8d88760..27d7f19e4b7f 100644
> --- a/tools/xenstore/xenstored_control.h
> +++ b/tools/xenstore/xenstored_control.h
> @@ -20,3 +20,6 @@ int do_control(struct connection *conn, struct buffered_data *in);
> void lu_read_state(void);
> 
> struct connection *lu_get_connection(void);
> +
> +/* Write the "OK" response for the live-update command */
> +unsigned int lu_write_response(FILE *fp);
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 607187361d84..41b26d7094c8 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -272,15 +272,10 @@ static int undelay_request(void *_req)
> 
> static void call_delayed(struct connection *conn, struct delayed_request *req)

Here the conn parameter is not needed anymore, or am I missing something?

Cheers,
Luca

> {
> -	assert(conn->in == NULL);
> -	conn->in = req->in;
> -
> 	if (req->func(req)) {
> 		undelay_request(req);
> 		talloc_set_destructor(req, NULL);
> 	}
> -
> -	conn->in = NULL;
> }
> 
> int delay_request(struct connection *conn, struct buffered_data *in,
> @@ -2375,7 +2370,7 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> 	struct buffered_data *out, *in = c->in;
> 	bool partial = true;
> 
> -	if (in && c != lu_get_connection()) {
> +	if (in) {
> 		len = in->inhdr ? in->used : sizeof(in->hdr);
> 		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
> 			return "Dump read data error";
> @@ -2416,16 +2411,12 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> 
> 	/* Add "OK" for live-update command. */
> 	if (c == lu_get_connection()) {
> -		struct xsd_sockmsg msg = c->in->hdr.msg;
> +		unsigned int rc = lu_write_response(fp);
> 
> -		msg.len = sizeof("OK");
> -		if (fp && fwrite(&msg, sizeof(msg), 1, fp) != 1)
> +		if (!rc)
> 			return "Dump buffered data error";
> -		len += sizeof(msg);
> -		if (fp && fwrite("OK", msg.len, 1, fp) != 1)
> 
> -			return "Dump buffered data error";
> -		len += msg.len;
> +		len += rc;
> 	}
> 
> 	if (sc)
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay
  2021-06-16 14:43 ` [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay Julien Grall
@ 2021-06-21  9:02   ` Luca Fancellu
  2021-06-24  7:35   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:02 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, only liveupdate request can be delayed. The request can only
> be performed by a privileged connection (e.g. dom0). So it is fine to
> have no limits.
> 
> In a follow-up patch we will want to delay request for unprivileged
> connection as well. So it is best to apply a limit.
> 
> For now and for simplicity, only a single request can be delayed
> for a given unprivileged connection.
> 
> Take the opportunity to tweak the prototype and provide a way to
> bypass the quota check. This would be useful when the function
> is called from the restore code.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_control.c |  2 +-
> tools/xenstore/xenstored_core.c    | 11 ++++++++++-
> tools/xenstore/xenstored_core.h    |  3 ++-
> 3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
> index 7acc2d134f9f..1c24d4869eab 100644
> --- a/tools/xenstore/xenstored_control.c
> +++ b/tools/xenstore/xenstored_control.c
> @@ -737,7 +737,7 @@ static const char *lu_start(const void *ctx, struct connection *conn,
> 	lu_status->timeout = to;
> 	lu_status->started_at = time(NULL);
> 
> -	errno = delay_request(conn, conn->in, do_lu_start, NULL);
> +	errno = delay_request(conn, conn->in, do_lu_start, NULL, false);
> 
> 	return NULL;
> }
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 41b26d7094c8..51d210828922 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -279,10 +279,19 @@ static void call_delayed(struct connection *conn, struct delayed_request *req)
> }
> 
> int delay_request(struct connection *conn, struct buffered_data *in,
> -		  bool (*func)(struct delayed_request *), void *data)
> +		  bool (*func)(struct delayed_request *), void *data,
> +		  bool no_quota_check)
> {
> 	struct delayed_request *req;
> 
> +	/*
> +	 * Only allow one request can be delayed for an unprivileged
> +	 * connection.
> +	 */
> +	if (!no_quota_check && domain_is_unprivileged(conn) &&
> +	    !list_empty(&conn->delayed))
> +		return ENOSPC;
> +
> 	req = talloc(in, struct delayed_request);
> 	if (!req)
> 		return ENOMEM;
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 89ce155e755b..34839b34f6e9 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -213,7 +213,8 @@ char *get_parent(const void *ctx, const char *node);
> 
> /* Delay a request. */
> int delay_request(struct connection *conn, struct buffered_data *in,
> -		  bool (*func)(struct delayed_request *), void *data);
> +		  bool (*func)(struct delayed_request *), void *data,
> +		  bool no_quota_check);
> 
> /* Tracing infrastructure. */
> void trace_create(const void *data, const char *type);
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h
  2021-06-16 14:43 ` [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h Julien Grall
@ 2021-06-21  9:03   ` Luca Fancellu
  2021-06-24  7:36   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> xenstored_core.h will consider live-udpate is not supported if
> O_CLOEXEC doesn't exist. However, the header doesn't include the one
> defining O_CLOEXEC (i.e. fcntl.h). This means that depending on
> the header included, some source file will think Live-Update is not
> supported.
> 
> I am not aware of any issue with the existing. Therefore this is just
> a latent bug so far.
> 
> Prevent any potential issue by including fcntl.h in xenstored_core.h
> 
> Fixes: cd831ee438 ("tools/xenstore: handle CLOEXEC flag for local files and pipes")
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_core.h | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 34839b34f6e9..dac517156993 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -24,6 +24,7 @@
> 
> #include <sys/types.h>
> #include <dirent.h>
> +#include <fcntl.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <errno.h>
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write}
  2021-06-16 14:43 ` [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write} Julien Grall
@ 2021-06-21  9:10   ` Luca Fancellu
  2021-06-24  7:39   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the callbacks can_read and can_write are called directly. This
> doesn't allow us to add generic check and therefore requires duplication.
> 
> At the moment, one check that could benefit to be common is whether the
> connection should ignored. The position is slightly different between
> domain and socket because for the latter we want to check the state of
> the file descriptor first.
> 
> In follow-up patches, there will be more potential generic checks.
> 
> This patch provides wrappers to read/write a connection and move
> the check ->is_ignored after the callback for everyone.
> 
> This also requires to replace the direct call to domain_can_read()
> and domain_can_write() with the new wrapper. At the same time,
> both functions can now be static. Note that the implementations need
> to be moved earlier in the file xenstored_domain.c to avoid
> declaring the prototype.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_core.c   | 18 ++++++++++----
> tools/xenstore/xenstored_domain.c | 40 +++++++++++++------------------
> tools/xenstore/xenstored_domain.h |  4 ----
> 3 files changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 51d210828922..2e5760fe4599 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -334,6 +334,16 @@ static int destroy_conn(void *_conn)
> 	return 0;
> }
> 
> +static bool conn_can_read(struct connection *conn)
> +{
> +	return conn->funcs->can_read(conn) && !conn->is_ignored;
> +}
> +
> +static bool conn_can_write(struct connection *conn)
> +{
> +	return conn->funcs->can_write(conn) && !conn->is_ignored;
> +}
> +
> /* This function returns index inside the array if succeed, -1 if fail */
> static int set_fd(int fd, short events)
> {
> @@ -396,8 +406,8 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
> 	list_for_each_entry(conn, &connections, list) {
> 		if (conn->domain) {
> 			wrl_check_timeout(conn->domain, now, ptimeout);
> -			if (domain_can_read(conn) ||
> -			    (domain_can_write(conn) &&
> +			if (conn_can_read(conn) ||
> +			    (conn_can_write(conn) &&
> 			     !list_empty(&conn->out_list)))
> 				*ptimeout = 0;
> 		} else {
> @@ -2325,14 +2335,14 @@ int main(int argc, char *argv[])
> 			if (&next->list != &connections)
> 				talloc_increase_ref_count(next);
> 
> -			if (conn->funcs->can_read(conn))
> +			if (conn_can_read(conn))
> 				handle_input(conn);
> 			if (talloc_free(conn) == 0)
> 				continue;
> 
> 			talloc_increase_ref_count(conn);
> 
> -			if (conn->funcs->can_write(conn))
> +			if (conn_can_write(conn))
> 				handle_output(conn);
> 			if (talloc_free(conn) == 0)
> 				continue;
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index 6d8d29cbe41c..47e9107c144e 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -172,6 +172,23 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
> 	return len;
> }
> 
> +static bool domain_can_write(struct connection *conn)
> +{
> +	struct xenstore_domain_interface *intf = conn->domain->interface;
> +
> +	return ((intf->rsp_prod - intf->rsp_cons) != XENSTORE_RING_SIZE);
> +}
> +
> +static bool domain_can_read(struct connection *conn)
> +{
> +	struct xenstore_domain_interface *intf = conn->domain->interface;
> +
> +	if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
> +		return false;
> +
> +	return (intf->req_cons != intf->req_prod);
> +}
> +
> static const struct interface_funcs domain_funcs = {
> 	.write = writechn,
> 	.read = readchn,
> @@ -290,19 +307,6 @@ void handle_event(void)
> 		barf_perror("Failed to write to event fd");
> }
> 
> -bool domain_can_read(struct connection *conn)
> -{
> -	struct xenstore_domain_interface *intf = conn->domain->interface;
> -
> -	if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
> -		return false;
> -
> -	if (conn->is_ignored)
> -		return false;
> -
> -	return (intf->req_cons != intf->req_prod);
> -}
> -
> static bool domid_is_unprivileged(unsigned int domid)
> {
> 	return domid != 0 && domid != priv_domid;
> @@ -314,16 +318,6 @@ bool domain_is_unprivileged(struct connection *conn)
> 	       domid_is_unprivileged(conn->domain->domid);
> }
> 
> -bool domain_can_write(struct connection *conn)
> -{
> -	struct xenstore_domain_interface *intf = conn->domain->interface;
> -
> -	if (conn->is_ignored)
> -		return false;
> -
> -	return ((intf->rsp_prod - intf->rsp_cons) != XENSTORE_RING_SIZE);
> -}
> -
> static char *talloc_domain_path(void *context, unsigned int domid)
> {
> 	return talloc_asprintf(context, "/local/domain/%u", domid);
> diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
> index 62ee471ea6aa..1e929b8f8c6f 100644
> --- a/tools/xenstore/xenstored_domain.h
> +++ b/tools/xenstore/xenstored_domain.h
> @@ -51,10 +51,6 @@ void domain_deinit(void);
> /* Returns the implicit path of a connection (only domains have this) */
> const char *get_implicit_path(const struct connection *conn);
> 
> -/* Can connection attached to domain read/write. */
> -bool domain_can_read(struct connection *conn);
> -bool domain_can_write(struct connection *conn);
> -
> bool domain_is_unprivileged(struct connection *conn);
> 
> /* Remove node permissions for no longer existing domains. */
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in
  2021-06-16 14:43 ` [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in Julien Grall
@ 2021-06-21  9:12   ` Luca Fancellu
  2021-06-24  7:44   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> delay_request() is currently assuming that the request delayed is
> always conn->in. This is currently correct, but it is a call for
> a latent bug as the function allows the caller to specify any request.
> 
> To prevent any future surprise, check if the request delayed is the
> current one.
> 
> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 2e5760fe4599..a5084a5b173d 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -306,7 +306,9 @@ int delay_request(struct connection *conn, struct buffered_data *in,
> 	delayed_requests++;
> 	list_add(&req->list, &conn->delayed);
> 
> -	conn->in = NULL;
> +	/* Unlink the request from conn if this is the current one */
> +	if (conn->in == in)
> +		conn->in = NULL;
> 
> 	return 0;
> }
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer
  2021-06-16 14:43 ` [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer Julien Grall
@ 2021-06-21  9:21   ` Luca Fancellu
  2021-06-24  8:30   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the restore code is considering the stream will contain at
> most one in-flight request per connection. In a follow-up changes, we
> will want to transfer multiple in-flight requests.
> 
> The function read_state_buffered() is now extended to restore multiple
> in-flight request. Complete requests will be queued as delayed
> requests, if there a partial request (only the last one can) then it
> will used as the current in-flight request.
> 
> Note that we want to bypass the quota check for delayed requests as
> the new Xenstore may have a lower limit.
> 
> Lastly, there is no need to change the specification as there was
> no restriction on the number of in-flight requests preserved.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_core.c | 56 ++++++++++++++++++++++++++++-----
> 1 file changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index a5084a5b173d..5b7ab7f74013 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -1486,6 +1486,10 @@ static void process_message(struct connection *conn, struct buffered_data *in)
> 	enum xsd_sockmsg_type type = in->hdr.msg.type;
> 	int ret;
> 
> +	/* At least send_error() and send_reply() expects conn->in == in */
> +	assert(conn->in == in);
> +	trace_io(conn, in, 0);
> +
> 	if ((unsigned int)type >= XS_TYPE_COUNT || !wire_funcs[type].func) {
> 		eprintf("Client unknown operation %i", type);
> 		send_error(conn, ENOSYS);
> @@ -1515,6 +1519,23 @@ static void process_message(struct connection *conn, struct buffered_data *in)
> 	conn->transaction = NULL;
> }
> 
> +static bool process_delayed_message(struct delayed_request *req)
> +{
> +	struct connection *conn = req->data;
> +	struct buffered_data *saved_in = conn->in;
> +
> +	/*
> +	 * Part of process_message() expects conn->in to contains the
> +	 * processed response. So save the current conn->in and restore it
> +	 * afterwards.
> +	 */
> +	conn->in = req->in;
> +	process_message(req->data, req->in);
> +	conn->in = saved_in;
> +
> +	return true;
> +}
> +
> static void consider_message(struct connection *conn)
> {
> 	if (verbose)
> @@ -1582,7 +1603,6 @@ static void handle_input(struct connection *conn)
> 	if (in->used != in->hdr.msg.len)
> 		return;
> 
> -	trace_io(conn, in, 0);
> 	consider_message(conn);
> 	return;
> 
> @@ -2611,14 +2631,20 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
> 	unsigned int len;
> 	bool partial = sc->data_resp_len;
> 
> -	if (sc->data_in_len) {
> +	for (data = sc->data; data < sc->data + sc->data_in_len; data += len) {
> 		bdata = new_buffer(conn);
> 		if (!bdata)
> 			barf("error restoring read data");
> -		if (sc->data_in_len < sizeof(bdata->hdr)) {
> +
> +		/*
> +		 * We don't know yet if there is more than one message
> +		 * to process. So the len is the size of the leftover data.
> +		 */
> +		len = sc->data_in_len - (data - sc->data);
> +		if (len < sizeof(bdata->hdr)) {
> 			bdata->inhdr = true;
> -			memcpy(&bdata->hdr, sc->data, sc->data_in_len);
> -			bdata->used = sc->data_in_len;
> +			memcpy(&bdata->hdr, sc->data, len);
> +			bdata->used = len;
> 		} else {
> 			bdata->inhdr = false;
> 			memcpy(&bdata->hdr, sc->data, sizeof(bdata->hdr));
> @@ -2629,12 +2655,26 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
> 							bdata->hdr.msg.len);
> 			if (!bdata->buffer)
> 				barf("Error allocating in buffer");
> -			bdata->used = sc->data_in_len - sizeof(bdata->hdr);
> -			memcpy(bdata->buffer, sc->data + sizeof(bdata->hdr),
> +			bdata->used = min_t(unsigned int,
> +					    len - sizeof(bdata->hdr),
> +					    bdata->hdr.msg.len);
> +			memcpy(bdata->buffer, data + sizeof(bdata->hdr),
> 			       bdata->used);
> +			/* Update len to match the size of the message. */
> +			len = bdata->used + sizeof(bdata->hdr);
> 		}
> 
> -		conn->in = bdata;
> +		/*
> +		 * If the message is not complete, then it means this was
> +		 * the current processed message. All the other messages
> +		 * will be queued to be handled after restoring.
> +		 */
> +		if (bdata->inhdr || bdata->used != bdata->hdr.msg.len) {
> +			assert(conn->in == NULL);
> +			conn->in = bdata;
> +		} else if (delay_request(conn, bdata, process_delayed_message,
> +					 conn, true))
> +			barf("Unable to delay the request");
> 	}
> 
> 	for (data = sc->data + sc->data_in_len;
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-16 14:43 ` [PATCH 09/10] tools/xenstored: Dump delayed requests Julien Grall
@ 2021-06-21  9:27   ` Luca Fancellu
  2021-06-24  8:41   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, only Live-Update request can be delayed. In a follow-up,
> we will want to delay more requests (e.g. transaction start).
> Therefore we want to preserve delayed requests across Live-Update.
> 
> Delayed requests are just complete "in" buffer. So the code is
> refactored to allow sharing the code to dump "in" buffer.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_core.c | 42 +++++++++++++++++++++++++--------
> 1 file changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 5b7ab7f74013..9eca58682b51 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
> 	return NULL;
> }
> 
> +static const char *dump_input_buffered_data(FILE *fp,
> +					    const struct buffered_data *in,
> +					    unsigned int *total_len)
> +{
> +	unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
> +
> +	*total_len += hlen;
> +	if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
> +		return "Dump read data error";
> +	if (!in->inhdr && in->used) {
> +		*total_len += in->used;
> +		if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
> +			return "Dump read data error";
> +	}
> +
> +	return NULL;
> +}
> +
> /* Called twice: first with fp == NULL to get length, then for writing data. */
> const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
> 				     struct xs_state_connection *sc)
> {
> 	unsigned int len = 0, used;
> -	struct buffered_data *out, *in = c->in;
> +	struct buffered_data *out;
> 	bool partial = true;
> +	struct delayed_request *req;
> +	const char *ret;
> 
> -	if (in) {
> -		len = in->inhdr ? in->used : sizeof(in->hdr);
> -		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
> -			return "Dump read data error";
> -		if (!in->inhdr && in->used) {
> -			len += in->used;
> -			if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
> -				return "Dump read data error";
> -		}
> +	/* Dump any command that was delayed */
> +	list_for_each_entry(req, &c->delayed, list) {
> +		if (req->func != process_delayed_message)
> +			continue;
> +
> +		assert(!req->in->inhdr);
> +		if ((ret = dump_input_buffered_data(fp, req->in, &len)))
> +			return ret;
> 	}
> 
> +	if (c->in && (ret = dump_input_buffered_data(fp, c->in, &len)))
> +		return ret;
> +
> 	if (sc) {
> 		sc->data_in_len = len;
> 		sc->data_resp_len = 0;
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending
  2021-06-16 14:43 ` [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending Julien Grall
@ 2021-06-21  9:30   ` Luca Fancellu
  2021-06-24  9:23   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Luca Fancellu @ 2021-06-21  9:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross



> On 16 Jun 2021, at 15:43, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, Live-Update will, by default, not proceed if there are
> in-flight transactions. It is possible force it by passing -F but this
> will break any connection with in-flight transactions.
> 
> There are PV drivers out that may never terminate some transaction. On
> host running such guest, we would need to use -F. Unfortunately, this
> also risks to break well-behaving guests (and even dom0) because
> Live-Update will happen as soon as the timeout is hit.
> 
> Ideally, we would want to preserve transactions but this requires
> some work and a lot of testing to be able to use it in production.
> 
> As a stop gap, we want to limit the damage of -F. This patch will delay
> any transactions that are started after Live-Update has been requested.
> 
> If the request cannot be delayed, the connection will be stalled to
> avoid loosing requests.
> 
> If the connection has already a pending transaction before Live-Update,
> then new transaction will not be delayed. This is to avoid the connection
> to stall.
> 
> With this stop gap in place, domains with long running transactions will
> still break when using -F, but other domains which starts a transaction
> in the middle of Live-Update will continue to work.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> tools/xenstore/xenstored_control.c | 10 ++++++
> tools/xenstore/xenstored_control.h |  2 ++
> tools/xenstore/xenstored_core.c    | 49 +++++++++++++++++++++++++++++-
> tools/xenstore/xenstored_core.h    |  3 ++
> 4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
> index 1c24d4869eab..a045f102a420 100644
> --- a/tools/xenstore/xenstored_control.c
> +++ b/tools/xenstore/xenstored_control.c
> @@ -131,6 +131,11 @@ unsigned int lu_write_response(FILE *fp)
> 	return sizeof(msg) + msg.len;
> }
> 
> +bool lu_is_pending(void)
> +{
> +	return lu_status != NULL;
> +}
> +
> #else
> struct connection *lu_get_connection(void)
> {
> @@ -142,6 +147,11 @@ unsigned int lu_write_response(FILE *fp)
> 	/* Unsupported */
> 	return 0;
> }
> +
> +bool lu_is_pending(void)
> +{
> +	return false;
> +}
> #endif
> 
> struct cmd_s {
> diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
> index 27d7f19e4b7f..98b6fbcea2b1 100644
> --- a/tools/xenstore/xenstored_control.h
> +++ b/tools/xenstore/xenstored_control.h
> @@ -23,3 +23,5 @@ struct connection *lu_get_connection(void);
> 
> /* Write the "OK" response for the live-update command */
> unsigned int lu_write_response(FILE *fp);
> +
> +bool lu_is_pending(void);
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 9eca58682b51..10b53af76ac5 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -338,7 +338,20 @@ static int destroy_conn(void *_conn)
> 
> static bool conn_can_read(struct connection *conn)
> {
> -	return conn->funcs->can_read(conn) && !conn->is_ignored;
> +	if (!conn->funcs->can_read(conn))
> +		return false;
> +
> +	if (conn->is_ignored)
> +		return false;
> +
> +	/*
> +	 * For stalled connection, we want to process the pending
> +	 * command as soon as live-update has aborted.
> +	 */
> +	if (conn->is_stalled)
> +		return !lu_is_pending();
> +
> +	return true;
> }
> 
> static bool conn_can_write(struct connection *conn)
> @@ -417,6 +430,12 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
> 			if (!list_empty(&conn->out_list))
> 				events |= POLLOUT;
> 			conn->pollfd_idx = set_fd(conn->fd, events);
> +			/*
> +			 * For stalled connection, we want to process the
> +			 * pending command as soon as live-update has aborted.
> +			 */
> +			if (conn->is_stalled && !lu_is_pending())
> +				*ptimeout = 0;
> 		}
> 	}
> }
> @@ -1524,6 +1543,9 @@ static bool process_delayed_message(struct delayed_request *req)
> 	struct connection *conn = req->data;
> 	struct buffered_data *saved_in = conn->in;
> 
> +	if (lu_is_pending())
> +		return false;
> +
> 	/*
> 	 * Part of process_message() expects conn->in to contains the
> 	 * processed response. So save the current conn->in and restore it
> @@ -1543,6 +1565,30 @@ static void consider_message(struct connection *conn)
> 			sockmsg_string(conn->in->hdr.msg.type),
> 			conn->in->hdr.msg.len, conn);
> 
> +	conn->is_stalled = false;
> +	/*
> +	 * Currently, Live-Update is not supported if there is active
> +	 * transactions. In order to reduce the number of retry, delay
> +	 * any new request to start a transaction if Live-Update is pending
> +	 * and there are no transactions in-flight.
> +	 *
> +	 * If we can't delay the request, then mark the connection as
> +	 * stalled. This will ignore new requests until Live-Update happened
> +	 * or it was aborted.
> +	 */
> +	if (lu_is_pending() && conn->transaction_started == 0 &&
> +	    conn->in->hdr.msg.type == XS_TRANSACTION_START) {
> +		trace("Delaying transaction start for connection %p req_id %u\n",
> +		      conn, conn->in->hdr.msg.req_id);
> +
> +		if (delay_request(conn, conn->in, process_delayed_message,
> +				  conn, false) != 0) {
> +			trace("Stalling connection %p\n", conn);
> +			conn->is_stalled = true;
> +		}
> +		return;
> +	}
> +
> 	process_message(conn, conn->in);
> 
> 	assert(conn->in == NULL);
> @@ -1629,6 +1675,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
> 	new->pollfd_idx = -1;
> 	new->funcs = funcs;
> 	new->is_ignored = false;
> +	new->is_stalled = false;
> 	new->transaction_started = 0;
> 	INIT_LIST_HEAD(&new->out_list);
> 	INIT_LIST_HEAD(&new->watches);
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index dac517156993..258f6ff38279 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -110,6 +110,9 @@ struct connection
> 	/* Is this connection ignored? */
> 	bool is_ignored;
> 
> +	/* Is the connection stalled? */
> +	bool is_stalled;
> +
> 	/* Buffered incoming data. */
> 	struct buffered_data *in;
> 
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it
  2021-06-16 14:43 ` [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it Julien Grall
  2021-06-21  8:21   ` Luca Fancellu
@ 2021-06-24  7:21   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 681 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, dump_state_buffered_data() is taking two connections
> in parameters (one is the connection to dump, the other is the
> connection used to request LU). The naming doesn't help to
> distinguish (c vs conn) them and this already lead to several mistake
> while modifying the function.
> 
> To remove the confusion, introduce an help lu_get_connection() that
> will return the connection used to request LU and use it
> in place of the existing parameter.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-16 14:43 ` [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request Julien Grall
  2021-06-21  8:55   ` Luca Fancellu
@ 2021-06-24  7:32   ` Juergen Gross
  2021-06-24  7:34     ` Juergen Gross
  1 sibling, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:32 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 1152 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> call_delayed() is currently assuming that conn->in is NULL when
> handling delayed request. However, the connection is not paused.
> Therefore new request can be processed and conn->in may be non-NULL
> if we have only received a partial request.
> 
> Furthermore, as we overwrite conn->in, the current partial request
> will not be transferred. This will result to corrupt the connection.
> 
> Rather than updating conn->in, stash the LU request in lu_status and
> let each callback for delayed request to update conn->in when
> necessary.
> 
> To keep a sane interface, the code to write the "OK" response the
> LU request is moved in xenstored_core.c.
> 
> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")
> Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live update")
> Signed-off-by: Julien Grall <jgrall@amazon.com>

With dropping the conn parameter from call_delayed as already
mentioned by Luca you can add my:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-24  7:32   ` Juergen Gross
@ 2021-06-24  7:34     ` Juergen Gross
  2021-06-24  7:56       ` Luca Fancellu
  0 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:34 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 1360 bytes --]

On 24.06.21 09:32, Juergen Gross wrote:
> On 16.06.21 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> call_delayed() is currently assuming that conn->in is NULL when
>> handling delayed request. However, the connection is not paused.
>> Therefore new request can be processed and conn->in may be non-NULL
>> if we have only received a partial request.
>>
>> Furthermore, as we overwrite conn->in, the current partial request
>> will not be transferred. This will result to corrupt the connection.
>>
>> Rather than updating conn->in, stash the LU request in lu_status and
>> let each callback for delayed request to update conn->in when
>> necessary.
>>
>> To keep a sane interface, the code to write the "OK" response the
>> LU request is moved in xenstored_core.c.
>>
>> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution 

>> of a xenstore request")
>> Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live 
>> update")
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> With dropping the conn parameter from call_delayed as already
> mentioned by Luca you can add my:

Oh, please drop my request to delete the conn parameter, as it is being
used in patch 4 again.

> 
> Reviewed-by: Juergen Gross <jgross@suse.com>

This stands, of course.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay
  2021-06-16 14:43 ` [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay Julien Grall
  2021-06-21  9:02   ` Luca Fancellu
@ 2021-06-24  7:35   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:35 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 785 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, only liveupdate request can be delayed. The request can only
> be performed by a privileged connection (e.g. dom0). So it is fine to
> have no limits.
> 
> In a follow-up patch we will want to delay request for unprivileged
> connection as well. So it is best to apply a limit.
> 
> For now and for simplicity, only a single request can be delayed
> for a given unprivileged connection.
> 
> Take the opportunity to tweak the prototype and provide a way to
> bypass the quota check. This would be useful when the function
> is called from the restore code.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h
  2021-06-16 14:43 ` [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h Julien Grall
  2021-06-21  9:03   ` Luca Fancellu
@ 2021-06-24  7:36   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:36 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 758 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> xenstored_core.h will consider live-udpate is not supported if
> O_CLOEXEC doesn't exist. However, the header doesn't include the one
> defining O_CLOEXEC (i.e. fcntl.h). This means that depending on
> the header included, some source file will think Live-Update is not
> supported.
> 
> I am not aware of any issue with the existing. Therefore this is just
> a latent bug so far.
> 
> Prevent any potential issue by including fcntl.h in xenstored_core.h
> 
> Fixes: cd831ee438 ("tools/xenstore: handle CLOEXEC flag for local files 
and pipes")
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write}
  2021-06-16 14:43 ` [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write} Julien Grall
  2021-06-21  9:10   ` Luca Fancellu
@ 2021-06-24  7:39   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 1120 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the callbacks can_read and can_write are called directly. This
> doesn't allow us to add generic check and therefore requires duplication.
> 
> At the moment, one check that could benefit to be common is whether the
> connection should ignored. The position is slightly different between
> domain and socket because for the latter we want to check the state of
> the file descriptor first.
> 
> In follow-up patches, there will be more potential generic checks.
> 
> This patch provides wrappers to read/write a connection and move
> the check ->is_ignored after the callback for everyone.
> 
> This also requires to replace the direct call to domain_can_read()
> and domain_can_write() with the new wrapper. At the same time,
> both functions can now be static. Note that the implementations need
> to be moved earlier in the file xenstored_domain.c to avoid
> declaring the prototype.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in
  2021-06-16 14:43 ` [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in Julien Grall
  2021-06-21  9:12   ` Luca Fancellu
@ 2021-06-24  7:44   ` Juergen Gross
  2021-06-24  7:58     ` Julien Grall
  1 sibling, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  7:44 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 846 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> delay_request() is currently assuming that the request delayed is
> always conn->in. This is currently correct, but it is a call for
> a latent bug as the function allows the caller to specify any request.
> 
> To prevent any future surprise, check if the request delayed is the
> current one.
> 
> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution of a xenstore request")

Is the Fixes: tag really wanted in this patch? Currently there is
nothing wrong without this patch. So a backport will be needed only if
e.g. this series will be backported.

I'm fine either way, but I think this should be Ian's call.

> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-24  7:34     ` Juergen Gross
@ 2021-06-24  7:56       ` Luca Fancellu
  2021-06-24  8:05         ` Juergen Gross
  0 siblings, 1 reply; 43+ messages in thread
From: Luca Fancellu @ 2021-06-24  7:56 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Julien Grall, xen-devel, raphning, doebel, Julien Grall,
	Ian Jackson, Wei Liu



> On 24 Jun 2021, at 08:34, Juergen Gross <jgross@suse.com> wrote:
> 
> On 24.06.21 09:32, Juergen Gross wrote:
>> On 16.06.21 16:43, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>> 
>>> call_delayed() is currently assuming that conn->in is NULL when
>>> handling delayed request. However, the connection is not paused.
>>> Therefore new request can be processed and conn->in may be non-NULL
>>> if we have only received a partial request.
>>> 
>>> Furthermore, as we overwrite conn->in, the current partial request
>>> will not be transferred. This will result to corrupt the connection.
>>> 
>>> Rather than updating conn->in, stash the LU request in lu_status and
>>> let each callback for delayed request to update conn->in when
>>> necessary.
>>> 
>>> To keep a sane interface, the code to write the "OK" response the
>>> LU request is moved in xenstored_core.c.
>>> 
>>> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution 
> 
>>> of a xenstore request")
>>> Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live update")
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> With dropping the conn parameter from call_delayed as already
>> mentioned by Luca you can add my:
> 
> Oh, please drop my request to delete the conn parameter, as it is being
> used in patch 4 again.
> 
>> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
> This stands, of course.

Hi Juergen,

I’m sorry but I don’t see when the parameter is used, in patch 4 we have this call:

line 2344:
        if (delayed_requests) {
            list_for_each_entry(conn, &connections, list) {
                struct delayed_request *req, *tmp;

                list_for_each_entry_safe(req, tmp,
                             &conn->delayed, list)
                    call_delayed(conn, req);
            }
        }

But call_delayed is still this one:

Line 273:
static void call_delayed(struct connection *conn, struct delayed_request *req)
{
    if (req->func(req)) {
        undelay_request(req);
        talloc_set_destructor(req, NULL);
    }
}

Am I missing something?

Cheers,
Luca

> 
> 
> Juergen
> <OpenPGP_0xB0DE9DD628BF132F.asc>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in
  2021-06-24  7:44   ` Juergen Gross
@ 2021-06-24  7:58     ` Julien Grall
  0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-24  7:58 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu

Hi Juergen,

On 24/06/2021 09:44, Juergen Gross wrote:
> On 16.06.21 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> delay_request() is currently assuming that the request delayed is
>> always conn->in. This is currently correct, but it is a call for
>> a latent bug as the function allows the caller to specify any request.
>>
>> To prevent any future surprise, check if the request delayed is the
>> current one.
>>
>> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution 
>> of a xenstore request")
> 
> Is the Fixes: tag really wanted in this patch? Currently there is
> nothing wrong without this patch. So a backport will be needed only if
> e.g. this series will be backported. >
> I'm fine either way, but I think this should be Ian's call.

We don't usually backport to stable for tech preview (Xenstored 
Live-Update is one).

In this case, I mainly added it because it makes easier for a developper 
to figure out where the bugs comes from and downstream may want to 
ingest it. This doesn't mean I request an official backport.

I could just mention in the commit message if you prefer.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-24  7:56       ` Luca Fancellu
@ 2021-06-24  8:05         ` Juergen Gross
  0 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  8:05 UTC (permalink / raw)
  To: Luca Fancellu
  Cc: Julien Grall, xen-devel, raphning, doebel, Julien Grall,
	Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 2431 bytes --]

On 24.06.21 09:56, Luca Fancellu wrote:
> 
> 
>> On 24 Jun 2021, at 08:34, Juergen Gross <jgross@suse.com> wrote:
>>
>> On 24.06.21 09:32, Juergen Gross wrote:
>>> On 16.06.21 16:43, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> call_delayed() is currently assuming that conn->in is NULL when
>>>> handling delayed request. However, the connection is not paused.
>>>> Therefore new request can be processed and conn->in may be non-NULL
>>>> if we have only received a partial request.
>>>>
>>>> Furthermore, as we overwrite conn->in, the current partial request
>>>> will not be transferred. This will result to corrupt the connection.
>>>>
>>>> Rather than updating conn->in, stash the LU request in lu_status and
>>>> let each callback for delayed request to update conn->in when
>>>> necessary.
>>>>
>>>> To keep a sane interface, the code to write the "OK" response the
>>>> LU request is moved in xenstored_core.c.
>>>>
>>>> Fixes: c5ca1404b4 ("tools/xenstore: add support for delaying execution
>>
>>>> of a xenstore request")
>>>> Fixes: ed6eebf17d ("tools/xenstore: dump the xenstore state for live 
update")
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>> With dropping the conn parameter from call_delayed as already
>>> mentioned by Luca you can add my:
>>
>> Oh, please drop my request to delete the conn parameter, as it is being
>> used in patch 4 again.
>>
>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> This stands, of course.
> 
> Hi Juergen,
> 
> I’m sorry but I don’t see when the parameter is used, in patch 4 we have this call:
> 
> line 2344:
>          if (delayed_requests) {
>              list_for_each_entry(conn, &connections, list) {
>                  struct delayed_request *req, *tmp;
> 
>                  list_for_each_entry_safe(req, tmp,
>                               &conn->delayed, list)
>                      call_delayed(conn, req);
>              }
>          }
> 
> But call_delayed is still this one:
> 
> Line 273:
> static void call_delayed(struct connection *conn, struct delayed_request *req)
> {
>      if (req->func(req)) {
>          undelay_request(req);
>          talloc_set_destructor(req, NULL);
>      }
> }
> 
> Am I missing something?

Hmm, I seem to have mixed up something.

Yes, you are right. So off with the conn parameter (again).


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request
  2021-06-21  8:55   ` Luca Fancellu
@ 2021-06-24  8:06     ` Julien Grall
  0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-24  8:06 UTC (permalink / raw)
  To: Luca Fancellu
  Cc: xen-devel, raphning, doebel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross

Hi Luca,

On 21/06/2021 10:55, Luca Fancellu wrote:
>> diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
>> index 6842b8d88760..27d7f19e4b7f 100644
>> --- a/tools/xenstore/xenstored_control.h
>> +++ b/tools/xenstore/xenstored_control.h
>> @@ -20,3 +20,6 @@ int do_control(struct connection *conn, struct buffered_data *in);
>> void lu_read_state(void);
>>
>> struct connection *lu_get_connection(void);
>> +
>> +/* Write the "OK" response for the live-update command */
>> +unsigned int lu_write_response(FILE *fp);
>> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
>> index 607187361d84..41b26d7094c8 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -272,15 +272,10 @@ static int undelay_request(void *_req)
>>
>> static void call_delayed(struct connection *conn, struct delayed_request *req)
> 
> Here the conn parameter is not needed anymore, or am I missing something?

The parameter is now unused. I will drop it.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer
  2021-06-16 14:43 ` [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer Julien Grall
  2021-06-21  9:21   ` Luca Fancellu
@ 2021-06-24  8:30   ` Juergen Gross
  2021-06-24  8:42     ` Julien Grall
  1 sibling, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  8:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 2683 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the restore code is considering the stream will contain at
> most one in-flight request per connection. In a follow-up changes, we
> will want to transfer multiple in-flight requests.
> 
> The function read_state_buffered() is now extended to restore multiple
> in-flight request. Complete requests will be queued as delayed
> requests, if there a partial request (only the last one can) then it
> will used as the current in-flight request.
> 
> Note that we want to bypass the quota check for delayed requests as
> the new Xenstore may have a lower limit.
> 
> Lastly, there is no need to change the specification as there was
> no restriction on the number of in-flight requests preserved.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> ---
>   tools/xenstore/xenstored_core.c | 56 ++++++++++++++++++++++++++++-----
>   1 file changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index a5084a5b173d..5b7ab7f74013 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -1486,6 +1486,10 @@ static void process_message(struct connection *conn, struct buffered_data *in)
>   	enum xsd_sockmsg_type type = in->hdr.msg.type;
>   	int ret;
>   
> +	/* At least send_error() and send_reply() expects conn->in == in */
> +	assert(conn->in == in);
> +	trace_io(conn, in, 0);
> +
>   	if ((unsigned int)type >= XS_TYPE_COUNT || !wire_funcs[type].func) 
{
>   		eprintf("Client unknown operation %i", type);
>   		send_error(conn, ENOSYS);
> @@ -1515,6 +1519,23 @@ static void process_message(struct connection *conn, struct buffered_data *in)
>   	conn->transaction = NULL;
>   }
>   
> +static bool process_delayed_message(struct delayed_request *req)
> +{
> +	struct connection *conn = req->data;
> +	struct buffered_data *saved_in = conn->in;
> +
> +	/*
> +	 * Part of process_message() expects conn->in to contains the
> +	 * processed response. So save the current conn->in and restore it
> +	 * afterwards.
> +	 */
> +	conn->in = req->in;
> +	process_message(req->data, req->in);
> +	conn->in = saved_in;

This pattern was added to do_lu_start() already, and it will be needed
for each other function being called via call_delayed().

Maybe it would be better to add conn explicitly to struct
delayed_request (or even better: replace data with conn) and to do the
conn->in saving/setting/restoring in call_delayed() instead?

Other than that:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-16 14:43 ` [PATCH 09/10] tools/xenstored: Dump delayed requests Julien Grall
  2021-06-21  9:27   ` Luca Fancellu
@ 2021-06-24  8:41   ` Juergen Gross
  2021-06-24 10:28     ` Julien Grall
  1 sibling, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  8:41 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 2793 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, only Live-Update request can be delayed. In a follow-up,
> we will want to delay more requests (e.g. transaction start).
> Therefore we want to preserve delayed requests across Live-Update.
> 
> Delayed requests are just complete "in" buffer. So the code is
> refactored to allow sharing the code to dump "in" buffer.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> ---
>   tools/xenstore/xenstored_core.c | 42 +++++++++++++++++++++++++--------
>   1 file changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 5b7ab7f74013..9eca58682b51 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>   	return NULL;
>   }
>   
> +static const char *dump_input_buffered_data(FILE *fp,
> +					    const struct buffered_data *in,
> +					    unsigned int *total_len)
> +{
> +	unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
> +
> +	*total_len += hlen;
> +	if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
> +		return "Dump read data error";
> +	if (!in->inhdr && in->used) {
> +		*total_len += in->used;
> +		if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
> +			return "Dump read data error";
> +	}
> +
> +	return NULL;
> +}
> +
>   /* Called twice: first with fp == NULL to get length, then for writing data. */
>   const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
>   				     struct xs_state_connection *sc)
>   {
>   	unsigned int len = 0, used;
> -	struct buffered_data *out, *in = c->in;
> +	struct buffered_data *out;
>   	bool partial = true;
> +	struct delayed_request *req;
> +	const char *ret;
>   
> -	if (in) {
> -		len = in->inhdr ? in->used : sizeof(in->hdr);
> -		if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
> -			return "Dump read data error";
> -		if (!in->inhdr && in->used) {
> -			len += in->used;
> -			if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
> -				return "Dump read data error";
> -		}
> +	/* Dump any command that was delayed */
> +	list_for_each_entry(req, &c->delayed, list) {

Please add a comment here that the following if() serves especially to
avoid dumping the data for do_lu_start().

> +		if (req->func != process_delayed_message)
> +			continue;
> +
> +		assert(!req->in->inhdr);
> +		if ((ret = dump_input_buffered_data(fp, req->in, &len)))
> +			return ret;
>   	}
>   
> +	if (c->in && (ret = dump_input_buffered_data(fp, c->in, &len)))
> +		return ret;
> +
>   	if (sc) {
>   		sc->data_in_len = len;
>   		sc->data_resp_len = 0;
> 

Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer
  2021-06-24  8:30   ` Juergen Gross
@ 2021-06-24  8:42     ` Julien Grall
  2021-06-24  9:20       ` Juergen Gross
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2021-06-24  8:42 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu

Hi Juergen,

On 24/06/2021 10:30, Juergen Gross wrote:
> On 16.06.21 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, the restore code is considering the stream will contain at
>> most one in-flight request per connection. In a follow-up changes, we
>> will want to transfer multiple in-flight requests.
>>
>> The function read_state_buffered() is now extended to restore multiple
>> in-flight request. Complete requests will be queued as delayed
>> requests, if there a partial request (only the last one can) then it
>> will used as the current in-flight request.
>>
>> Note that we want to bypass the quota check for delayed requests as
>> the new Xenstore may have a lower limit.
>>
>> Lastly, there is no need to change the specification as there was
>> no restriction on the number of in-flight requests preserved.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> ---
>>   tools/xenstore/xenstored_core.c | 56 ++++++++++++++++++++++++++++-----
>>   1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/xenstore/xenstored_core.c 
>> b/tools/xenstore/xenstored_core.c
>> index a5084a5b173d..5b7ab7f74013 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -1486,6 +1486,10 @@ static void process_message(struct connection 
>> *conn, struct buffered_data *in)
>>       enum xsd_sockmsg_type type = in->hdr.msg.type;
>>       int ret;
>> +    /* At least send_error() and send_reply() expects conn->in == in */
>> +    assert(conn->in == in);
>> +    trace_io(conn, in, 0);
>> +
>>       if ((unsigned int)type >= XS_TYPE_COUNT || !wire_funcs[type].func) 
> {
>>           eprintf("Client unknown operation %i", type);
>>           send_error(conn, ENOSYS);
>> @@ -1515,6 +1519,23 @@ static void process_message(struct connection 
>> *conn, struct buffered_data *in)
>>       conn->transaction = NULL;
>>   }
>> +static bool process_delayed_message(struct delayed_request *req)
>> +{
>> +    struct connection *conn = req->data;
>> +    struct buffered_data *saved_in = conn->in;
>> +
>> +    /*
>> +     * Part of process_message() expects conn->in to contains the
>> +     * processed response. So save the current conn->in and restore it
>> +     * afterwards.
>> +     */
>> +    conn->in = req->in;
>> +    process_message(req->data, req->in);
>> +    conn->in = saved_in;
> 
> This pattern was added to do_lu_start() already, and it will be needed
> for each other function being called via call_delayed().
> 
> Maybe it would be better to add conn explicitly to struct
> delayed_request (or even better: replace data with conn) and to do the
> conn->in saving/setting/restoring in call_delayed() instead?

This was my original approach, but I abandoned it because in the case of 
do_lu_start() we need to save the original conn->in in the stream (see 
patch #3 for more details).

If we overwrite conn->in in call_delayed(), then we need a different way 
to find the original conn->in. I couldn't find a nice way and decided to 
settle with the duplication.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer
  2021-06-24  8:42     ` Julien Grall
@ 2021-06-24  9:20       ` Juergen Gross
  0 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  9:20 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 3510 bytes --]

On 24.06.21 10:42, Julien Grall wrote:
> Hi Juergen,
> 
> On 24/06/2021 10:30, Juergen Gross wrote:
>> On 16.06.21 16:43, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> Currently, the restore code is considering the stream will contain at
>>> most one in-flight request per connection. In a follow-up changes, we
>>> will want to transfer multiple in-flight requests.
>>>
>>> The function read_state_buffered() is now extended to restore multiple
>>> in-flight request. Complete requests will be queued as delayed
>>> requests, if there a partial request (only the last one can) then it
>>> will used as the current in-flight request.
>>>
>>> Note that we want to bypass the quota check for delayed requests as
>>> the new Xenstore may have a lower limit.
>>>
>>> Lastly, there is no need to change the specification as there was
>>> no restriction on the number of in-flight requests preserved.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>> ---
>>>   tools/xenstore/xenstored_core.c | 56 ++++++++++++++++++++++++++++-----
>>>   1 file changed, 48 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c 
>>> b/tools/xenstore/xenstored_core.c
>>> index a5084a5b173d..5b7ab7f74013 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -1486,6 +1486,10 @@ static void process_message(struct connection 
>>> *conn, struct buffered_data *in)
>>>       enum xsd_sockmsg_type type = in->hdr.msg.type;
>>>       int ret;
>>> +    /* At least send_error() and send_reply() expects 
conn->in == in */
>>> +    assert(conn->in == in);
>>> +    trace_io(conn, in, 0);
>>> +
>>>       if ((unsigned int)type >= XS_TYPE_COUNT|| !wire_funcs[type].func) 
>> {
>>>           eprintf("Client unknown operation %i", type);
>>>           send_error(conn, ENOSYS);
>>> @@ -1515,6 +1519,23 @@ static void process_message(struct connection 
>>> *conn, struct buffered_data *in)
>>>       conn->transaction = NULL;
>>>   }
>>> +static bool process_delayed_message(struct delayed_request *req)
>>> +{
>>> +    struct connection *conn = req->data;
>>> +    struct buffered_data *saved_in = conn->in;
>>> +
>>> +    /*
>>> +     * Part of process_message() expects conn->in to contains the
>>> +     * processed response. So save the current conn->in and restore it
>>> +     * afterwards.
>>> +     */
>>> +    conn->in = req->in;
>>> +    process_message(req->data, req->in);
>>> +    conn->in = saved_in;
>>
>> This pattern was added to do_lu_start() already, and it will be needed
>> for each other function being called via call_delayed().
>>
>> Maybe it would be better to add conn explicitly to struct
>> delayed_request (or even better: replace data with conn) and to do the
>> conn->in saving/setting/restoring in call_delayed() instead?
> 
> This was my original approach, but I abandoned it because in the case of 
> do_lu_start() we need to save the original conn->in in the stream (see 
> patch #3 for more details).
> 
> If we overwrite conn->in in call_delayed(), then we need a different way 
> to find the original conn->in. I couldn't find a nice way and decided to 
> settle with the duplication.

Ah, okay, understood. Even in case we'd be able to restore conn->in it
would be the same amount of code again, but harder to follow.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending
  2021-06-16 14:43 ` [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending Julien Grall
  2021-06-21  9:30   ` Luca Fancellu
@ 2021-06-24  9:23   ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2021-06-24  9:23 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 1462 bytes --]

On 16.06.21 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, Live-Update will, by default, not proceed if there are
> in-flight transactions. It is possible force it by passing -F but this
> will break any connection with in-flight transactions.
> 
> There are PV drivers out that may never terminate some transaction. On
> host running such guest, we would need to use -F. Unfortunately, this
> also risks to break well-behaving guests (and even dom0) because
> Live-Update will happen as soon as the timeout is hit.
> 
> Ideally, we would want to preserve transactions but this requires
> some work and a lot of testing to be able to use it in production.
> 
> As a stop gap, we want to limit the damage of -F. This patch will delay
> any transactions that are started after Live-Update has been requested.
> 
> If the request cannot be delayed, the connection will be stalled to
> avoid loosing requests.
> 
> If the connection has already a pending transaction before Live-Update,
> then new transaction will not be delayed. This is to avoid the connection
> to stall.
> 
> With this stop gap in place, domains with long running transactions will
> still break when using -F, but other domains which starts a transaction
> in the middle of Live-Update will continue to work.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-24  8:41   ` Juergen Gross
@ 2021-06-24 10:28     ` Julien Grall
  2021-06-24 10:45       ` Juergen Gross
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2021-06-24 10:28 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu

Hi Juergen,

On 24/06/2021 10:41, Juergen Gross wrote:
> On 16.06.21 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, only Live-Update request can be delayed. In a follow-up,
>> we will want to delay more requests (e.g. transaction start).
>> Therefore we want to preserve delayed requests across Live-Update.
>>
>> Delayed requests are just complete "in" buffer. So the code is
>> refactored to allow sharing the code to dump "in" buffer.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> ---
>>   tools/xenstore/xenstored_core.c | 42 +++++++++++++++++++++++++--------
>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/xenstore/xenstored_core.c 
>> b/tools/xenstore/xenstored_core.c
>> index 5b7ab7f74013..9eca58682b51 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>>       return NULL;
>>   }
>> +static const char *dump_input_buffered_data(FILE *fp,
>> +                        const struct buffered_data *in,
>> +                        unsigned int *total_len)
>> +{
>> +    unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
>> +
>> +    *total_len += hlen;
>> +    if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
>> +        return "Dump read data error";
>> +    if (!in->inhdr && in->used) {
>> +        *total_len += in->used;
>> +        if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>> +            return "Dump read data error";
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>>   /* Called twice: first with fp == NULL to get length, then for 
>> writing data. */
>>   const char *dump_state_buffered_data(FILE *fp, const struct 
>> connection *c,
>>                        struct xs_state_connection *sc)
>>   {
>>       unsigned int len = 0, used;
>> -    struct buffered_data *out, *in = c->in;
>> +    struct buffered_data *out;
>>       bool partial = true;
>> +    struct delayed_request *req;
>> +    const char *ret;
>> -    if (in) {
>> -        len = in->inhdr ? in->used : sizeof(in->hdr);
>> -        if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
>> -            return "Dump read data error";
>> -        if (!in->inhdr && in->used) {
>> -            len += in->used;
>> -            if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>> -                return "Dump read data error";
>> -        }
>> +    /* Dump any command that was delayed */
>> +    list_for_each_entry(req, &c->delayed, list) {
> 
> Please add a comment here that the following if() serves especially to
> avoid dumping the data for do_lu_start().

Would you be happy to give your acked-by/reviewed-by if I add the 
following on commit?

"
We only want to preserve commands that wasn't processed at all. All the 
other delayed requests (such as do_lu_start()) must be processed before 
Live-Update.
"

> 
>> +        if (req->func != process_delayed_message)
>> +            continue;
>> +
>> +        assert(!req->in->inhdr);
>> +        if ((ret = dump_input_buffered_data(fp, req->in, &len)))
>> +            return ret;
>>       }
>> +    if (c->in && (ret = dump_input_buffered_data(fp, c->in, &len)))
>> +        return ret;
>> +
>>       if (sc) {
>>           sc->data_in_len = len;
>>           sc->data_resp_len = 0;
>>
> 
> Juergen

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update
  2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
                   ` (9 preceding siblings ...)
  2021-06-16 14:43 ` [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending Julien Grall
@ 2021-06-24 10:43 ` Julien Grall
  10 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-24 10:43 UTC (permalink / raw)
  To: xen-devel
  Cc: raphning, doebel, Julien Grall, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Juergen Gross

Hi,

On 16/06/2021 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Hi all,
> 
> At the moment, Live-Update will, by default, not proceed if there are
> in-flight transactions. It is possible force it by passing -F but this
> will break any connection with in-flight transactions.
> 
> There are PV drivers out that may never terminate some transaction. On
> host running such guest, we would need to use -F. Unfortunately, this
> also risks to break well-behaving guests (and even dom0) because
> Live-Update will happen as soon as the timeout is hit.
> 
> This series aims to allow to Live-Update more safely even when the option
> -F is used.
> 
> The first part of the series contains a few fixes for bug found while
> testing Live-Update.
> 
> Cheers,
> 
> Julien Grall (10):
>    MAINTAINERS: Add myself as reviewers for tools/xenstore
>    tools/xenstored: Introduce lu_get_connection() and use it
>    tools/xenstore: Don't assume conn->in points to the LU request
>    tools/xenstored: Limit the number of requests a connection can delay
>    tools/xenstored: xenstored_core.h should include fcntl.h
>    tools/xenstored: Introduce a wrapper for conn->funcs->can_{read,
>      write}
>    tools/xenstored: delay_request: don't assume conn->in == in
>    tools/xenstored: Extend restore code to handle multiple input buffer

I have committed the first 8 patches.

>    tools/xenstored: Dump delayed requests
>    tools/xenstored: Delay new transaction while Live-Update is pending

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-24 10:28     ` Julien Grall
@ 2021-06-24 10:45       ` Juergen Gross
  2021-06-24 10:46         ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24 10:45 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 3413 bytes --]

On 24.06.21 12:28, Julien Grall wrote:
> Hi Juergen,
> 
> On 24/06/2021 10:41, Juergen Gross wrote:
>> On 16.06.21 16:43, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> Currently, only Live-Update request can be delayed. In a follow-up,
>>> we will want to delay more requests (e.g. transaction start).
>>> Therefore we want to preserve delayed requests across Live-Update.
>>>
>>> Delayed requests are just complete "in" buffer. So the code is
>>> refactored to allow sharing the code to dump "in" buffer.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>> ---
>>>   tools/xenstore/xenstored_core.c | 42 +++++++++++++++++++++++++--------
>>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c 
>>> b/tools/xenstore/xenstored_core.c
>>> index 5b7ab7f74013..9eca58682b51 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>>>       return NULL;
>>>   }
>>> +static const char *dump_input_buffered_data(FILE *fp,
>>> +                        conststruct buffered_data *in,
>>> +                        unsigned int *total_len)
>>> +{
>>> +    unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
>>> +
>>> +    *total_len += hlen;
>>> +    if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
>>> +        return "Dump read data error";
>>> +    if (!in->inhdr && in->used) {
>>> +        *total_len += in->used;
>>> +        if (fp && fwrite(in->buffer,in->used, 1, fp) != 1)
>>> +            return "Dump read data error";
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>>   /* Called twice: first with fp == NULL to get length, then 
for 
>>> writing data. */
>>>   const char *dump_state_buffered_data(FILE *fp, const struct 
>>> connection *c,
>>>                        struct xs_state_connection *sc)
>>>   {
>>>       unsigned int len = 0, used;
>>> -    struct buffered_data *out, *in = c->in;
>>> +    struct buffered_data *out;
>>>       bool partial = true;
>>> +    struct delayed_request *req;
>>> +    const char *ret;
>>> -    if (in) {
>>> -        len = in->inhdr ? in->used: sizeof(in->hdr);
>>> -        if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
>>> -            return "Dump read data error";
>>> -        if (!in->inhdr && in->used) {
>>> -            len += in->used;
>>> -            if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>>> -                return "Dump read data error";
>>> -        }
>>> +    /* Dump any command that was delayed */
>>> +    list_for_each_entry(req, &c->delayed, list) {
>>
>> Please add a comment here that the following if() serves especially to
>> avoid dumping the data for do_lu_start().
> 
> Would you be happy to give your acked-by/reviewed-by if I add the 
> following on commit?
> 
> "
> We only want to preserve commands that wasn't processed at all. All the 


s/wasn't/weren't/

> other delayed requests (such as do_lu_start()) must be processed before 

> Live-Update.
> "

With that change I'm fine.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-24 10:45       ` Juergen Gross
@ 2021-06-24 10:46         ` Julien Grall
  2021-06-24 11:02           ` Juergen Gross
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2021-06-24 10:46 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu

Hi Juergen,

On 24/06/2021 12:45, Juergen Gross wrote:
> On 24.06.21 12:28, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 24/06/2021 10:41, Juergen Gross wrote:
>>> On 16.06.21 16:43, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> Currently, only Live-Update request can be delayed. In a follow-up,
>>>> we will want to delay more requests (e.g. transaction start).
>>>> Therefore we want to preserve delayed requests across Live-Update.
>>>>
>>>> Delayed requests are just complete "in" buffer. So the code is
>>>> refactored to allow sharing the code to dump "in" buffer.
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>> ---
>>>>   tools/xenstore/xenstored_core.c | 42 
>>>> +++++++++++++++++++++++++--------
>>>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>> b/tools/xenstore/xenstored_core.c
>>>> index 5b7ab7f74013..9eca58682b51 100644
>>>> --- a/tools/xenstore/xenstored_core.c
>>>> +++ b/tools/xenstore/xenstored_core.c
>>>> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>>>>       return NULL;
>>>>   }
>>>> +static const char *dump_input_buffered_data(FILE *fp,
>>>> +                        conststruct buffered_data *in,
>>>> +                        unsigned int *total_len)
>>>> +{
>>>> +    unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
>>>> +
>>>> +    *total_len += hlen;
>>>> +    if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
>>>> +        return "Dump read data error";
>>>> +    if (!in->inhdr && in->used) {
>>>> +        *total_len += in->used;
>>>> +        if (fp && fwrite(in->buffer,in->used, 1, fp) != 1)
>>>> +            return "Dump read data error";
>>>> +    }
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>>   /* Called twice: first with fp == NULL to get length, then 
> for
>>>> writing data. */
>>>>   const char *dump_state_buffered_data(FILE *fp, const struct 
>>>> connection *c,
>>>>                        struct xs_state_connection *sc)
>>>>   {
>>>>       unsigned int len = 0, used;
>>>> -    struct buffered_data *out, *in = c->in;
>>>> +    struct buffered_data *out;
>>>>       bool partial = true;
>>>> +    struct delayed_request *req;
>>>> +    const char *ret;
>>>> -    if (in) {
>>>> -        len = in->inhdr ? in->used: sizeof(in->hdr);
>>>> -        if (fp && fwrite(&in->hdr, len, 1, fp) != 1)
>>>> -            return "Dump read data error";
>>>> -        if (!in->inhdr && in->used) {
>>>> -            len += in->used;
>>>> -            if (fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>>>> -                return "Dump read data error";
>>>> -        }
>>>> +    /* Dump any command that was delayed */
>>>> +    list_for_each_entry(req, &c->delayed, list) {
>>>
>>> Please add a comment here that the following if() serves especially to
>>> avoid dumping the data for do_lu_start().
>>
>> Would you be happy to give your acked-by/reviewed-by if I add the 
>> following on commit?
>>
>> "
>> We only want to preserve commands that wasn't processed at all. All the 
> 
> 
> s/wasn't/weren't/

I will do.

> 
>> other delayed requests (such as do_lu_start()) must be processed before 
> 
>> Live-Update.
>> "
> 
> With that change I'm fine.

Can I translate that to an acked-by? :)

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-24 10:46         ` Julien Grall
@ 2021-06-24 11:02           ` Juergen Gross
  2021-06-24 11:17             ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2021-06-24 11:02 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 3810 bytes --]

On 24.06.21 12:46, Julien Grall wrote:
> Hi Juergen,
> 
> On 24/06/2021 12:45, Juergen Gross wrote:
>> On 24.06.21 12:28, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 24/06/2021 10:41, Juergen Gross wrote:
>>>> On 16.06.21 16:43, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> Currently, only Live-Update request can be delayed. In a follow-up,
>>>>> we will want to delay more requests (e.g. transaction start).
>>>>> Therefore we want to preserve delayed requests across Live-Update.
>>>>>
>>>>> Delayed requests are just complete "in" buffer. So the code is
>>>>> refactored to allow sharing the code to dump "in" buffer.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>> ---
>>>>>   tools/xenstore/xenstored_core.c | 42 
>>>>> +++++++++++++++++++++++++--------
>>>>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>>> b/tools/xenstore/xenstored_core.c
>>>>> index 5b7ab7f74013..9eca58682b51 100644
>>>>> --- a/tools/xenstore/xenstored_core.c
>>>>> +++ b/tools/xenstore/xenstored_core.c
>>>>> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>>>>>       return NULL;
>>>>>   }
>>>>> +static const char *dump_input_buffered_data(FILE *fp,
>>>>> +                        conststruct buffered_data *in,
>>>>> +                        unsigned int *total_len)
>>>>> +{
>>>>> +    unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
>>>>> +
>>>>> +    *total_len += hlen;
>>>>> +    if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
>>>>> +        return "Dump read data error";
>>>>> +    if (!in->inhdr && in->used) {
>>>>> +        *total_len += in->used;
>>>>> +        if (fp && fwrite(in->buffer,in->used, 1, fp) != 1)
>>>>> +            
return "Dump read data error";
>>>>> +    }
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>>   /* Called twice: first with fp == NULL to get length, then 
>> for
>>>>> writing data. */
>>>>>   const char *dump_state_buffered_data(FILE *fp, const struct 
>>>>> connection *c,
>>>>>                        struct xs_state_connection *sc)
>>>>>   {
>>>>>       unsigned int len = 0, used;
>>>>> -    struct buffered_data *out, *in = c->in;
>>>>> +    struct buffered_data *out;
>>>>>       bool partial = true;
>>>>> +    struct delayed_request *req;
>>>>> +    const char *ret;
>>>>> -    if (in) {
>>>>> -        len = in->inhdr ? in->used: sizeof(in->hdr);
>>>>> -        if (fp && fwrite(&in->hdr,len, 1, fp) != 1)
>>>>> -            
return "Dump read data error";
>>>>> -        if (!in->inhdr && in->used) {
>>>>> -            
len += in->used;
>>>>> -            
if(fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>>>>> -                return "Dump read data error";
>>>>> -        }
>>>>> +    /* Dump any command that was delayed */
>>>>> +    list_for_each_entry(req, &c->delayed, list) {
>>>>
>>>> Please add a comment here that the following if() serves especially to
>>>> avoid dumping the data for do_lu_start().
>>>
>>> Would you be happy to give your acked-by/reviewed-by if I add the 
>>> following on commit?
>>>
>>> "
>>> We only want to preserve commands that wasn't processed at all. All the 
>>
>>
>> s/wasn't/weren't/
> 
> I will do.
> 
>>
>>> other delayed requests (such as do_lu_start()) must be processed before 
>>
>>> Live-Update.
>>> "
>>
>> With that change I'm fine.
> 
> Can I translate that to an acked-by? :)

Make it a "Reviewed-by:" :-)


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/10] tools/xenstored: Dump delayed requests
  2021-06-24 11:02           ` Juergen Gross
@ 2021-06-24 11:17             ` Julien Grall
  0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2021-06-24 11:17 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: raphning, doebel, Julien Grall, Ian Jackson, Wei Liu



On 24/06/2021 13:02, Juergen Gross wrote:
> On 24.06.21 12:46, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 24/06/2021 12:45, Juergen Gross wrote:
>>> On 24.06.21 12:28, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 24/06/2021 10:41, Juergen Gross wrote:
>>>>> On 16.06.21 16:43, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> Currently, only Live-Update request can be delayed. In a follow-up,
>>>>>> we will want to delay more requests (e.g. transaction start).
>>>>>> Therefore we want to preserve delayed requests across Live-Update.
>>>>>>
>>>>>> Delayed requests are just complete "in" buffer. So the code is
>>>>>> refactored to allow sharing the code to dump "in" buffer.
>>>>>>
>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>> ---
>>>>>>   tools/xenstore/xenstored_core.c | 42 
>>>>>> +++++++++++++++++++++++++--------
>>>>>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>>>> b/tools/xenstore/xenstored_core.c
>>>>>> index 5b7ab7f74013..9eca58682b51 100644
>>>>>> --- a/tools/xenstore/xenstored_core.c
>>>>>> +++ b/tools/xenstore/xenstored_core.c
>>>>>> @@ -2403,25 +2403,47 @@ const char *dump_state_global(FILE *fp)
>>>>>>       return NULL;
>>>>>>   }
>>>>>> +static const char *dump_input_buffered_data(FILE *fp,
>>>>>> +                        conststruct buffered_data *in,
>>>>>> +                        unsigned int *total_len)
>>>>>> +{
>>>>>> +    unsigned int hlen = in->inhdr ? in->used : sizeof(in->hdr);
>>>>>> +
>>>>>> +    *total_len += hlen;
>>>>>> +    if (fp && fwrite(&in->hdr, hlen, 1, fp) != 1)
>>>>>> +        return "Dump read data error";
>>>>>> +    if (!in->inhdr && in->used) {
>>>>>> +        *total_len += in->used;
>>>>>> +        if (fp && fwrite(in->buffer,in->used, 1, fp) != 1)
>>>>>> + 
> return "Dump read data error";
>>>>>> +    }
>>>>>> +
>>>>>> +    return NULL;
>>>>>> +}
>>>>>> +
>>>>>>   /* Called twice: first with fp == NULL to get length, then 
>>> for
>>>>>> writing data. */
>>>>>>   const char *dump_state_buffered_data(FILE *fp, const struct 
>>>>>> connection *c,
>>>>>>                        struct xs_state_connection *sc)
>>>>>>   {
>>>>>>       unsigned int len = 0, used;
>>>>>> -    struct buffered_data *out, *in = c->in;
>>>>>> +    struct buffered_data *out;
>>>>>>       bool partial = true;
>>>>>> +    struct delayed_request *req;
>>>>>> +    const char *ret;
>>>>>> -    if (in) {
>>>>>> -        len = in->inhdr ? in->used: sizeof(in->hdr);
>>>>>> -        if (fp && fwrite(&in->hdr,len, 1, fp) != 1)
>>>>>> - 
> return "Dump read data error";
>>>>>> -        if (!in->inhdr && in->used) {
>>>>>> - 
> len += in->used;
>>>>>> - 
> if(fp && fwrite(in->buffer, in->used, 1, fp) != 1)
>>>>>> -                return "Dump read data error";
>>>>>> -        }
>>>>>> +    /* Dump any command that was delayed */
>>>>>> +    list_for_each_entry(req, &c->delayed, list) {
>>>>>
>>>>> Please add a comment here that the following if() serves especially to
>>>>> avoid dumping the data for do_lu_start().
>>>>
>>>> Would you be happy to give your acked-by/reviewed-by if I add the 
>>>> following on commit?
>>>>
>>>> "
>>>> We only want to preserve commands that wasn't processed at all. All the 
>>>
>>>
>>> s/wasn't/weren't/
>>
>> I will do.
>>
>>>
>>>> other delayed requests (such as do_lu_start()) must be processed before 
>>>
>>>> Live-Update.
>>>> "
>>>
>>> With that change I'm fine.
>>
>> Can I translate that to an acked-by? :)
> 
> Make it a "Reviewed-by:" :-)

Thanks! I have committed it.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2021-06-24 11:17 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 14:43 [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall
2021-06-16 14:43 ` [PATCH 01/10] MAINTAINERS: Add myself as reviewers for tools/xenstore Julien Grall
2021-06-16 15:15   ` Juergen Gross
2021-06-16 14:43 ` [PATCH 02/10] tools/xenstored: Introduce lu_get_connection() and use it Julien Grall
2021-06-21  8:21   ` Luca Fancellu
2021-06-24  7:21   ` Juergen Gross
2021-06-16 14:43 ` [PATCH 03/10] tools/xenstore: Don't assume conn->in points to the LU request Julien Grall
2021-06-21  8:55   ` Luca Fancellu
2021-06-24  8:06     ` Julien Grall
2021-06-24  7:32   ` Juergen Gross
2021-06-24  7:34     ` Juergen Gross
2021-06-24  7:56       ` Luca Fancellu
2021-06-24  8:05         ` Juergen Gross
2021-06-16 14:43 ` [PATCH 04/10] tools/xenstored: Limit the number of requests a connection can delay Julien Grall
2021-06-21  9:02   ` Luca Fancellu
2021-06-24  7:35   ` Juergen Gross
2021-06-16 14:43 ` [PATCH 05/10] tools/xenstored: xenstored_core.h should include fcntl.h Julien Grall
2021-06-21  9:03   ` Luca Fancellu
2021-06-24  7:36   ` Juergen Gross
2021-06-16 14:43 ` [PATCH 06/10] tools/xenstored: Introduce a wrapper for conn->funcs->can_{read, write} Julien Grall
2021-06-21  9:10   ` Luca Fancellu
2021-06-24  7:39   ` Juergen Gross
2021-06-16 14:43 ` [PATCH 07/10] tools/xenstored: delay_request: don't assume conn->in == in Julien Grall
2021-06-21  9:12   ` Luca Fancellu
2021-06-24  7:44   ` Juergen Gross
2021-06-24  7:58     ` Julien Grall
2021-06-16 14:43 ` [PATCH 08/10] tools/xenstored: Extend restore code to handle multiple input buffer Julien Grall
2021-06-21  9:21   ` Luca Fancellu
2021-06-24  8:30   ` Juergen Gross
2021-06-24  8:42     ` Julien Grall
2021-06-24  9:20       ` Juergen Gross
2021-06-16 14:43 ` [PATCH 09/10] tools/xenstored: Dump delayed requests Julien Grall
2021-06-21  9:27   ` Luca Fancellu
2021-06-24  8:41   ` Juergen Gross
2021-06-24 10:28     ` Julien Grall
2021-06-24 10:45       ` Juergen Gross
2021-06-24 10:46         ` Julien Grall
2021-06-24 11:02           ` Juergen Gross
2021-06-24 11:17             ` Julien Grall
2021-06-16 14:43 ` [PATCH 10/10] tools/xenstored: Delay new transaction while Live-Update is pending Julien Grall
2021-06-21  9:30   ` Luca Fancellu
2021-06-24  9:23   ` Juergen Gross
2021-06-24 10:43 ` [PATCH 00/10] tools/xenstored: Bug fixes + Improve Live-Update Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.