All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore
@ 2021-10-20 14:45 Julien Grall
  2021-10-20 15:45 ` Ian Jackson
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Julien Grall @ 2021-10-20 14:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien, Julien Grall, Ian Jackson, Wei Liu, Juergen Gross

From: Julien Grall <jgrall@amazon.com>

Commit 939775cfd3 "handle dying domains in live update" was meant to
handle gracefully dying domain. However, the @releaseDomain watch
will end up to be sent as soon as we finished to restore Xenstored
state.

This may be before Xen reports the domain to be dying (such as if
the guest decided to revoke access to the xenstore page). Consequently
daemon like xenconsoled will not clean-up the domain and it will be
left as a zombie.

To avoid the problem, mark the connection as ignored. This also
requires to tweak conn_can_write() and conn_can_read() to prevent
dereferencing a NULL pointer (the interface will not mapped).

The check conn->is_ignored was originally added after the callbacks
because the helpers for a socket connection may close the fd. However,
ignore_connection() will close a socket connection directly. So it is
fine to do the re-order.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

This issue was originally found when developping commit 939775cfd3.
Unfortunately, due to some miscommunication the wrong patch was sent
upstream. The approach is re-using the one we discussed back then.

This was tested by modifying Linux to revoke the Xenstore grant during
boot. Without this patch, the domain will be left after Live-Update. Note
that on a basic setup (only xenconsoled and xl watch @releaseDomain),
the domain may be cleaned on the next domain shutdown/start.

Xenstore Live-Update is so far a tech preview feature. But I would still
like to request this patch to be included in 4.16 as this was meant to
be part of the original work.
---
 tools/xenstore/xenstored_core.c   |  8 ++++----
 tools/xenstore/xenstored_core.h   |  1 +
 tools/xenstore/xenstored_domain.c | 21 ++++++++++++---------
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 0d4c73d6e20c..91d093a12ea6 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -338,10 +338,10 @@ static int destroy_conn(void *_conn)
 
 static bool conn_can_read(struct connection *conn)
 {
-	if (!conn->funcs->can_read(conn))
+	if (conn->is_ignored)
 		return false;
 
-	if (conn->is_ignored)
+	if (!conn->funcs->can_read(conn))
 		return false;
 
 	/*
@@ -356,7 +356,7 @@ static bool conn_can_read(struct connection *conn)
 
 static bool conn_can_write(struct connection *conn)
 {
-	return conn->funcs->can_write(conn) && !conn->is_ignored;
+	return !conn->is_ignored && conn->funcs->can_write(conn);
 }
 
 /* This function returns index inside the array if succeed, -1 if fail */
@@ -1466,7 +1466,7 @@ static struct {
  *
  * All watches, transactions, buffers will be freed.
  */
-static void ignore_connection(struct connection *conn)
+void ignore_connection(struct connection *conn)
 {
 	struct buffered_data *out, *tmp;
 
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 258f6ff38279..07d861d92499 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -206,6 +206,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
 
 struct connection *new_connection(const struct interface_funcs *funcs);
 struct connection *get_connection_by_id(unsigned int conn_id);
+void ignore_connection(struct connection *conn);
 void check_store(void);
 void corrupt(struct connection *conn, const char *fmt, ...);
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 47e9107c144e..d03c7d93a9e7 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -268,14 +268,7 @@ void check_domains(void)
 				domain->shutdown = true;
 				notify = 1;
 			}
-			/*
-			 * On Restore, we may have been unable to remap the
-			 * interface and the port. As we don't know whether
-			 * this was because of a dying domain, we need to
-			 * check if the interface and port are still valid.
-			 */
-			if (!dominfo.dying && domain->port &&
-			    domain->interface)
+			if (!dominfo.dying)
 				continue;
 		}
 		if (domain->conn) {
@@ -1303,6 +1296,17 @@ void read_state_connection(const void *ctx, const void *state)
 		if (!domain)
 			barf("domain allocation error");
 
+		conn = domain->conn;
+
+		/*
+		 * We may not have been able to restore the domain (for
+		 * instance because it revoked the Xenstore grant). We need
+		 * to keep it around to send @releaseDomain when it is
+		 * dead. So mark it as ignored.
+		 */
+		if (!domain->port || !domain->interface)
+			ignore_connection(conn);
+
 		if (sc->spec.ring.tdomid != DOMID_INVALID) {
 			tdomain = find_or_alloc_domain(ctx,
 						       sc->spec.ring.tdomid);
@@ -1311,7 +1315,6 @@ void read_state_connection(const void *ctx, const void *state)
 			talloc_reference(domain->conn, tdomain->conn);
 			domain->conn->target = tdomain->conn;
 		}
-		conn = domain->conn;
 	}
 
 	conn->conn_id = sc->conn_id;
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore
  2021-10-20 14:45 [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore Julien Grall
@ 2021-10-20 15:45 ` Ian Jackson
  2021-10-21  6:59 ` Juergen Gross
  2021-10-21  7:07 ` Luca Fancellu
  2 siblings, 0 replies; 5+ messages in thread
From: Ian Jackson @ 2021-10-20 15:45 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Julien Grall, Ian Jackson, Wei Liu, Juergen Gross

Julien Grall writes ("[PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore"):
> From: Julien Grall <jgrall@amazon.com>
> 
> Commit 939775cfd3 "handle dying domains in live update" was meant to
> handle gracefully dying domain. However, the @releaseDomain watch
> will end up to be sent as soon as we finished to restore Xenstored
> state.
> 
> This may be before Xen reports the domain to be dying (such as if
> the guest decided to revoke access to the xenstore page). Consequently
> daemon like xenconsoled will not clean-up the domain and it will be
> left as a zombie.
> 
> To avoid the problem, mark the connection as ignored. This also
> requires to tweak conn_can_write() and conn_can_read() to prevent
> dereferencing a NULL pointer (the interface will not mapped).
> 
> The check conn->is_ignored was originally added after the callbacks
> because the helpers for a socket connection may close the fd. However,
> ignore_connection() will close a socket connection directly. So it is
> fine to do the re-order.

Release-Acked-by: Ian Jackson <iwj@xenproject.org>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore
  2021-10-20 14:45 [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore Julien Grall
  2021-10-20 15:45 ` Ian Jackson
@ 2021-10-21  6:59 ` Juergen Gross
  2021-10-21  7:07 ` Luca Fancellu
  2 siblings, 0 replies; 5+ messages in thread
From: Juergen Gross @ 2021-10-21  6:59 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Julien Grall, Ian Jackson, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 1117 bytes --]

On 20.10.21 16:45, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Commit 939775cfd3 "handle dying domains in live update" was meant to
> handle gracefully dying domain. However, the @releaseDomain watch
> will end up to be sent as soon as we finished to restore Xenstored
> state.
> 
> This may be before Xen reports the domain to be dying (such as if
> the guest decided to revoke access to the xenstore page). Consequently
> daemon like xenconsoled will not clean-up the domain and it will be
> left as a zombie.
> 
> To avoid the problem, mark the connection as ignored. This also
> requires to tweak conn_can_write() and conn_can_read() to prevent
> dereferencing a NULL pointer (the interface will not mapped).
> 
> The check conn->is_ignored was originally added after the callbacks
> because the helpers for a socket connection may close the fd. However,
> ignore_connection() will close a socket connection directly. So it is
> fine to do the re-order.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore
  2021-10-20 14:45 [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore Julien Grall
  2021-10-20 15:45 ` Ian Jackson
  2021-10-21  6:59 ` Juergen Gross
@ 2021-10-21  7:07 ` Luca Fancellu
  2021-10-21 10:52   ` Ian Jackson
  2 siblings, 1 reply; 5+ messages in thread
From: Luca Fancellu @ 2021-10-21  7:07 UTC (permalink / raw)
  To: Julien Grall; +Cc: Xen-devel, Julien Grall, Ian Jackson, Wei Liu, Juergen Gross



> On 20 Oct 2021, at 15:45, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Commit 939775cfd3 "handle dying domains in live update" was meant to
> handle gracefully dying domain. However, the @releaseDomain watch
> will end up to be sent as soon as we finished to restore Xenstored
> state.
> 
> This may be before Xen reports the domain to be dying (such as if
> the guest decided to revoke access to the xenstore page). Consequently
> daemon like xenconsoled will not clean-up the domain and it will be
> left as a zombie.
> 
> To avoid the problem, mark the connection as ignored. This also
> requires to tweak conn_can_write() and conn_can_read() to prevent
> dereferencing a NULL pointer (the interface will not mapped).
> 
> The check conn->is_ignored was originally added after the callbacks
> because the helpers for a socket connection may close the fd. However,
> ignore_connection() will close a socket connection directly. So it is
> fine to do the re-order.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> 
> ---
> 
> This issue was originally found when developping commit 939775cfd3.
> Unfortunately, due to some miscommunication the wrong patch was sent
> upstream. The approach is re-using the one we discussed back then.
> 
> This was tested by modifying Linux to revoke the Xenstore grant during
> boot. Without this patch, the domain will be left after Live-Update. Note
> that on a basic setup (only xenconsoled and xl watch @releaseDomain),
> the domain may be cleaned on the next domain shutdown/start.
> 
> Xenstore Live-Update is so far a tech preview feature. But I would still
> like to request this patch to be included in 4.16 as this was meant to
> be part of the original work.
> ---
> tools/xenstore/xenstored_core.c   |  8 ++++----
> tools/xenstore/xenstored_core.h   |  1 +
> tools/xenstore/xenstored_domain.c | 21 ++++++++++++---------
> 3 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 0d4c73d6e20c..91d093a12ea6 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -338,10 +338,10 @@ static int destroy_conn(void *_conn)
> 
> static bool conn_can_read(struct connection *conn)
> {
> -	if (!conn->funcs->can_read(conn))
> +	if (conn->is_ignored)
> 		return false;
> 
> -	if (conn->is_ignored)
> +	if (!conn->funcs->can_read(conn))
> 		return false;
> 
> 	/*
> @@ -356,7 +356,7 @@ static bool conn_can_read(struct connection *conn)
> 
> static bool conn_can_write(struct connection *conn)
> {
> -	return conn->funcs->can_write(conn) && !conn->is_ignored;
> +	return !conn->is_ignored && conn->funcs->can_write(conn);
> }
> 
> /* This function returns index inside the array if succeed, -1 if fail */
> @@ -1466,7 +1466,7 @@ static struct {
>  *
>  * All watches, transactions, buffers will be freed.
>  */
> -static void ignore_connection(struct connection *conn)
> +void ignore_connection(struct connection *conn)
> {
> 	struct buffered_data *out, *tmp;
> 
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 258f6ff38279..07d861d92499 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -206,6 +206,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
> 
> struct connection *new_connection(const struct interface_funcs *funcs);
> struct connection *get_connection_by_id(unsigned int conn_id);
> +void ignore_connection(struct connection *conn);
> void check_store(void);
> void corrupt(struct connection *conn, const char *fmt, ...);
> 
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index 47e9107c144e..d03c7d93a9e7 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -268,14 +268,7 @@ void check_domains(void)
> 				domain->shutdown = true;
> 				notify = 1;
> 			}
> -			/*
> -			 * On Restore, we may have been unable to remap the
> -			 * interface and the port. As we don't know whether
> -			 * this was because of a dying domain, we need to
> -			 * check if the interface and port are still valid.
> -			 */
> -			if (!dominfo.dying && domain->port &&
> -			    domain->interface)
> +			if (!dominfo.dying)
> 				continue;
> 		}
> 		if (domain->conn) {
> @@ -1303,6 +1296,17 @@ void read_state_connection(const void *ctx, const void *state)
> 		if (!domain)
> 			barf("domain allocation error");
> 
> +		conn = domain->conn;
> +
> +		/*
> +		 * We may not have been able to restore the domain (for
> +		 * instance because it revoked the Xenstore grant). We need
> +		 * to keep it around to send @releaseDomain when it is
> +		 * dead. So mark it as ignored.
> +		 */
> +		if (!domain->port || !domain->interface)
> +			ignore_connection(conn);
> +
> 		if (sc->spec.ring.tdomid != DOMID_INVALID) {
> 			tdomain = find_or_alloc_domain(ctx,
> 						       sc->spec.ring.tdomid);
> @@ -1311,7 +1315,6 @@ void read_state_connection(const void *ctx, const void *state)
> 			talloc_reference(domain->conn, tdomain->conn);
> 			domain->conn->target = tdomain->conn;
> 		}
> -		conn = domain->conn;
> 	}
> 
> 	conn->conn_id = sc->conn_id;
> -- 
> 2.32.0
> 
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore
  2021-10-21  7:07 ` Luca Fancellu
@ 2021-10-21 10:52   ` Ian Jackson
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Jackson @ 2021-10-21 10:52 UTC (permalink / raw)
  To: Luca Fancellu
  Cc: Julien Grall, Xen-devel, Julien Grall, Ian Jackson, Wei Liu,
	Juergen Gross

Thanks everyone, committed.

Ian.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-21 10:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 14:45 [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore Julien Grall
2021-10-20 15:45 ` Ian Jackson
2021-10-21  6:59 ` Juergen Gross
2021-10-21  7:07 ` Luca Fancellu
2021-10-21 10:52   ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.