All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Oxenstored live update fixes
@ 2022-11-22 15:20 Andrew Cooper
  2022-11-22 15:20 ` [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement Andrew Cooper
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Christian Lindig, David Scott, Edwin Torok, Rob Hoes

First set of Ocaml fixes, cleaned up from series posted previously.

Patch 1 fixes a logical error introduced in the xenstore live update support.

Patches 2-5 fix oxenstored to keep /dev/xen/evtchn open across live update.

Patches 6-8 fix various issues with diagnostics.

All previous feedback applied, but I've started this series again to avoid
confusion.

Andrew Cooper (1):
  tools/oxenstored: Fix incorrect scope after an if statement

Edwin Török (7):
  tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak
  tools/ocaml/evtchn: Add binding for xenevtchn_fdopen()
  tools/ocaml/evtchn: Extend the init() binding with a cloexec flag
  tools/oxenstored: Keep /dev/xen/evtchn open across live update
  tools/oxenstored: Log live update issues at warning level
  tools/oxenstored: Set uncaught exception handler
  tools/oxenstored/syslog: Avoid potential NULL dereference

 tools/ocaml/libs/eventchn/xeneventchn.ml      |  6 +-
 tools/ocaml/libs/eventchn/xeneventchn.mli     | 13 +++-
 tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 50 ++++++++++++--
 tools/ocaml/xenstored/domain.ml               |  6 +-
 tools/ocaml/xenstored/domains.ml              | 14 ++--
 tools/ocaml/xenstored/event.ml                |  8 ++-
 tools/ocaml/xenstored/logging.ml              | 29 +++++++++
 tools/ocaml/xenstored/syslog_stubs.c          |  7 +-
 tools/ocaml/xenstored/xenstored.ml            | 94 ++++++++++++++++++---------
 9 files changed, 179 insertions(+), 48 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-23 14:50   ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 2/8] tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak Andrew Cooper
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Christian Lindig, David Scott, Edwin Torok, Rob Hoes

A debug statement got inserted into a single-expression if statement.

Insert brackets to give the intended meaning, rather than the actual meaning
where the "let con = Connections..." is outside and executed unconditionally.

This results in some unnecessary ring checks for domains which otherwise have
IO credit.

Fixes: 42f0581a91d4 ("tools/oxenstored: Implement live update for socket connections")
Reported-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>
---
 tools/ocaml/xenstored/xenstored.ml | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index ffd43a4eee64..c5dc7a28d082 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -475,7 +475,7 @@ let _ =
 
 	let ring_scan_checker dom =
 		(* no need to scan domains already marked as for processing *)
-		if not (Domain.get_io_credit dom > 0) then
+		if not (Domain.get_io_credit dom > 0) then (
 			debug "Looking up domid %d" (Domain.get_id dom);
 			let con = Connections.find_domain cons (Domain.get_id dom) in
 			if not (Connection.has_more_work con) then (
@@ -490,7 +490,8 @@ let _ =
 					let n = 32 + 2 * (Domains.number domains) in
 					info "found lazy domain %d, credit %d" (Domain.get_id dom) n;
 					Domain.set_io_credit ~n dom
-			) in
+			)
+		) in
 
 	let last_stat_time = ref 0. in
 	let last_scan_time = ref 0. in
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/8] tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
  2022-11-22 15:20 ` [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 3/8] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen() Andrew Cooper
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Edwin Török, Andrew Cooper, Christian Lindig,
	David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

There is no binding for xenevtchn_close().  In principle, this is a resource
leak, but the typical usage is as a singleton that lives for the lifetime of
the program.

Ocaml 5 no longer permits storing a naked C pointer in an Ocaml value.

Therefore, use a Custom block.  This allows us to use the finaliser callback
to call xenevtchn_close(), if the Ocaml object goes out of scope.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>
---
 tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
index f889a7a2e4a1..37f1cc4e1478 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
@@ -33,7 +33,22 @@
 #include <caml/fail.h>
 #include <caml/signals.h>
 
-#define _H(__h) ((xenevtchn_handle *)(__h))
+#define _H(__h) (*((xenevtchn_handle **)Data_custom_val(__h)))
+
+static void stub_evtchn_finalize(value v)
+{
+	xenevtchn_close(_H(v));
+}
+
+static struct custom_operations xenevtchn_ops = {
+	.identifier  = "xenevtchn",
+	.finalize    = stub_evtchn_finalize,
+	.compare     = custom_compare_default,     /* Can't compare     */
+	.hash        = custom_hash_default,        /* Can't hash        */
+	.serialize   = custom_serialize_default,   /* Can't serialize   */
+	.deserialize = custom_deserialize_default, /* Can't deserialize */
+	.compare_ext = custom_compare_ext_default, /* Can't compare     */
+};
 
 CAMLprim value stub_eventchn_init(void)
 {
@@ -48,7 +63,9 @@ CAMLprim value stub_eventchn_init(void)
 	if (xce == NULL)
 		caml_failwith("open failed");
 
-	result = (value)xce;
+	result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+	_H(result) = xce;
+
 	CAMLreturn(result);
 }
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/8] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen()
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
  2022-11-22 15:20 ` [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement Andrew Cooper
  2022-11-22 15:20 ` [PATCH 2/8] tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 4/8] tools/ocaml/evtchn: Extend the init() binding with a cloexec flag Andrew Cooper
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Edwin Török, Andrew Cooper, Christian Lindig,
	David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

For live update, the new oxenstored needs to reconstruct an evtchn object
around an existing file descriptor.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>

Split out of combined patch
---
 tools/ocaml/libs/eventchn/xeneventchn.ml      |  1 +
 tools/ocaml/libs/eventchn/xeneventchn.mli     |  4 ++++
 tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 19 +++++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
index dd00a1f0ead5..be4de82f46b9 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn.ml
+++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
@@ -17,6 +17,7 @@
 type handle
 
 external init: unit -> handle = "stub_eventchn_init"
+external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
 external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
 
 type t = int
diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
index 08c73376438e..98b3c86f3702 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn.mli
+++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
@@ -47,6 +47,10 @@ val init: unit -> handle
 (** Return an initialised event channel interface. On error it
     will throw a Failure exception. *)
 
+val fdopen: Unix.file_descr -> handle
+(** Return an initialised event channel interface, from an already open evtchn
+    file descriptor.  On error it will throw a Failure exception. *)
+
 val fd: handle -> Unix.file_descr
 (** Return a file descriptor suitable for Unix.select. When
     the descriptor becomes readable, it is safe to call 'pending'.
diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
index 37f1cc4e1478..7bdf711bc150 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
@@ -69,6 +69,25 @@ CAMLprim value stub_eventchn_init(void)
 	CAMLreturn(result);
 }
 
+CAMLprim value stub_eventchn_fdopen(value fdval)
+{
+	CAMLparam1(fdval);
+	CAMLlocal1(result);
+	xenevtchn_handle *xce;
+
+	caml_enter_blocking_section();
+	xce = xenevtchn_fdopen(NULL, Int_val(fdval), 0);
+	caml_leave_blocking_section();
+
+	if (xce == NULL)
+		caml_failwith("evtchn fdopen failed");
+
+	result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+	_H(result) = xce;
+
+	CAMLreturn(result);
+}
+
 CAMLprim value stub_eventchn_fd(value xce)
 {
 	CAMLparam1(xce);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/8] tools/ocaml/evtchn: Extend the init() binding with a cloexec flag
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
                   ` (2 preceding siblings ...)
  2022-11-22 15:20 ` [PATCH 3/8] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen() Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update Andrew Cooper
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Edwin Török, Andrew Cooper, Christian Lindig,
	David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

For live update, oxenstored wants to clear CLOEXEC on the evtchn handle, so it
survives the execve() into the new oxenstored.

Have the new interface match how cloexec works in other Ocaml standard
libraries.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>

Split out of combined patch
---
 tools/ocaml/libs/eventchn/xeneventchn.ml      |  5 ++++-
 tools/ocaml/libs/eventchn/xeneventchn.mli     |  9 ++++++---
 tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 10 +++++++---
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
index be4de82f46b9..c16fdd4674f7 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn.ml
+++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
@@ -16,7 +16,10 @@
 
 type handle
 
-external init: unit -> handle = "stub_eventchn_init"
+external _init: bool -> handle = "stub_eventchn_init"
+
+let init ?(cloexec=true) () = _init cloexec
+
 external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
 external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
 
diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
index 98b3c86f3702..870429b6b53a 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn.mli
+++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
@@ -43,9 +43,12 @@ val to_int: t -> int
 
 val of_int: int -> t
 
-val init: unit -> handle
-(** Return an initialised event channel interface. On error it
-    will throw a Failure exception. *)
+val init: ?cloexec:bool -> unit -> handle
+(** [init ?cloexec ()]
+    Return an initialised event channel interface.
+    The default is to close the underlying file descriptor
+    on [execve], which can be overriden with [~cloexec:false].
+    On error it will throw a Failure exception. *)
 
 val fdopen: Unix.file_descr -> handle
 (** Return an initialised event channel interface, from an already open evtchn
diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
index 7bdf711bc150..aa8a69cc1ecb 100644
--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
@@ -50,14 +50,18 @@ static struct custom_operations xenevtchn_ops = {
 	.compare_ext = custom_compare_ext_default, /* Can't compare     */
 };
 
-CAMLprim value stub_eventchn_init(void)
+CAMLprim value stub_eventchn_init(value cloexec)
 {
-	CAMLparam0();
+	CAMLparam1(cloexec);
 	CAMLlocal1(result);
 	xenevtchn_handle *xce;
+	unsigned int flags = 0;
+
+	if ( !Bool_val(cloexec) )
+		flags |= XENEVTCHN_NO_CLOEXEC;
 
 	caml_enter_blocking_section();
-	xce = xenevtchn_open(NULL, 0);
+	xce = xenevtchn_open(NULL, flags);
 	caml_leave_blocking_section();
 
 	if (xce == NULL)
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
                   ` (3 preceding siblings ...)
  2022-11-22 15:20 ` [PATCH 4/8] tools/ocaml/evtchn: Extend the init() binding with a cloexec flag Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-23 10:59   ` Christian Lindig
  2022-11-22 15:20 ` [PATCH 6/8] tools/oxenstored: Log live update issues at warning level Andrew Cooper
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Edwin Török, Andrew Cooper, Christian Lindig,
	David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

Closing the evtchn handle will unbind and free all local ports.  The new
xenstored would need to rebind all evtchns, which is work that we don't want
or need to be doing during the critical handover period.

However, it turns out that the Windows PV drivers also rebind their local port
too across suspend/resume, leaving (o)xenstored with a stale idea of the
remote port to use.  In this case, reusing the established connection is the
only robust option.

Therefore:
 * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day
 * Extend the handover information with the evtchn fd, and the local port
   number for each domain connection.
 * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
   and use the provided local ports rather than trying to rebind them.

When this new information isn't present (i.e. live updating from an oxenstored
prior to this change), the best-effort status quo will have to do.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>

Merge two patches to retain bisectability.  Drop changes to the evtchn bindings.
---
 tools/ocaml/xenstored/domain.ml    |  6 ++-
 tools/ocaml/xenstored/domains.ml   | 14 +++++--
 tools/ocaml/xenstored/event.ml     |  8 +++-
 tools/ocaml/xenstored/xenstored.ml | 82 ++++++++++++++++++++++++++------------
 4 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
index 81cb59b8f1a2..527035ffdd32 100644
--- a/tools/ocaml/xenstored/domain.ml
+++ b/tools/ocaml/xenstored/domain.ml
@@ -61,7 +61,7 @@ let string_of_port = function
 | Some x -> string_of_int (Xeneventchn.to_int x)
 
 let dump d chan =
-	fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
+	fprintf chan "dom,%d,%nd,%d,%s\n" d.id d.mfn d.remote_port (string_of_port d.port)
 
 let notify dom = match dom.port with
 | None ->
@@ -77,6 +77,10 @@ let bind_interdomain dom =
 	dom.port <- Some (Event.bind_interdomain dom.eventchn dom.id dom.remote_port);
 	debug "bound domain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
 
+let restore_interdomain dom localport =
+	assert (dom.port = None);
+	dom.port <- Some (Xeneventchn.of_int localport);
+	debug "restored interdomain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
 
 let close dom =
 	debug "domain %d unbound port %s" dom.id (string_of_port dom.port);
diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
index 17fe2fa25772..a91d2afd2a82 100644
--- a/tools/ocaml/xenstored/domains.ml
+++ b/tools/ocaml/xenstored/domains.ml
@@ -56,6 +56,7 @@ let exist doms id = Hashtbl.mem doms.table id
 let find doms id = Hashtbl.find doms.table id
 let number doms = Hashtbl.length doms.table
 let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
+let eventchn doms = doms.eventchn
 
 let rec is_empty_queue q =
 	Queue.is_empty q ||
@@ -122,17 +123,22 @@ let cleanup doms =
 let resume _doms _domid =
 	()
 
-let create doms domid mfn port =
+let maybe_bind_interdomain restore_localport dom =
+	match restore_localport with
+	| None -> Domain.bind_interdomain dom
+	| Some p -> Domain.restore_interdomain dom p
+
+let create doms domid mfn ?restore_localport port =
 	let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
 	let dom = Domain.make domid mfn port interface doms.eventchn in
 	Hashtbl.add doms.table domid dom;
-	Domain.bind_interdomain dom;
+	maybe_bind_interdomain restore_localport dom;
 	dom
 
 let xenstored_kva = ref ""
 let xenstored_port = ref ""
 
-let create0 doms =
+let create0 ?restore_localport doms =
 	let port, interface =
 		(
 			let port = Utils.read_file_single_integer !xenstored_port
@@ -146,7 +152,7 @@ let create0 doms =
 		in
 	let dom = Domain.make 0 Nativeint.zero port interface doms.eventchn in
 	Hashtbl.add doms.table 0 dom;
-	Domain.bind_interdomain dom;
+	maybe_bind_interdomain restore_localport dom;
 	Domain.notify dom;
 	dom
 
diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
index ccca90b6fc4f..0159daac91f4 100644
--- a/tools/ocaml/xenstored/event.ml
+++ b/tools/ocaml/xenstored/event.ml
@@ -20,7 +20,13 @@ type t = {
 	mutable virq_port: Xeneventchn.t option;
 }
 
-let init () = { handle = Xeneventchn.init (); virq_port = None; }
+let init ?fd () =
+	let handle = match fd with
+		| None -> Xeneventchn.init ~cloexec:false ()
+		| Some fd -> Xeneventchn.fdopen fd
+	in
+	{ handle; virq_port = None }
+
 let fd eventchn = Xeneventchn.fd eventchn.handle
 let bind_dom_exc_virq eventchn = eventchn.virq_port <- Some (Xeneventchn.bind_dom_exc_virq eventchn.handle)
 let bind_interdomain eventchn domid port = Xeneventchn.bind_interdomain eventchn.handle domid port
diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index c5dc7a28d082..6ceab02dee1e 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -144,7 +144,7 @@ exception Bad_format of string
 
 let dump_format_header = "$xenstored-dump-format"
 
-let from_channel_f chan global_f socket_f domain_f watch_f store_f =
+let from_channel_f chan global_f event_f socket_f domain_f watch_f store_f =
 	let unhexify s = Utils.unhexify s in
 	let getpath s =
 		let u = Utils.unhexify s in
@@ -165,12 +165,17 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
 					(* there might be more parameters here,
 					   e.g. a RO socket from a previous version: ignore it *)
 					global_f ~rw
+				| "eventfd" :: eventfd :: [] ->
+					event_f ~eventfd
 				| "socket" :: fd :: [] ->
 					socket_f ~fd:(int_of_string fd)
-				| "dom" :: domid :: mfn :: port :: []->
+				| "dom" :: domid :: mfn :: port :: rest ->
 					domain_f (int_of_string domid)
 					         (Nativeint.of_string mfn)
 					         (int_of_string port)
+						 (match rest with
+						  | [] -> None (* backward compat: old version didn't have it *)
+						  | localport :: _ -> Some (int_of_string localport))
 				| "watch" :: domid :: path :: token :: [] ->
 					watch_f (int_of_string domid)
 					        (unhexify path) (unhexify token)
@@ -189,10 +194,27 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
 	done;
 	info "Completed loading xenstore dump"
 
-let from_channel store cons doms chan =
+let from_channel store cons createdoms chan =
 	(* don't let the permission get on our way, full perm ! *)
 	let op = Store.get_ops store Perms.Connection.full_rights in
 	let rwro = ref (None) in
+	let eventchnfd = ref (None) in
+	let doms = ref (None) in
+
+	let require_doms () =
+		match !doms with
+		| None ->
+			let missing_eventchnfd = !eventchnfd = None in
+			if missing_eventchnfd then
+				warn "No event channel file descriptor available in dump!";
+			let eventchn = Event.init ?fd:!eventchnfd () in
+			let domains = createdoms eventchn in
+			if missing_eventchnfd then
+				Event.bind_dom_exc_virq eventchn;
+			doms := Some domains;
+			domains
+		| Some d -> d
+	in
 	let global_f ~rw =
 		let get_listen_sock sockfd =
 			let fd = sockfd |> int_of_string |> Utils.FD.of_int in
@@ -201,6 +223,10 @@ let from_channel store cons doms chan =
 		in
 		rwro := get_listen_sock rw
 	in
+	let event_f ~eventfd =
+		let fd = eventfd |> int_of_string |> Utils.FD.of_int in
+		eventchnfd := Some fd
+	in
 	let socket_f ~fd =
 		let ufd = Utils.FD.of_int fd in
 		let is_valid = try (Unix.fstat ufd).Unix.st_kind = Unix.S_SOCK with _ -> false in
@@ -209,12 +235,13 @@ let from_channel store cons doms chan =
 		else
 			warn "Ignoring invalid socket FD %d" fd
 	in
-	let domain_f domid mfn port =
+	let domain_f domid mfn port restore_localport =
+		let doms = require_doms () in
 		let ndom =
 			if domid > 0 then
-				Domains.create doms domid mfn port
+				Domains.create doms domid mfn ?restore_localport port
 			else
-				Domains.create0 doms
+				Domains.create0 ?restore_localport doms
 			in
 		Connections.add_domain cons ndom;
 		in
@@ -229,8 +256,8 @@ let from_channel store cons doms chan =
 		op.Store.write path value;
 		op.Store.setperms path perms
 		in
-	from_channel_f chan global_f socket_f domain_f watch_f store_f;
-	!rwro
+	from_channel_f chan global_f event_f socket_f domain_f watch_f store_f;
+	!rwro, require_doms ()
 
 let from_file store cons doms file =
 	info "Loading xenstore dump from %s" file;
@@ -238,7 +265,7 @@ let from_file store cons doms file =
 	finally (fun () -> from_channel store doms cons channel)
 	        (fun () -> close_in channel)
 
-let to_channel store cons rw chan =
+let to_channel store cons (rw, eventchn) chan =
 	let hexify s = Utils.hexify s in
 
 	fprintf chan "%s\n" dump_format_header;
@@ -247,6 +274,7 @@ let to_channel store cons rw chan =
 		Unix.clear_close_on_exec fd;
 		Utils.FD.to_int fd in
 	fprintf chan "global,%d\n" (fdopt rw);
+	fprintf chan "eventchnfd,%d\n" (Utils.FD.to_int @@ Event.fd eventchn);
 
 	(* dump connections related to domains: domid, mfn, eventchn port/ sockets, and watches *)
 	Connections.iter cons (fun con -> Connection.dump con chan);
@@ -367,7 +395,6 @@ let _ =
 	| None         -> () end;
 
 	let store = Store.create () in
-	let eventchn = Event.init () in
 	let next_frequent_ops = ref 0. in
 	let advance_next_frequent_ops () =
 		next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
@@ -375,16 +402,8 @@ let _ =
 	let delay_next_frequent_ops_by duration =
 		next_frequent_ops := !next_frequent_ops +. duration
 	in
-	let domains = Domains.init eventchn advance_next_frequent_ops in
+	let domains eventchn = Domains.init eventchn advance_next_frequent_ops in
 
-	(* For things that need to be done periodically but more often
-	 * than the periodic_ops function *)
-	let frequent_ops () =
-		if Unix.gettimeofday () > !next_frequent_ops then (
-			History.trim ();
-			Domains.incr_conflict_credit domains;
-			advance_next_frequent_ops ()
-		) in
 	let cons = Connections.create () in
 
 	let quit = ref false in
@@ -393,15 +412,15 @@ let _ =
 	List.iter (fun path ->
 		Store.write store Perms.Connection.full_rights path "") Store.Path.specials;
 
-	let rw_sock =
+	let rw_sock, domains =
 	if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
-		let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
+		let rw, domains = DB.from_file store domains cons Disk.xs_daemon_database in
 		info "Live reload: database loaded";
-		Event.bind_dom_exc_virq eventchn;
 		Process.LiveUpdate.completed ();
-		rwro
+		rw, domains
 	) else (
 		info "No live reload: regular startup";
+		let domains = domains @@ Event.init () in
 		if !Disk.enable then (
 			info "reading store from disk";
 			Disk.read store
@@ -411,13 +430,23 @@ let _ =
 		if not (Store.path_exists store localpath) then
 			Store.mkdir store (Perms.Connection.create 0) localpath;
 
+		let eventchn = Event.init () in
 		if cf.domain_init then (
 			Connections.add_domain cons (Domains.create0 domains);
 			Event.bind_dom_exc_virq eventchn
 		);
-		rw_sock
+		rw_sock, domains
 	) in
 
+	(* For things that need to be done periodically but more often
+	 * than the periodic_ops function *)
+	let frequent_ops () =
+		if Unix.gettimeofday () > !next_frequent_ops then (
+			History.trim ();
+			Domains.incr_conflict_credit domains;
+			advance_next_frequent_ops ()
+		) in
+
 	(* required for xenstore-control to detect availability of live-update *)
 	let tool_path = Store.Path.of_string "/tool" in
 	if not (Store.path_exists store tool_path) then
@@ -433,10 +462,11 @@ let _ =
 	Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
 
 	if cf.activate_access_log then begin
-		let post_rotate () = DB.to_file store cons (None) Disk.xs_daemon_database in
+		let post_rotate () = DB.to_file store cons (None, Domains.eventchn domains) Disk.xs_daemon_database in
 		Logging.init_access_log post_rotate
 	end;
 
+	let eventchn = Domains.eventchn domains in
 	let spec_fds =
 		(match rw_sock with None -> [] | Some x -> [ x ]) @
 		(if cf.domain_init then [ Event.fd eventchn ] else [])
@@ -595,7 +625,7 @@ let _ =
 			live_update := Process.LiveUpdate.should_run cons;
 			if !live_update || !quit then begin
 				(* don't initiate live update if saving state fails *)
-				DB.to_file store cons (rw_sock) Disk.xs_daemon_database;
+				DB.to_file store cons (rw_sock, eventchn) Disk.xs_daemon_database;
 				quit := true;
 			end
 		with exc ->
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/8] tools/oxenstored: Log live update issues at warning level
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
                   ` (4 preceding siblings ...)
  2022-11-22 15:20 ` [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 7/8] tools/oxenstored: Set uncaught exception handler Andrew Cooper
  2022-11-22 15:20 ` [PATCH 8/8] tools/oxenstored/syslog: Avoid potential NULL dereference Andrew Cooper
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel; +Cc: Edwin Török, Christian Lindig, David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

During live update, oxenstored tries a best effort approach to recover as many
domains and information as possible even if it encounters errors restoring
some domains.

However, logging about misunderstood input is more severe than simply info.
Log it at warning instead.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>
---
 tools/ocaml/xenstored/xenstored.ml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index 6ceab02dee1e..23621bd49397 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -184,9 +184,9 @@ let from_channel_f chan global_f event_f socket_f domain_f watch_f store_f =
 					        (Perms.Node.of_string (unhexify perms ^ "\000"))
 					        (unhexify value)
 				| _ ->
-					info "restoring: ignoring unknown line: %s" line
+					warn "restoring: ignoring unknown line: %s" line
 			with exn ->
-				info "restoring: ignoring unknown line: %s (exception: %s)"
+				warn "restoring: ignoring unknown line: %s (exception: %s)"
 				     line (Printexc.to_string exn);
 				()
 		with End_of_file ->
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/8] tools/oxenstored: Set uncaught exception handler
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
                   ` (5 preceding siblings ...)
  2022-11-22 15:20 ` [PATCH 6/8] tools/oxenstored: Log live update issues at warning level Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  2022-11-22 15:20 ` [PATCH 8/8] tools/oxenstored/syslog: Avoid potential NULL dereference Andrew Cooper
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Edwin Török, Andrew Cooper, Christian Lindig,
	David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

Unhandled exceptions go to stderr by default, but this doesn't typically work
for oxenstored because:
 * daemonize reopens stderr as /dev/null
 * systemd redirects stderr to /dev/null too

Debugging an unhandled exception requires reproducing the issue locally when
using --no-fork, and is not conducive to figuring out what went wrong on a
remote system.

Install a custom handler which also tries to render the backtrace to the
configured syslog facility, and DAEMON|ERR otherwise.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>

Drop print_flush as prerr_endline already flushes.
---
 tools/ocaml/xenstored/logging.ml   | 29 +++++++++++++++++++++++++++++
 tools/ocaml/xenstored/xenstored.ml |  3 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml
index 39c3036155a2..255051437d60 100644
--- a/tools/ocaml/xenstored/logging.ml
+++ b/tools/ocaml/xenstored/logging.ml
@@ -342,3 +342,32 @@ let xb_answer ~tid ~con ~ty data =
 let watch_not_fired ~con perms path =
 	let data = Printf.sprintf "EPERM perms=[%s] path=%s" perms path in
 	access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info
+
+let msg_of exn bt =
+	Printf.sprintf "Fatal exception: %s\n%s\n" (Printexc.to_string exn)
+		(Printexc.raw_backtrace_to_string bt)
+
+let fallback_exception_handler exn bt =
+	(* stderr goes to /dev/null, so use the logger where possible,
+	   but always print to stderr too, in case everything else fails,
+	   e.g. this can be used to debug with --no-fork
+
+	   this function should try not to raise exceptions, but if it does
+	   the ocaml runtime should still print the exception, both the original,
+	   and the one from this function, but to stderr this time
+	 *)
+	let msg = msg_of exn bt in
+	prerr_endline msg;
+	(* See Printexc.set_uncaught_exception_handler, need to flush,
+	   so has to call stop and flush *)
+	match !xenstored_logger with
+	| Some l -> error "xenstored-fallback" "%s" msg; l.stop ()
+	| None ->
+		(* Too early, no logger set yet.
+		   We normally try to use the configured logger so we don't flood syslog
+		   during development for example, or if the user has a file set
+		 *)
+		try Syslog.log Syslog.Daemon Syslog.Err msg
+		with e ->
+			let bt = Printexc.get_raw_backtrace () in
+			prerr_endline @@ msg_of e bt
diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index 23621bd49397..257481285f05 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -357,7 +357,8 @@ let tweak_gc () =
 	Gc.set { (Gc.get ()) with Gc.max_overhead = !Define.gc_max_overhead }
 
 
-let _ =
+let () =
+	Printexc.set_uncaught_exception_handler Logging.fallback_exception_handler;
 	let cf = do_argv in
 	let pidfile =
 		if Sys.file_exists (config_filename cf) then
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 8/8] tools/oxenstored/syslog: Avoid potential NULL dereference
  2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
                   ` (6 preceding siblings ...)
  2022-11-22 15:20 ` [PATCH 7/8] tools/oxenstored: Set uncaught exception handler Andrew Cooper
@ 2022-11-22 15:20 ` Andrew Cooper
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-22 15:20 UTC (permalink / raw)
  To: Xen-devel; +Cc: Edwin Török, Christian Lindig, David Scott, Rob Hoes

From: Edwin Török <edvin.torok@citrix.com>

strdup() may return NULL.  Check for this before passing to syslog().

Drop const from c_msg.  It is bogus, as demonstrated by the need to cast to
void * in order to free the memory.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
CC: Christian Lindig <christian.lindig@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Edwin Torok <edvin.torok@citrix.com>
CC: Rob Hoes <Rob.Hoes@citrix.com>
---
 tools/ocaml/xenstored/syslog_stubs.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/syslog_stubs.c b/tools/ocaml/xenstored/syslog_stubs.c
index 875d48ad57eb..e16c3a9491d0 100644
--- a/tools/ocaml/xenstored/syslog_stubs.c
+++ b/tools/ocaml/xenstored/syslog_stubs.c
@@ -14,6 +14,7 @@
 
 #include <syslog.h>
 #include <string.h>
+#include <caml/fail.h>
 #include <caml/mlvalues.h>
 #include <caml/memory.h>
 #include <caml/alloc.h>
@@ -35,14 +36,16 @@ static int __syslog_facility_table[] = {
 value stub_syslog(value facility, value level, value msg)
 {
 	CAMLparam3(facility, level, msg);
-	const char *c_msg = strdup(String_val(msg));
+	char *c_msg = strdup(String_val(msg));
 	int c_facility = __syslog_facility_table[Int_val(facility)]
 	               | __syslog_level_table[Int_val(level)];
 
+	if ( !c_msg )
+		caml_raise_out_of_memory();
 	caml_enter_blocking_section();
 	syslog(c_facility, "%s", c_msg);
 	caml_leave_blocking_section();
 
-	free((void*)c_msg);
+	free(c_msg);
 	CAMLreturn(Val_unit);
 }
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update
  2022-11-22 15:20 ` [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update Andrew Cooper
@ 2022-11-23 10:59   ` Christian Lindig
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Lindig @ 2022-11-23 10:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Edwin Torok, David Scott, Rob Hoes



> On 22 Nov 2022, at 15:20, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> From: Edwin Török <edvin.torok@citrix.com>
> 
> Closing the evtchn handle will unbind and free all local ports.  The new
> xenstored would need to rebind all evtchns, which is work that we don't want
> or need to be doing during the critical handover period.
> 
> However, it turns out that the Windows PV drivers also rebind their local port
> too across suspend/resume, leaving (o)xenstored with a stale idea of the
> remote port to use.  In this case, reusing the established connection is the
> only robust option.
> 
> Therefore:
> * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day
> * Extend the handover information with the evtchn fd, and the local port
>   number for each domain connection.
> * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
>   and use the provided local ports rather than trying to rebind them.
> 
> When this new information isn't present (i.e. live updating from an oxenstored
> prior to this change), the best-effort status quo will have to do.
> 
> Signed-off-by: Edwin Török <edvin.torok@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Christian Lindig <christian.lindig@citrix.com>
> CC: David Scott <dave@recoil.org>
> CC: Edwin Torok <edvin.torok@citrix.com>
> CC: Rob Hoes <Rob.Hoes@citrix.com>
> 

Acked-by: Christian Lindig <christian.lindig@citrix.com>

Nothing stands out for me. But this is obviously delicate in support of a delicate feature, which is live update. I commented before that the commend line gets increasingly crowded.

— C


> Merge two patches to retain bisectability.  Drop changes to the evtchn bindings.
> ---
> tools/ocaml/xenstored/domain.ml    |  6 ++-
> tools/ocaml/xenstored/domains.ml   | 14 +++++--
> tools/ocaml/xenstored/event.ml     |  8 +++-
> tools/ocaml/xenstored/xenstored.ml | 82 ++++++++++++++++++++++++++------------
> 4 files changed, 78 insertions(+), 32 deletions(-)
> 
> diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
> index 81cb59b8f1a2..527035ffdd32 100644
> --- a/tools/ocaml/xenstored/domain.ml
> +++ b/tools/ocaml/xenstored/domain.ml
> @@ -61,7 +61,7 @@ let string_of_port = function
> | Some x -> string_of_int (Xeneventchn.to_int x)
> 
> let dump d chan =
> -	fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
> +	fprintf chan "dom,%d,%nd,%d,%s\n" d.id d.mfn d.remote_port (string_of_port d.port)
> 
> let notify dom = match dom.port with
> | None ->
> @@ -77,6 +77,10 @@ let bind_interdomain dom =
> 	dom.port <- Some (Event.bind_interdomain dom.eventchn dom.id dom.remote_port);
> 	debug "bound domain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
> 
> +let restore_interdomain dom localport =
> +	assert (dom.port = None);
> +	dom.port <- Some (Xeneventchn.of_int localport);
> +	debug "restored interdomain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
> 
> let close dom =
> 	debug "domain %d unbound port %s" dom.id (string_of_port dom.port);
> diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
> index 17fe2fa25772..a91d2afd2a82 100644
> --- a/tools/ocaml/xenstored/domains.ml
> +++ b/tools/ocaml/xenstored/domains.ml
> @@ -56,6 +56,7 @@ let exist doms id = Hashtbl.mem doms.table id
> let find doms id = Hashtbl.find doms.table id
> let number doms = Hashtbl.length doms.table
> let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
> +let eventchn doms = doms.eventchn
> 
> let rec is_empty_queue q =
> 	Queue.is_empty q ||
> @@ -122,17 +123,22 @@ let cleanup doms =
> let resume _doms _domid =
> 	()
> 
> -let create doms domid mfn port =
> +let maybe_bind_interdomain restore_localport dom =
> +	match restore_localport with
> +	| None -> Domain.bind_interdomain dom
> +	| Some p -> Domain.restore_interdomain dom p
> +
> +let create doms domid mfn ?restore_localport port =
> 	let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
> 	let dom = Domain.make domid mfn port interface doms.eventchn in
> 	Hashtbl.add doms.table domid dom;
> -	Domain.bind_interdomain dom;
> +	maybe_bind_interdomain restore_localport dom;
> 	dom
> 
> let xenstored_kva = ref ""
> let xenstored_port = ref ""
> 
> -let create0 doms =
> +let create0 ?restore_localport doms =
> 	let port, interface =
> 		(
> 			let port = Utils.read_file_single_integer !xenstored_port
> @@ -146,7 +152,7 @@ let create0 doms =
> 		in
> 	let dom = Domain.make 0 Nativeint.zero port interface doms.eventchn in
> 	Hashtbl.add doms.table 0 dom;
> -	Domain.bind_interdomain dom;
> +	maybe_bind_interdomain restore_localport dom;
> 	Domain.notify dom;
> 	dom
> 
> diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
> index ccca90b6fc4f..0159daac91f4 100644
> --- a/tools/ocaml/xenstored/event.ml
> +++ b/tools/ocaml/xenstored/event.ml
> @@ -20,7 +20,13 @@ type t = {
> 	mutable virq_port: Xeneventchn.t option;
> }
> 
> -let init () = { handle = Xeneventchn.init (); virq_port = None; }
> +let init ?fd () =
> +	let handle = match fd with
> +		| None -> Xeneventchn.init ~cloexec:false ()
> +		| Some fd -> Xeneventchn.fdopen fd
> +	in
> +	{ handle; virq_port = None }
> +
> let fd eventchn = Xeneventchn.fd eventchn.handle
> let bind_dom_exc_virq eventchn = eventchn.virq_port <- Some (Xeneventchn.bind_dom_exc_virq eventchn.handle)
> let bind_interdomain eventchn domid port = Xeneventchn.bind_interdomain eventchn.handle domid port
> diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
> index c5dc7a28d082..6ceab02dee1e 100644
> --- a/tools/ocaml/xenstored/xenstored.ml
> +++ b/tools/ocaml/xenstored/xenstored.ml
> @@ -144,7 +144,7 @@ exception Bad_format of string
> 
> let dump_format_header = "$xenstored-dump-format"
> 
> -let from_channel_f chan global_f socket_f domain_f watch_f store_f =
> +let from_channel_f chan global_f event_f socket_f domain_f watch_f store_f =
> 	let unhexify s = Utils.unhexify s in
> 	let getpath s =
> 		let u = Utils.unhexify s in
> @@ -165,12 +165,17 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
> 					(* there might be more parameters here,
> 					   e.g. a RO socket from a previous version: ignore it *)
> 					global_f ~rw
> +				| "eventfd" :: eventfd :: [] ->
> +					event_f ~eventfd
> 				| "socket" :: fd :: [] ->
> 					socket_f ~fd:(int_of_string fd)
> -				| "dom" :: domid :: mfn :: port :: []->
> +				| "dom" :: domid :: mfn :: port :: rest ->
> 					domain_f (int_of_string domid)
> 					         (Nativeint.of_string mfn)
> 					         (int_of_string port)
> +						 (match rest with
> +						  | [] -> None (* backward compat: old version didn't have it *)
> +						  | localport :: _ -> Some (int_of_string localport))
> 				| "watch" :: domid :: path :: token :: [] ->
> 					watch_f (int_of_string domid)
> 					        (unhexify path) (unhexify token)
> @@ -189,10 +194,27 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
> 	done;
> 	info "Completed loading xenstore dump"
> 
> -let from_channel store cons doms chan =
> +let from_channel store cons createdoms chan =
> 	(* don't let the permission get on our way, full perm ! *)
> 	let op = Store.get_ops store Perms.Connection.full_rights in
> 	let rwro = ref (None) in
> +	let eventchnfd = ref (None) in

No parenthesis required: "ref None” would be enough. But don’t bother - once we use OCamlformat instances like these will be picked up.


> +	let doms = ref (None) in
> +
> +	let require_doms () =
> +		match !doms with
> +		| None ->

Alternative could be:
| None when !eventchnfd = None ->
 ...
| None ->
 ...
| Some d -> d



> +			let missing_eventchnfd = !eventchnfd = None in
> +			if missing_eventchnfd then
> +				warn "No event channel file descriptor available in dump!";
> +			let eventchn = Event.init ?fd:!eventchnfd () in
> +			let domains = createdoms eventchn in
> +			if missing_eventchnfd then
> +				Event.bind_dom_exc_virq eventchn;
> +			doms := Some domains;
> +			domains
> +		| Some d -> d
> +	in
> 	let global_f ~rw =
> 		let get_listen_sock sockfd =
> 			let fd = sockfd |> int_of_string |> Utils.FD.of_int in
> @@ -201,6 +223,10 @@ let from_channel store cons doms chan =
> 		in
> 		rwro := get_listen_sock rw
> 	in
> +	let event_f ~eventfd =
> +		let fd = eventfd |> int_of_string |> Utils.FD.of_int in
> +		eventchnfd := Some fd
> +	in
> 	let socket_f ~fd =
> 		let ufd = Utils.FD.of_int fd in
> 		let is_valid = try (Unix.fstat ufd).Unix.st_kind = Unix.S_SOCK with _ -> false in
> @@ -209,12 +235,13 @@ let from_channel store cons doms chan =
> 		else
> 			warn "Ignoring invalid socket FD %d" fd
> 	in
> -	let domain_f domid mfn port =
> +	let domain_f domid mfn port restore_localport =
> +		let doms = require_doms () in
> 		let ndom =
> 			if domid > 0 then
> -				Domains.create doms domid mfn port
> +				Domains.create doms domid mfn ?restore_localport port
> 			else
> -				Domains.create0 doms
> +				Domains.create0 ?restore_localport doms
> 			in
> 		Connections.add_domain cons ndom;
> 		in
> @@ -229,8 +256,8 @@ let from_channel store cons doms chan =
> 		op.Store.write path value;
> 		op.Store.setperms path perms
> 		in
> -	from_channel_f chan global_f socket_f domain_f watch_f store_f;
> -	!rwro
> +	from_channel_f chan global_f event_f socket_f domain_f watch_f store_f;
> +	!rwro, require_doms ()
> 
> let from_file store cons doms file =
> 	info "Loading xenstore dump from %s" file;
> @@ -238,7 +265,7 @@ let from_file store cons doms file =
> 	finally (fun () -> from_channel store doms cons channel)
> 	        (fun () -> close_in channel)
> 
> -let to_channel store cons rw chan =
> +let to_channel store cons (rw, eventchn) chan =
> 	let hexify s = Utils.hexify s in
> 
> 	fprintf chan "%s\n" dump_format_header;
> @@ -247,6 +274,7 @@ let to_channel store cons rw chan =
> 		Unix.clear_close_on_exec fd;
> 		Utils.FD.to_int fd in
> 	fprintf chan "global,%d\n" (fdopt rw);
> +	fprintf chan "eventchnfd,%d\n" (Utils.FD.to_int @@ Event.fd eventchn);
> 
> 	(* dump connections related to domains: domid, mfn, eventchn port/ sockets, and watches *)
> 	Connections.iter cons (fun con -> Connection.dump con chan);
> @@ -367,7 +395,6 @@ let _ =
> 	| None         -> () end;
> 
> 	let store = Store.create () in
> -	let eventchn = Event.init () in
> 	let next_frequent_ops = ref 0. in
> 	let advance_next_frequent_ops () =
> 		next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
> @@ -375,16 +402,8 @@ let _ =
> 	let delay_next_frequent_ops_by duration =
> 		next_frequent_ops := !next_frequent_ops +. duration
> 	in
> -	let domains = Domains.init eventchn advance_next_frequent_ops in
> +	let domains eventchn = Domains.init eventchn advance_next_frequent_ops in
> 
> -	(* For things that need to be done periodically but more often
> -	 * than the periodic_ops function *)
> -	let frequent_ops () =
> -		if Unix.gettimeofday () > !next_frequent_ops then (
> -			History.trim ();
> -			Domains.incr_conflict_credit domains;
> -			advance_next_frequent_ops ()
> -		) in
> 	let cons = Connections.create () in
> 
> 	let quit = ref false in
> @@ -393,15 +412,15 @@ let _ =
> 	List.iter (fun path ->
> 		Store.write store Perms.Connection.full_rights path "") Store.Path.specials;
> 
> -	let rw_sock =
> +	let rw_sock, domains =
> 	if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
> -		let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
> +		let rw, domains = DB.from_file store domains cons Disk.xs_daemon_database in
> 		info "Live reload: database loaded";
> -		Event.bind_dom_exc_virq eventchn;
> 		Process.LiveUpdate.completed ();
> -		rwro
> +		rw, domains
> 	) else (
> 		info "No live reload: regular startup";
> +		let domains = domains @@ Event.init () in
> 		if !Disk.enable then (
> 			info "reading store from disk";
> 			Disk.read store
> @@ -411,13 +430,23 @@ let _ =
> 		if not (Store.path_exists store localpath) then
> 			Store.mkdir store (Perms.Connection.create 0) localpath;
> 
> +		let eventchn = Event.init () in
> 		if cf.domain_init then (
> 			Connections.add_domain cons (Domains.create0 domains);
> 			Event.bind_dom_exc_virq eventchn
> 		);
> -		rw_sock
> +		rw_sock, domains
> 	) in
> 
> +	(* For things that need to be done periodically but more often
> +	 * than the periodic_ops function *)
> +	let frequent_ops () =
> +		if Unix.gettimeofday () > !next_frequent_ops then (
> +			History.trim ();
> +			Domains.incr_conflict_credit domains;
> +			advance_next_frequent_ops ()
> +		) in
> +
> 	(* required for xenstore-control to detect availability of live-update *)
> 	let tool_path = Store.Path.of_string "/tool" in
> 	if not (Store.path_exists store tool_path) then
> @@ -433,10 +462,11 @@ let _ =
> 	Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
> 
> 	if cf.activate_access_log then begin
> -		let post_rotate () = DB.to_file store cons (None) Disk.xs_daemon_database in
> +		let post_rotate () = DB.to_file store cons (None, Domains.eventchn domains) Disk.xs_daemon_database in
> 		Logging.init_access_log post_rotate
> 	end;
> 
> +	let eventchn = Domains.eventchn domains in
> 	let spec_fds =
> 		(match rw_sock with None -> [] | Some x -> [ x ]) @
> 		(if cf.domain_init then [ Event.fd eventchn ] else [])
> @@ -595,7 +625,7 @@ let _ =
> 			live_update := Process.LiveUpdate.should_run cons;
> 			if !live_update || !quit then begin
> 				(* don't initiate live update if saving state fails *)
> -				DB.to_file store cons (rw_sock) Disk.xs_daemon_database;
> +				DB.to_file store cons (rw_sock, eventchn) Disk.xs_daemon_database;
> 				quit := true;
> 			end
> 		with exc ->
> -- 
> 2.11.0
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement
  2022-11-22 15:20 ` [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement Andrew Cooper
@ 2022-11-23 14:50   ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2022-11-23 14:50 UTC (permalink / raw)
  To: Xen-devel; +Cc: Christian Lindig, David Scott, Edwin Torok, Rob Hoes

On 22/11/2022 15:20, Andrew Cooper wrote:
> A debug statement got inserted into a single-expression if statement.
>
> Insert brackets to give the intended meaning, rather than the actual meaning
> where the "let con = Connections..." is outside and executed unconditionally.
>
> This results in some unnecessary ring checks for domains which otherwise have
> IO credit.
>
> Fixes: 42f0581a91d4 ("tools/oxenstored: Implement live update for socket connections")
> Reported-by: Edwin Török <edvin.torok@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Christian Lindig <christian.lindig@citrix.com>
> CC: David Scott <dave@recoil.org>
> CC: Edwin Torok <edvin.torok@citrix.com>
> CC: Rob Hoes <Rob.Hoes@citrix.com>

Christian doesn't have this email for some reason, but has given me his
ack in private.

~Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-11-23 14:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22 15:20 [PATCH 0/8] Oxenstored live update fixes Andrew Cooper
2022-11-22 15:20 ` [PATCH 1/8] tools/oxenstored: Fix incorrect scope after an if statement Andrew Cooper
2022-11-23 14:50   ` Andrew Cooper
2022-11-22 15:20 ` [PATCH 2/8] tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak Andrew Cooper
2022-11-22 15:20 ` [PATCH 3/8] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen() Andrew Cooper
2022-11-22 15:20 ` [PATCH 4/8] tools/ocaml/evtchn: Extend the init() binding with a cloexec flag Andrew Cooper
2022-11-22 15:20 ` [PATCH 5/8] tools/oxenstored: Keep /dev/xen/evtchn open across live update Andrew Cooper
2022-11-23 10:59   ` Christian Lindig
2022-11-22 15:20 ` [PATCH 6/8] tools/oxenstored: Log live update issues at warning level Andrew Cooper
2022-11-22 15:20 ` [PATCH 7/8] tools/oxenstored: Set uncaught exception handler Andrew Cooper
2022-11-22 15:20 ` [PATCH 8/8] tools/oxenstored/syslog: Avoid potential NULL dereference Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.