* [PATCH 00/25] Argo: hypervisor-mediated interdomain communication
@ 2018-12-01  1:32 Christopher Clark
  2018-12-01  1:32 ` [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen Christopher Clark
                   ` (25 more replies)
  0 siblings, 26 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Lars Kurth, Stefano Stabellini, Wei Liu,
	James McKenzie, Ross Philipson, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, Daniel De Graaf, Eric Chanudet,
	Jean Guyader, Roger Pau Monné

This patch series implements the Argo hypervisor-mediated interdomain
communication mechanism as an experimental feature for incorporation
into the Xen hypervisor.

Relevant to the ARM deadline for inclusion in the Xen 4.12 release,
there are very few and only minor ARM-specific changes in this series.

This is derived from the v4v work of XenClient, retained in the OpenXT
Project and developed further by Bromium in uxen. It has benefitted from
and been improved by previous rounds of review in this Xen community,
and is the combined work of a series of Xen engineers that have
preceded the efforts of the current submission.

The motivation for this feature continues to be that a non-networking,
non-shared memory, hypervisor-mediated communication mechanism between
domains concurrently executing on the same hypervisor has attractive
properties for use cases that value strong mechanisms for policy
enforcement and isolation.

In this series, Argo is made optional for inclusion via Kconfig. When
included, it defaults to disabled and requires a Xen boot parameter to
enable it.  It has XSM integration for access control over
domain-to-domain communication, and a second boot parameter governs the
level of permissiveness over shared communication rings when using the
non-XSM/Flask default.

Design documentation can be found on the Xen wiki, at:
https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen

and it will be updated to correspond to the submission here in the coming days.

Argo has recently been discussed on the Xen x86 Community Call, minutes:
https://docs.google.com/document/d/1VUPdWwd1raDOPhjReVVkmb6YoQB3X5oU12E4ExjO1n0/edit#heading=h.mz1wjb9vekjn

In (very) short, Argo is implemented by a new hypercall with five operations:
    * register ring
    * unregister ring
    * sendv
    * notify
    * get config

Ring registration is performed by a domain to provide a region of memory
for receiving messages from one or many other domains. A domain can
issue a send operation to send messages to another domain's ring. The
data is transferred synchronously by the hypervisor. There is no shared
memory between domains, allowing for increased confidence by the domain
that the memory accesses in the registered ring conform to the expected
protocol. The hypervisor is able to enforce access control policy over
the communication.
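
As an illustration of the data path described above, here is a minimal
user-space sketch of a mediator copying a message into a receiver-owned
ring. It is not the Argo ABI: struct demo_ring, demo_ring_space(),
demo_sendv() and RING_SIZE are hypothetical names, and the real ring
layout and message framing are defined by the patches in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u /* hypothetical ring capacity in bytes */

/* Hypothetical receive ring, registered by the destination domain. */
struct demo_ring {
    uint32_t rx_ptr;          /* consumer index, advanced by the receiver */
    uint32_t tx_ptr;          /* producer index, advanced by the mediator */
    uint8_t  data[RING_SIZE];
};

/* Free bytes; one byte stays unused so that full and empty differ. */
static uint32_t demo_ring_space(const struct demo_ring *r)
{
    return (r->rx_ptr + RING_SIZE - r->tx_ptr - 1) % RING_SIZE;
}

/*
 * Stand-in for the mediated send: the sender never touches the ring,
 * the mediator copies len bytes from the sender's buffer, wrapping at
 * the end of the ring. Returns 0 on success, -1 if there is no room.
 */
static int demo_sendv(struct demo_ring *r, const void *buf, uint32_t len)
{
    const uint8_t *src = buf;
    uint32_t first;

    assert(r->tx_ptr < RING_SIZE && r->rx_ptr < RING_SIZE);

    if (len > demo_ring_space(r))
        return -1;

    /* Copy up to the end of the ring, then the remainder from offset 0. */
    first = RING_SIZE - r->tx_ptr;
    if (first > len)
        first = len;
    memcpy(&r->data[r->tx_ptr], src, first);
    memcpy(&r->data[0], src + first, len - first);

    r->tx_ptr = (r->tx_ptr + len) % RING_SIZE;
    return 0;
}
```

Because only the mediator writes tx_ptr and the ring data, the receiver
in this sketch only has to trust one writer, which is the property the
paragraph above attributes to the absence of shared memory.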

== Naming

v4v lives on in the Bromium uxen codebase. It is not the same
implementation as this; it doesn't have quite the same properties and
I don't expect the two to converge (though I do hope continued
cross-pollination will happen). Given that, this feature needs to be
describable with a different name.

It's also a complex enough system, with design details that matter and
affect important properties of it, that a generic term (e.g. "message
rings") is not sufficient.

Xen's name originates from Xenia, the ancient Greek concept of
hospitality. Argo is the ship from Greek mythology that provided secure
transport for the mission to obtain the Golden Fleece. This feature aims
to provide secure transport.

With this series, I'm proposing that this work shall use the name: argo.
(It is short, pronounceable, and unique within Xen's context, so it is
acceptable in code, and material artefacts will be discoverable with a
search engine.)

Valued feedback was given in review prior to this posting about whether
naming aspects of the implementation 'argo' was ok. I took this
seriously, and spent significant time looking at how to reduce the level
of argo-ness in this implementation. This version does incorporate changes
from that effort but in general, my view is that use of the name in the
code assists the clarity of it, so much of it has been retained.

The term "Hypervisor-Mediated data eXchange (HMX)" was introduced in a
presentation at the Platform Security Summit 2018, to describe the
general, hypervisor-agnostic, capability of data transfer between
domains performed by the hypervisor. It is viewable at:

  https://www.platformsecuritysummit.com/2018/speaker/clark/

Argo conforms to HMX as described, as does Hyper-V's message-sending
primitive.

== Future items

The Linux device driver used to test this software is derived from the
OpenXT v4v Linux device driver, available at:
    https://github.com/OpenXT/v4v
The Argo version of this driver is not yet ready to publish (focus has been
on the hypervisor code to this point). A Linux device driver suitable for
inclusion in Xen will be submitted for a future Xen release and
incorporation into OpenXT.

This submission does not include a firewall for constraining
domain-to-domain communication. The XSM hooks added currently provide
granularity of control at domain-to-domain level. We intend to extend
this to provide finer-grained access control in a future submission, but
the current implementation should provide sufficient isolation for some
use cases.

Communication between VMs at different levels of nesting in a
multi-hypervisor system is of strong interest and will inform near-term
enhancements.

Optimization of notification delivery to VMs is a known area for improvement.
* uxen's v4v uses an edge-triggered interrupt to reduce VMEXIT load.
* delivering extended notification data via a dedicated registered ring
  will allow a guest to avoid a search to identify notification causes.

Additional items will be noted on the Xen wiki.

== Credits

Contributors to the design and implementation of this software include:
James McKenzie, Jean Guyader, Ross Philipson, Christopher Clark

with the support of the OpenXT Project.

Thanks are due for the helpful reviews of earlier revisions by
Tim Deegan, Jan Beulich, Ian Campbell and Eric Chanudet.


Christopher Clark (25):
  xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen
  argo: Introduce the Kconfig option to govern inclusion of Argo
  argo: introduce the argo_message_op hypercall boilerplate
  argo: define argo_dprintk for subsystem debugging
  argo: Add initial argo_init and argo_destroy
  argo: Xen command line parameter 'argo': bool to enable/disable
  xen: add errno-returning functions for copy to and from guest
  xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE
  errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  arm: introduce guest_handle_for_field()
  xsm, argo: XSM control for argo register operation, argo_mac bootparam
  xsm, argo: XSM control for argo message send operation
  argo: implement the register op
  argo: implement the unregister op
  argo: implement the sendv op
  argo: implement the notify op
  xsm, argo: XSM control for any access to argo by a domain
  argo: limit the max number of rings that a domain may register.
  argo: limit the max number of notify requests in a single operation.
  argo, xsm: notify: don't describe rings that cannot be sent to
  argo: add array_index_nospec to guard the result of the hash func
  xen/evtchn: expose send_guest_global_virq for use within Xen
  argo: signal x86 HVM and ARM via VIRQ
  argo: unmap rings on suspend and send signal to ring-owners on resume
  argo: implement the get_config op to query notification config

 xen/arch/x86/guest/hypercall_page.S   |    2 +-
 xen/arch/x86/hvm/hypercall.c          |    3 +
 xen/arch/x86/hypercall.c              |    3 +
 xen/arch/x86/pv/hypercall.c           |    3 +
 xen/common/Kconfig                    |   20 +
 xen/common/Makefile                   |    1 +
 xen/common/argo.c                     | 1960 +++++++++++++++++++++++++++++++++
 xen/common/domain.c                   |   24 +
 xen/common/event_channel.c            |   37 +-
 xen/include/asm-arm/guest_access.h    |   30 +
 xen/include/asm-x86/guest_access.h    |   31 +
 xen/include/public/argo.h             |  280 +++++
 xen/include/public/errno.h            |    2 +
 xen/include/public/xen.h              |    6 +-
 xen/include/xen/argo.h                |   32 +
 xen/include/xen/event.h               |   10 +
 xen/include/xen/guest_access.h        |    3 +
 xen/include/xen/hypercall.h           |    9 +
 xen/include/xen/sched.h               |    7 +
 xen/include/xsm/dummy.h               |   25 +
 xen/include/xsm/xsm.h                 |   29 +
 xen/xsm/dummy.c                       |    6 +
 xen/xsm/flask/hooks.c                 |   33 +
 xen/xsm/flask/policy/access_vectors   |   16 +
 xen/xsm/flask/policy/security_classes |    1 +
 25 files changed, 2563 insertions(+), 10 deletions(-)
 create mode 100644 xen/common/argo.c
 create mode 100644 xen/include/public/argo.h
 create mode 100644 xen/include/xen/argo.h

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-03 16:20   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Allocates an IPI-bound event channel on vcpu0 for the specified domain.

It is able to bypass the existence check on the vcpu number since vcpu 0
should always exist. Bypass is required at the point of use by Argo.
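
The split between a validating entry point and an unchecked inner helper
can be sketched in plain C as follows. This is only an illustration with
hypothetical demo_* names and a simplified domain structure; it assumes
the internal caller may run before the per-vcpu state is populated,
which is a guess at the motivation rather than something stated here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_VCPUS 4
#define DEMO_ENOENT    2   /* stand-in error number */

/* Simplified, hypothetical stand-in for a domain. */
struct demo_domain {
    int max_vcpus;
    int vcpu_present[DEMO_MAX_VCPUS];
    int next_port;
};

/* Inner worker: performs the bind and trusts the caller's vcpu number. */
static long demo_bind_ipi_domain(struct demo_domain *d, int vcpu,
                                 int *out_port)
{
    (void)vcpu;                 /* the sketch does not record the vcpu */
    *out_port = d->next_port++;
    return 0;
}

/* Guest-facing path: validates the vcpu number before binding. */
static long demo_bind_ipi(struct demo_domain *d, int vcpu, int *out_port)
{
    if (vcpu < 0 || vcpu >= d->max_vcpus || !d->vcpu_present[vcpu])
        return -DEMO_ENOENT;

    return demo_bind_ipi_domain(d, vcpu, out_port);
}

/* Internal path: vcpu 0 is fixed, so the existence check is skipped. */
static long demo_bind_ipi_vcpu0(struct demo_domain *d, int *out_port)
{
    assert(d != NULL && out_port != NULL);
    return demo_bind_ipi_domain(d, 0, out_port);
}
```

With a domain whose vcpu_present[] entries are all still zero, the
guest-facing call fails with -DEMO_ENOENT while the vcpu0 helper
succeeds.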

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/event_channel.c | 35 +++++++++++++++++++++++++++++------
 xen/include/xen/event.h    |  3 +++
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index f34d4f0..3dfde83 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -411,17 +411,12 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
 }
 
 
-static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
+static long evtchn_bind_ipi_domain(struct domain *d, evtchn_bind_ipi_t *bind)
 {
     struct evtchn *chn;
-    struct domain *d = current->domain;
     int            port, vcpu = bind->vcpu;
     long           rc = 0;
 
-    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         (d->vcpu[vcpu] == NULL) )
-        return -ENOENT;
-
     spin_lock(&d->event_lock);
 
     if ( (port = get_free_port(d)) < 0 )
@@ -446,6 +441,34 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
 }
 
 
+static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
+{
+    struct domain *d = current->domain;
+    int         vcpu = bind->vcpu;
+
+    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
+         (d->vcpu[vcpu] == NULL) )
+        return -ENOENT;
+
+    return evtchn_bind_ipi_domain(d, bind);
+}
+
+long evtchn_bind_ipi_vcpu0_domain(struct domain *d, evtchn_port_t *out_port)
+{
+    evtchn_bind_ipi_t bind_ipi;
+    long              rc;
+
+    bind_ipi.vcpu = 0;
+
+    rc = evtchn_bind_ipi_domain(d, &bind_ipi);
+
+    if ( !rc )
+        *out_port = bind_ipi.port;
+
+    return rc;
+}
+
+
 static void link_pirq_port(int port, struct evtchn *chn, struct vcpu *v)
 {
     chn->u.pirq.prev_port = 0;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index ebb879e..18c3738 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -86,6 +86,9 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 /* Inject an event channel notification into the guest */
 void arch_evtchn_inject(struct vcpu *v);
 
+/* Allocate an IPI event channel on vcpu0 for the specified domain */
+long evtchn_bind_ipi_vcpu0_domain(struct domain *d, evtchn_port_t *out_port);
+
 /*
  * Internal event channel object storage.
  *
-- 
2.1.4



* [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
  2018-12-01  1:32 ` [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-03 15:51   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate Christopher Clark
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/Kconfig | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 68132a3..a06ddcb 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -200,6 +200,26 @@ config LATE_HWDOM
 
 	  If unsure, say N.
 
+config ARGO
+    bool "Argo: hypervisor-mediated interdomain communication"
+    default y
+    ---help---
+      Enables a hypercall for domains to ask the hypervisor to perform
+      data transfer of messages between domains.
+
+      This allows communication channels to be established that do not
+      require any shared memory between domains; the hypervisor is the
+      entity that each domain interacts with. The hypervisor is able to
+      enforce Mandatory Access Control policy over the communication.
+
+      If XSM_FLASK is enabled, XSM policy can govern which domains may
+      communicate via the Argo system.
+
+      This feature does nothing if the "argo" boot parameter is not present.
+      Argo is disabled at runtime by default.
+
+      If unsure, say Y.
+
 menu "Schedulers"
 	visible if EXPERT = "y"
 
-- 
2.1.4



* [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
  2018-12-01  1:32 ` [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen Christopher Clark
  2018-12-01  1:32 ` [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:44   ` Paul Durrant
  2018-12-01  1:32 ` [PATCH 04/25] argo: define argo_dprintk for subsystem debugging Christopher Clark
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet, Roger Pau Monné

Presence is gated upon CONFIG_ARGO.

Registers the hypercall previously reserved for this.
Takes 5 arguments, does nothing and returns -ENOSYS.

Will be avoiding a compat ABI by using fixed-size types in hypercall ops.
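
To illustrate why fixed-size types can remove the need for a compat
layer, the sketch below declares a hypothetical argument structure (not
one taken from the Argo headers) whose layout is identical for 32-bit
and 64-bit callers, and checks that at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical op argument. Every field has an explicit width and the
 * padding is spelled out, so no field size depends on the caller's
 * word size (as it would with 'unsigned long' or a raw pointer).
 */
struct demo_send_addr {
    uint32_t port;
    uint16_t domain_id;
    uint16_t pad;       /* explicit padding up to an 8-byte boundary */
    uint64_t cookie;
};

/* Compile-time checks that the layout is the one we expect. */
_Static_assert(sizeof(struct demo_send_addr) == 16,
               "layout must not depend on the word size");
_Static_assert(offsetof(struct demo_send_addr, cookie) == 8,
               "cookie must sit at a fixed offset");

/* Helper so that callers can query the wire size at run time. */
static size_t demo_send_addr_size(void)
{
    return sizeof(struct demo_send_addr);
}
```

If a field were declared as a pointer or 'unsigned long' instead, a
32-bit guest and a 64-bit hypervisor would disagree on the layout, which
is the situation a compat translation layer normally exists to handle.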

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/arch/x86/guest/hypercall_page.S |  2 +-
 xen/arch/x86/hvm/hypercall.c        |  3 +++
 xen/arch/x86/hypercall.c            |  3 +++
 xen/arch/x86/pv/hypercall.c         |  3 +++
 xen/common/Makefile                 |  1 +
 xen/common/argo.c                   | 28 ++++++++++++++++++++++++++++
 xen/include/public/xen.h            |  2 +-
 xen/include/xen/hypercall.h         |  9 +++++++++
 8 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 xen/common/argo.c

diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S
index fdd2e72..6c56d66 100644
--- a/xen/arch/x86/guest/hypercall_page.S
+++ b/xen/arch/x86/guest/hypercall_page.S
@@ -59,7 +59,7 @@ DECLARE_HYPERCALL(sysctl)
 DECLARE_HYPERCALL(domctl)
 DECLARE_HYPERCALL(kexec_op)
 DECLARE_HYPERCALL(tmem_op)
-DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(argo_message_op)
 DECLARE_HYPERCALL(xenpmu_op)
 
 DECLARE_HYPERCALL(arch_0)
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 19d1263..ee3c9f1 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -134,6 +134,9 @@ static const hypercall_table_t hvm_hypercall_table[] = {
 #ifdef CONFIG_TMEM
     HYPERCALL(tmem_op),
 #endif
+#ifdef CONFIG_ARGO
+    HYPERCALL(argo_message_op),
+#endif
     COMPAT_CALL(platform_op),
 #ifdef CONFIG_PV
     COMPAT_CALL(mmuext_op),
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index 032de8f..7da7e89 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -64,6 +64,9 @@ const hypercall_args_t hypercall_args_table[NR_hypercalls] =
     ARGS(domctl, 1),
     ARGS(kexec_op, 2),
     ARGS(tmem_op, 1),
+#ifdef CONFIG_ARGO
+    ARGS(argo_message_op, 5),
+#endif
     ARGS(xenpmu_op, 2),
 #ifdef CONFIG_HVM
     ARGS(hvm_op, 2),
diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index 5d11911..c3fd555 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -77,6 +77,9 @@ const hypercall_table_t pv_hypercall_table[] = {
 #ifdef CONFIG_TMEM
     HYPERCALL(tmem_op),
 #endif
+#ifdef CONFIG_ARGO
+    HYPERCALL(argo_message_op),
+#endif
     HYPERCALL(xenpmu_op),
 #ifdef CONFIG_HVM
     HYPERCALL(hvm_op),
diff --git a/xen/common/Makefile b/xen/common/Makefile
index ffdfb74..8c65c6f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,3 +1,4 @@
+obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
 obj-y += bsearch.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
diff --git a/xen/common/argo.c b/xen/common/argo.c
new file mode 100644
index 0000000..76017d4
--- /dev/null
+++ b/xen/common/argo.c
@@ -0,0 +1,28 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018, BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/errno.h>
+#include <xen/guest_access.h>
+
+long
+do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
+                   XEN_GUEST_HANDLE_PARAM(void) arg2,
+                   uint32_t arg3, uint32_t arg4)
+{
+    return -ENOSYS;
+}
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 68ee098..0a27546 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -118,7 +118,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_argo_message_op      39
 #define __HYPERVISOR_xenpmu_op            40
 #define __HYPERVISOR_dm_op                41
 
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea..112514c 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -136,6 +136,15 @@ do_tmem_op(
     XEN_GUEST_HANDLE_PARAM(tmem_op_t) uops);
 #endif
 
+#ifdef CONFIG_ARGO
+extern long do_argo_message_op(
+    int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg1,
+    XEN_GUEST_HANDLE_PARAM(void) arg2,
+    uint32_t arg3,
+    uint32_t arg4);
+#endif
+
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
-- 
2.1.4



* [PATCH 04/25] argo: define argo_dprintk for subsystem debugging
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (2 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-03 15:59   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 05/25] argo: Add initial argo_init and argo_destroy Christopher Clark
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

A convenience for working on development of the argo subsystem:
toggling a local #define variable turns on just the debug messages
in this subsystem.

  printk("argo: " format, ## args )
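
For readers unfamiliar with the pattern, a portable user-space sketch of
a compile-time-toggled debug macro follows. It uses fprintf and the
hypothetical names DEMO_DEBUG, demo_dprintk and demo_checked_div; it is
not the macro added by this patch, which uses printk and GNU-style named
variadic arguments.

```c
#include <assert.h>
#include <stdio.h>

/* Define DEMO_DEBUG above this line (or pass -DDEMO_DEBUG) to enable. */
#ifdef DEMO_DEBUG
#define demo_dprintk(...)                        \
    do {                                         \
        fprintf(stderr, "demo: " __VA_ARGS__);   \
    } while ( 0 )
#else
/* Disabled: expands to a no-op and does not evaluate its arguments. */
#define demo_dprintk(...) ((void)0)
#endif

/* Example user: logs its inputs and result only in debug builds. */
static int demo_checked_div(int a, int b)
{
    int q;

    assert(b != 0);
    q = a / b;
    demo_dprintk("div(%d, %d) = %d\n", a, b, q);
    return q;
}
```

The first argument to demo_dprintk must be a string literal in this
variant, because the prefix is attached by literal concatenation.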

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 76017d4..6917f98 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -19,6 +19,19 @@
 #include <xen/errno.h>
 #include <xen/guest_access.h>
 
+/*
+ * Debugs
+ */
+
+#ifdef ARGO_DEBUG
+#define argo_dprintk(format, args...)            \
+    do {                                         \
+        printk("argo: " format, ## args );       \
+    } while ( 1 == 0 )
+#else
+#define argo_dprintk(format, ... ) (void)0
+#endif
+
 long
 do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                    XEN_GUEST_HANDLE_PARAM(void) arg2,
-- 
2.1.4



* [PATCH 05/25] argo: Add initial argo_init and argo_destroy
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (3 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 04/25] argo: define argo_dprintk for subsystem debugging Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:12   ` Paul Durrant
  2018-12-13 13:16   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable Christopher Clark
                   ` (20 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Initialises basic data structures and performs teardown of argo state
for domain shutdown.

Introduces headers:
  <public/argo.h> with definitions of addresses and ring structure, including
  indexes for atomic update for communication between domain and hypervisor,
  and <xen/argo.h> to support hooking init and destroy into domain lifecycle.

If CONFIG_ARGO is enabled:

Adds per-domain init of argo data structures to domain_create by calling
argo_init, and similarly adds teardown via argo_destroy into domain_destroy
and the error exit path of domain_create.

argo_init allocates an event channel for use for signalling to the domain.
The event channel is of type IPI since that behaves in the required way;
unbound event channels are unsuitable since they silently drop events.
The only disadvantage of the IPI type is that the channel cannot be rebound
to any other VCPU; that seems to be tolerable and avoids introducing any
further changes to add another channel type.

In accordance with recent work on _domain_destroy, argo_destroy is idempotent.
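
A minimal sketch of what idempotent teardown means here, using
hypothetical demo_* names and plain malloc/free: the destroy function
may be called on a domain that was never initialised, or called twice,
without harm. Locking is omitted; the patch itself performs the
equivalent steps while holding the global argo_lock for writing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-domain state and a simplified domain. */
struct demo_argo {
    int placeholder;
};

struct demo_domain {
    struct demo_argo *argo;   /* NULL when no state is attached */
};

/* Allocate and attach the per-domain state. Returns 0 or -1. */
static int demo_argo_init(struct demo_domain *d)
{
    struct demo_argo *a = calloc(1, sizeof(*a));

    if ( !a )
        return -1;

    d->argo = a;
    return 0;
}

/*
 * Idempotent teardown: free the state if present and clear the pointer,
 * so that a second call (or a call after a failed init) is a no-op.
 */
static void demo_argo_destroy(struct demo_domain *d)
{
    assert(d != NULL);

    if ( d->argo )
        free(d->argo);
    d->argo = NULL;
}
```

This shape lets a single destroy call serve both the normal domain
destruction path and the error exit path of domain creation.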

Adds two new fields to struct domain:
    rwlock_t argo_lock;
    struct argo_domain *argo;

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c         | 277 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/common/domain.c       |  15 +++
 xen/include/public/argo.h |  55 +++++++++
 xen/include/xen/argo.h    |  30 +++++
 xen/include/xen/sched.h   |   7 ++
 5 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 xen/include/public/argo.h
 create mode 100644 xen/include/xen/argo.h

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 6917f98..1872d37 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -17,7 +17,101 @@
  */
 
 #include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/argo.h>
+#include <xen/event.h>
+#include <xen/domain_page.h>
 #include <xen/guest_access.h>
+#include <xen/time.h>
+
+DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
+
+struct argo_pending_ent
+{
+    struct hlist_node node;
+    domid_t id;
+    uint32_t len;
+};
+
+struct argo_ring_info
+{
+    /* next node in the hash, protected by L2 */
+    struct hlist_node node;
+    /* this ring's id, protected by L2 */
+    argo_ring_id_t id;
+    /* used to confirm sender id, protected by L2 */
+    uint64_t partner_cookie;
+    /* L3 */
+    spinlock_t lock;
+    /* cached length of the ring (from ring->len), protected by L3 */
+    uint32_t len;
+    /* number of pages in the ring, protected by L3 */
+    uint32_t npage;
+    /* number of pages translated into mfns, protected by L3 */
+    uint32_t nmfns;
+    /* cached tx pointer location, protected by L3 */
+    uint32_t tx_ptr;
+    /* mapped ring pages protected by L3 */
+    uint8_t **mfn_mapping;
+    /* list of mfns of guest ring, protected by L3 */
+    mfn_t *mfns;
+    /* list of struct argo_pending_ent for this ring, protected by L3 */
+    struct hlist_head pending;
+};
+
+/*
+ * The value of the argo element in a struct domain is
+ * protected by the global lock argo_lock: L1
+ */
+#define ARGO_HTABLE_SIZE 32
+struct argo_domain
+{
+    /* L2 */
+    rwlock_t lock;
+    /* event channel */
+    evtchn_port_t evtchn_port;
+    /* protected by L2 */
+    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
+    /* id cookie, written only at init, so readable with R(L1) */
+    uint64_t domain_cookie;
+};
+
+/*
+ * locks
+ */
+
+/*
+ * locking is organized as follows:
+ *
+ * L1 : The global lock: argo_lock
+ * Protects the argo elements of all struct domain *d in the system.
+ * It does not protect any of the elements of d->argo, only their
+ * addresses.
+ * By extension since the destruction of a domain with a non-NULL
+ * d->argo will need to free the d->argo pointer, holding this lock
+ * guarantees that no domain pointers that argo is interested in
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK(argo_lock); /* L1 */
+
+/*
+ * L2 : The per-domain lock: d->argo->lock
+ * Holding a read lock on L2 protects the hash table and
+ * the elements in the hash_table d->argo->ring_hash, and
+ * the node and id fields in struct argo_ring_info in the
+ * hash table.
+ * Holding a write lock on L2 protects all of the elements of
+ * struct argo_ring_info.
+ * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
+ *
+ * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
+ * Protects len, tx_ptr, the guest ring, the guest ring_data and
+ * the pending list.
+ * To acquire L3 you must already have R(L2). W(L2) implies L3.
+ */
 
 /*
  * Debugs
@@ -32,10 +126,191 @@
 #define argo_dprintk(format, ... ) (void)0
 #endif
 
+/*
+ * ring buffer
+ */
+
+/* caller must have L3 or W(L2) */
+static void
+argo_ring_unmap(struct argo_ring_info *ring_info)
+{
+    int i;
+
+    if ( !ring_info->mfn_mapping )
+        return;
+
+    for ( i = 0; i < ring_info->nmfns; i++ )
+    {
+        if ( !ring_info->mfn_mapping[i] )
+            continue;
+        if ( ring_info->mfns )
+            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
+                         mfn_x(ring_info->mfns[i]),
+                         ring_info->mfn_mapping[i]);
+        unmap_domain_page_global(ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+/*
+ * pending
+ */
+static void
+argo_pending_remove_ent(struct argo_pending_ent *ent)
+{
+    hlist_del(&ent->node);
+    xfree(ent);
+}
+
+static void
+argo_pending_remove_all(struct argo_ring_info *ring_info)
+{
+    struct hlist_node *node, *next;
+    struct argo_pending_ent *pending_ent;
+
+    hlist_for_each_entry_safe(pending_ent, node, next,
+                              &ring_info->pending, node)
+    {
+        argo_pending_remove_ent(pending_ent);
+    }
+}
+
+static void argo_ring_remove_mfns(const struct domain *d,
+                                  struct argo_ring_info *ring_info)
+{
+    int i;
+
+    ASSERT(rw_is_write_locked(&d->argo->lock));
+
+    if ( !ring_info->mfns )
+        return;
+    ASSERT(ring_info->mfn_mapping);
+
+    argo_ring_unmap(ring_info);
+
+    for ( i = 0; i < ring_info->nmfns; i++ )
+        if ( mfn_x(ring_info->mfns[i]) != mfn_x(INVALID_MFN) )
+            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
+
+    xfree(ring_info->mfns);
+    ring_info->mfns = NULL;
+    ring_info->npage = 0;
+    xfree(ring_info->mfn_mapping);
+    ring_info->mfn_mapping = NULL;
+    ring_info->nmfns = 0;
+}
+
+static void
+argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->argo->lock));
+
+    /* Holding W(L2) so do not need to acquire L3 */
+    argo_pending_remove_all(ring_info);
+    hlist_del(&ring_info->node);
+    argo_ring_remove_mfns(d, ring_info);
+    xfree(ring_info);
+}
+
 long
 do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                    XEN_GUEST_HANDLE_PARAM(void) arg2,
                    uint32_t arg3, uint32_t arg4)
 {
-    return -ENOSYS;
+    struct domain *d = current->domain;
+    long rc = -EFAULT;
+
+    argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
+                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
+
+    domain_lock(d);
+
+    switch (cmd)
+    {
+    default:
+        rc = -ENOSYS;
+        break;
+    }
+
+    domain_unlock(d);
+    argo_dprintk("<-do_argo_message_op()=%ld\n", rc);
+    return rc;
+}
+
+int
+argo_init(struct domain *d)
+{
+    struct argo_domain *argo;
+    evtchn_port_t port;
+    int i;
+    int rc;
+
+    argo = xmalloc(struct argo_domain);
+    if ( !argo )
+        return -ENOMEM;
+
+    rwlock_init(&argo->lock);
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
+        INIT_HLIST_HEAD(&argo->ring_hash[i]);
+
+    rc = evtchn_bind_ipi_vcpu0_domain(d, &port);
+    if ( rc )
+    {
+        xfree(argo);
+        return rc;
+    }
+    argo->evtchn_port = port;
+    argo->domain_cookie = (uint64_t)NOW();
+
+    write_lock(&argo_lock);
+    d->argo = argo;
+    write_unlock(&argo_lock);
+
+    return 0;
+}
+
+void
+argo_destroy(struct domain *d)
+{
+    int i;
+
+    BUG_ON(!d->is_dying);
+    write_lock(&argo_lock);
+
+    argo_dprintk("d->argo=%p\n", d->argo);
+
+    if ( d->argo )
+    {
+        for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
+        {
+            struct hlist_node *node, *next;
+            struct argo_ring_info *ring_info;
+
+            hlist_for_each_entry_safe(ring_info, node,
+                                      next, &d->argo->ring_hash[i],
+                                      node)
+            {
+                argo_ring_remove_info(d, ring_info);
+            }
+        }
+        /*
+         * Since this function is only called during domain destruction,
+         * argo->evtchn_port need not be closed here. ref: evtchn_destroy
+         */
+        d->argo->domain_cookie = 0;
+        xfree(d->argo);
+        d->argo = NULL;
+    }
+    write_unlock(&argo_lock);
+
+    /*
+     * This (dying) domain's domid may be recorded as the authorized sender
+     * to rings registered by other domains, and those rings are not
+     * unregistered here.
+     * If a later domain is created that has the same domid as this one, the
+     * domain_cookie will differ, which ensures that the new domain cannot
+     * use the inherited authorizations to transmit that were issued to this
+     * domain.
+     */
 }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 78cc524..eadea4d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -277,6 +277,10 @@ static void _domain_destroy(struct domain *d)
 
     xfree(d->pbuf);
 
+#ifdef CONFIG_ARGO
+    argo_destroy(d);
+#endif
+
     rangeset_domain_destroy(d);
 
     free_cpumask_var(d->dirty_cpumask);
@@ -376,6 +380,9 @@ struct domain *domain_create(domid_t domid,
     spin_lock_init(&d->hypercall_deadlock_mutex);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
+#ifdef CONFIG_ARGO
+    rwlock_init(&d->argo_lock);
+#endif
 
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
@@ -445,6 +452,11 @@ struct domain *domain_create(domid_t domid,
             goto fail;
         init_status |= INIT_gnttab;
 
+#ifdef CONFIG_ARGO
+        if ( (err = argo_init(d)) != 0 )
+            goto fail;
+#endif
+
         err = -ENOMEM;
 
         d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
@@ -717,6 +729,9 @@ int domain_kill(struct domain *d)
         if ( d->is_dying != DOMDYING_alive )
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
+#ifdef CONFIG_ARGO
+        argo_destroy(d);
+#endif
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem_client);
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
new file mode 100644
index 0000000..20dabc0
--- /dev/null
+++ b/xen/include/public/argo.h
@@ -0,0 +1,55 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018, BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_PUBLIC_ARGO_H__
+#define __XEN_PUBLIC_ARGO_H__
+
+#include "xen.h"
+
+typedef struct argo_addr
+{
+    uint32_t port;
+    domid_t domain_id;
+    uint16_t pad;
+} argo_addr_t;
+
+typedef struct argo_ring_id
+{
+    struct argo_addr addr;
+    domid_t partner;
+    uint16_t pad;
+} argo_ring_id_t;
+
+typedef struct argo_ring
+{
+    uint64_t magic;
+    argo_ring_id_t id;
+    uint32_t len;
+    /* Guests should use atomic operations to access rx_ptr */
+    uint32_t rx_ptr;
+    /* Guests should use atomic operations to access tx_ptr */
+    uint32_t tx_ptr;
+    uint8_t reserved[32];
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t ring[];
+#elif defined(__GNUC__)
+    uint8_t ring[0];
+#endif
+} argo_ring_t;
+
+#endif
diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
new file mode 100644
index 0000000..c037de6
--- /dev/null
+++ b/xen/include/xen/argo.h
@@ -0,0 +1,30 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018, BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_ARGO_H__
+#define __XEN_ARGO_H__
+
+#include <xen/types.h>
+#include <public/argo.h>
+
+struct argo_domain;
+
+int argo_init(struct domain *d);
+void argo_destroy(struct domain *d);
+
+#endif
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0309c1f..4a19b55 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -22,6 +22,7 @@
 #include <asm/atomic.h>
 #include <xen/vpci.h>
 #include <xen/wait.h>
+#include <xen/argo.h>
 #include <public/xen.h>
 #include <public/domctl.h>
 #include <public/sysctl.h>
@@ -490,6 +491,12 @@ struct domain
         unsigned int guest_request_enabled       : 1;
         unsigned int guest_request_sync          : 1;
     } monitor;
+
+#ifdef CONFIG_ARGO
+    /* Argo interdomain communication support */
+    rwlock_t argo_lock;
+    struct argo_domain *argo;
+#endif
 };
 
 /* Protect updates/reads (resp.) of domain_list and domain_hash. */
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (4 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 05/25] argo: Add initial argo_init and argo_destroy Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:18   ` Paul Durrant
  2018-12-04 11:35   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy Christopher Clark
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Add a boolean Xen command line parameter, 'argo', that enables or disables
Argo. It defaults to disabled.
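
The gating behaviour can be modelled outside Xen. This is a minimal sketch,
not the hypervisor code: parse_argo_option(), struct domain_model and
argo_init_model() are hypothetical stand-ins, and the accepted strings are an
assumption (Xen's own boolean_param() handling is not reproduced here). It
shows only that initialisation succeeds without creating any state while the
option is off.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for opt_argo_enabled; the patch defaults it to disabled. */
static bool opt_argo_enabled = false;

/* Hypothetical, simplified parser for the value given to the option. */
static int parse_argo_option(const char *value)
{
    if ( !strcmp(value, "1") || !strcmp(value, "true") )
        opt_argo_enabled = true;
    else if ( !strcmp(value, "0") || !strcmp(value, "false") )
        opt_argo_enabled = false;
    else
        return -EINVAL;
    return 0;
}

/* Hypothetical model of a domain: records only whether Argo state exists. */
struct domain_model { bool has_argo; };

/* Model of argo_init(): report success but set nothing up while disabled. */
static int argo_init_model(struct domain_model *d)
{
    d->has_argo = false;
    if ( !opt_argo_enabled )
        return 0;
    d->has_argo = true;
    return 0;
}
```

Returning 0 rather than an error in the disabled case means domain creation
is unaffected when the feature is compiled in but switched off.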

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 1872d37..82fab36 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -28,6 +28,10 @@
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
 
+/* Xen command line option to enable argo */
+static bool __read_mostly opt_argo_enabled = 0;
+boolean_param("argo", opt_argo_enabled);
+
 struct argo_pending_ent
 {
     struct hlist_node node;
@@ -223,6 +227,13 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
     argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
                  (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
 
+    if ( unlikely(!opt_argo_enabled) )
+    {
+        rc = -ENOSYS;
+        argo_dprintk("<-do_argo_message_op()=%ld\n", rc);
+        return rc;
+    }
+
     domain_lock(d);
 
     switch (cmd)
@@ -245,6 +256,14 @@ argo_init(struct domain *d)
     int i;
     int rc;
 
+    if ( !opt_argo_enabled )
+    {
+        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
+        return 0;
+    }
+
+    argo_dprintk("argo init: domid: %d\n", d->domain_id);
+
     argo = xmalloc(struct argo_domain);
     if ( !argo )
         return -ENOMEM;
-- 
2.1.4



* [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (5 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:35   ` Paul Durrant
  2018-12-12 16:01   ` Roger Pau Monné
  2018-12-01  1:32 ` [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE Christopher Clark
                   ` (18 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet, Roger Pau Monné

Add errno-returning variants of the guest copy helpers. They are added to both
the x86 and ARM headers.
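
The conversion these helpers perform can be sketched in isolation. The
example below is an assumption-laden model, not Xen code: mock_raw_copy() is
a hypothetical stand-in for a raw copy primitive that reports the number of
bytes it could not copy, and a NULL source is used to model a faulting guest
address.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical raw copy primitive: returns the number of bytes that could
 * NOT be copied, so 0 means success. A NULL source models a fault.
 */
static unsigned long mock_raw_copy(void *dst, const void *src, size_t len)
{
    if ( src == NULL )
        return len;
    memcpy(dst, src, len);
    return 0;
}

/* Collapse the "bytes not copied" count into 0 or -EFAULT. */
#define mock_copy_errno(dst, src, len) \
    (mock_raw_copy((dst), (src), (len)) ? -EFAULT : 0)
```

A caller can then propagate the result directly as a return code instead of
translating a byte count at every call site.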

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
 xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
 xen/include/xen/guest_access.h     |  3 +++
 3 files changed, 57 insertions(+)

diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 224d2a0..7b6f89c 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
 #define __raw_copy_from_guest raw_copy_from_guest
 #define __raw_clear_guest raw_clear_guest
 
+#define raw_copy_from_guest_errno(dst, src, len)             \
+    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
+#define raw_copy_to_guest_errno(dst, src, len)               \
+    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)
+
 /* Remainder copied from x86 -- could be common? */
 
 /* Is the guest handle a NULL reference? */
@@ -113,6 +118,26 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
     raw_copy_from_guest(_d, _s, sizeof(*_d));           \
 })
 
+/* errno returning copy functions */
+#define copy_from_guest_offset_errno(ptr, hnd, off, nr) ({              \
+            const typeof(*(ptr)) *_s = (hnd).p;                         \
+            typeof(*(ptr)) *_d = (ptr);                                 \
+            raw_copy_from_guest_errno(_d, _s + (off), sizeof(*_d) * (nr)); \
+        })
+
+#define copy_field_to_guest_errno(hnd, ptr, field) ({           \
+            const typeof(&(ptr)->field) _s = &(ptr)->field;     \
+            void *_d = &(hnd).p->field;                         \
+            ((void)(&(hnd).p->field == &(ptr)->field));         \
+            raw_copy_to_guest_errno(_d, _s, sizeof(*_s));       \
+        })
+
+#define copy_field_from_guest_errno(ptr, hnd, field) ({         \
+            const typeof(&(ptr)->field) _s = &(hnd).p->field;   \
+            typeof(&(ptr)->field) _d = &(ptr)->field;           \
+            raw_copy_from_guest_errno(_d, _s, sizeof(*_d));     \
+        })
+
 /*
  * Pre-validate a guest handle.
  * Allows use of faster __copy_* functions.
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..9391cd3 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -38,6 +38,15 @@
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 
+#define raw_copy_from_guest_errno(dst, src, len)                        \
+    ((is_hvm_vcpu(current) ?                                            \
+      copy_from_user_hvm((dst), (src), (len)) :                         \
+      copy_from_user((dst), (src), (len))) ? -EFAULT : 0)
+#define raw_copy_to_guest_errno(dst, src, len)                          \
+    ((is_hvm_vcpu(current) ?                                            \
+      copy_to_user_hvm((dst), (src), (len)) :                           \
+      copy_to_user((dst), (src), (len))) ? -EFAULT : 0)
+
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
 
@@ -121,6 +130,26 @@
     raw_copy_from_guest(_d, _s, sizeof(*_d));           \
 })
 
+/* errno returning copy functions */
+#define copy_from_guest_offset_errno(ptr, hnd, off, nr) ({              \
+            const typeof(*(ptr)) *_s = (hnd).p;                         \
+            typeof(*(ptr)) *_d = (ptr);                                 \
+            raw_copy_from_guest_errno(_d, _s + (off), sizeof(*_d) * (nr)); \
+        })
+
+#define copy_field_to_guest_errno(hnd, ptr, field) ({           \
+            const typeof(&(ptr)->field) _s = &(ptr)->field;     \
+            void *_d = &(hnd).p->field;                         \
+            ((void)(&(hnd).p->field == &(ptr)->field));         \
+            raw_copy_to_guest_errno(_d, _s, sizeof(*_s));       \
+        })
+
+#define copy_field_from_guest_errno(ptr, hnd, field) ({         \
+            const typeof(&(ptr)->field) _s = &(hnd).p->field;   \
+            typeof(&(ptr)->field) _d = &(ptr)->field;           \
+            raw_copy_from_guest_errno(_d, _s, sizeof(*_d));     \
+        })
+
 /*
  * Pre-validate a guest handle.
  * Allows use of faster __copy_* functions.
diff --git a/xen/include/xen/guest_access.h b/xen/include/xen/guest_access.h
index 09989df..3494c5f 100644
--- a/xen/include/xen/guest_access.h
+++ b/xen/include/xen/guest_access.h
@@ -26,6 +26,9 @@
 #define __copy_from_guest(ptr, hnd, nr)                 \
     __copy_from_guest_offset(ptr, hnd, 0, nr)
 
+#define copy_from_guest_errno(ptr, hnd, nr)             \
+    copy_from_guest_offset_errno(ptr, hnd, 0, nr)
+
 #define __clear_guest(hnd, nr)                          \
     __clear_guest_offset(hnd, 0, nr)
 
-- 
2.1.4



* [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (6 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04 11:39   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/include/public/xen.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 0a27546..8dc032b 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -982,6 +982,8 @@ typedef struct {
 #define XEN_GUEST_HANDLE_64(name) XEN_GUEST_HANDLE(name)
 #endif
 
+#define XEN_GUEST_HANDLE_NULL(name) (XEN_GUEST_HANDLE(name)){(name *)0}
+
 #ifndef __ASSEMBLY__
 struct xenctl_bitmap {
     XEN_GUEST_HANDLE_64(uint8) bitmap;
-- 
2.1.4



* [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (7 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-03 15:42   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 10/25] arm: introduce guest_handle_for_field() Christopher Clark
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
describes these codes thus:
    EMSGSIZE     : "Message too large"
    ECONNREFUSED : "Connection refused".
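
The header touched by this patch lists each code through a
XEN_ERRNO(name, value) macro. The general list-macro technique behind that
style can be sketched as follows. This is an illustration only: the MODEL_*
names, the three-argument list and model_strerror() are hypothetical and are
not part of the Xen ABI. Only the two numeric values and descriptions are
taken from this patch.

```c
#include <stddef.h>

/* One list, expanded several ways. Values are the ones added by the patch. */
#define MODEL_ERRNO_LIST(X)                         \
    X(EMSGSIZE,     90,  "Message too large")       \
    X(ECONNREFUSED, 111, "Connection refused")

/* Expansion 1: an enum of prefixed constants. */
#define AS_ENUM(name, value, text) MODEL_##name = value,
enum model_errno { MODEL_ERRNO_LIST(AS_ENUM) };

/* Expansion 2: a lookup from numeric value to description. */
#define AS_CASE(name, value, text) case value: return text;
static const char *model_strerror(int err)
{
    switch ( err )
    {
    MODEL_ERRNO_LIST(AS_CASE)
    default: return "Unknown error";
    }
}
```

Keeping the codes in a single list means a new entry, such as the two added
here, appears in every expansion without further edits.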

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/include/public/errno.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/public/errno.h b/xen/include/public/errno.h
index 305c112..e1d02fc 100644
--- a/xen/include/public/errno.h
+++ b/xen/include/public/errno.h
@@ -102,6 +102,7 @@ XEN_ERRNO(EILSEQ,	84)	/* Illegal byte sequence */
 XEN_ERRNO(ERESTART,	85)	/* Interrupted system call should be restarted */
 #endif
 XEN_ERRNO(ENOTSOCK,	88)	/* Socket operation on non-socket */
+XEN_ERRNO(EMSGSIZE,	90)	/* Message too large. */
 XEN_ERRNO(EOPNOTSUPP,	95)	/* Operation not supported on transport endpoint */
 XEN_ERRNO(EADDRINUSE,	98)	/* Address already in use */
 XEN_ERRNO(EADDRNOTAVAIL, 99)	/* Cannot assign requested address */
@@ -109,6 +110,7 @@ XEN_ERRNO(ENOBUFS,	105)	/* No buffer space available */
 XEN_ERRNO(EISCONN,	106)	/* Transport endpoint is already connected */
 XEN_ERRNO(ENOTCONN,	107)	/* Transport endpoint is not connected */
 XEN_ERRNO(ETIMEDOUT,	110)	/* Connection timed out */
+XEN_ERRNO(ECONNREFUSED,	111)	/* Connection refused */
 
 #undef XEN_ERRNO
 #endif /* XEN_ERRNO */
-- 
2.1.4



* [PATCH 10/25] arm: introduce guest_handle_for_field()
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (8 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:46   ` Paul Durrant
  2018-12-01  1:32 ` [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam Christopher Clark
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	Rich Persaud, James McKenzie, Julien Grall, Paul Durrant,
	Eric Chanudet

ARM port of commit bb544585137259545d4adc9afe6eed8dc7c7376d

This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.
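
As a rough illustration of what "a field of a handle" means, the sketch
below models a handle as a struct wrapping a single pointer. The model_*
types and the macro taking the handle type directly are assumptions made so
the example stands alone; the real macro builds the type with
XEN_GUEST_HANDLE(type).

```c
#include <stdint.h>

/* Simplified stand-in: a handle wraps exactly one pointer. */
typedef struct { uint32_t *p; } model_handle_uint32;

/* Hypothetical structure with field names borrowed from argo_ring. */
struct model_ring { uint64_t magic; uint32_t len; uint32_t rx_ptr; };
typedef struct { struct model_ring *p; } model_handle_ring;

/* Same shape as the patch's macro, with the handle type spelled out. */
#define model_handle_for_field(hnd, handle_type, fld) \
    ((handle_type) { &(hnd).p->fld })
```

Given a model_handle_ring rh, the expression
model_handle_for_field(rh, model_handle_uint32, rx_ptr) yields a handle whose
pointer addresses only the rx_ptr member, so later copies can be restricted
to that one field.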

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/include/asm-arm/guest_access.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 7b6f89c..1137c54 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -68,6 +68,9 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
     _y;                                                     \
 })
 
+#define guest_handle_for_field(hnd, type, fld)          \
+    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
+
 #define guest_handle_from_ptr(ptr, type)        \
     ((XEN_GUEST_HANDLE_PARAM(type)) { (type *)ptr })
 #define const_guest_handle_from_ptr(ptr, type)  \
-- 
2.1.4



* [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (9 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 10/25] arm: introduce guest_handle_for_field() Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:52   ` Paul Durrant
  2018-12-01  1:32 ` [PATCH 12/25] xsm, argo: XSM control for argo message send operation Christopher Clark
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, Eric Chanudet

XSM hooks implement distinct permissions for these two distinct cases of
Argo ring registration:

* Single source:  registering a ring for communication to receive messages
                  from a specified single other domain.
  Default policy: allow.

* Any source:     registering a ring for communication to receive messages
                  from any, or all, other domains (i.e. wildcard).
  Default policy: deny, with runtime policy configuration via new bootparam.

The default for wildcard rings is 'deny' because there is currently no means
other than XSM to protect such a ring from DoS by a noisy domain spamming it,
which reduces the ability of other domains to send to it. XSM at least allows
per-domain control over access to the send permission, so that communication
can be limited to domains that can be trusted.
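
The two default policies can be modelled in a few lines. This is a sketch
under assumptions: the model_* functions are hypothetical, and passing the
boot parameter as the 'strict' argument is a guess at how a caller might use
the hook, since the caller is not part of this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors argo_mac_bootparam_enforcing; enforcing by default. */
static bool mac_enforcing = true;

/* Single source: default policy is allow. */
static int model_register_single_source(void)
{
    return 0;
}

/* Any source: denied while strict, allowed otherwise. */
static int model_register_any_source(bool strict)
{
    return strict ? -EPERM : 0;
}

/* Hypothetical caller choosing the hook by registration type. */
static int model_check_register(bool wildcard)
{
    return wildcard ? model_register_any_source(mac_enforcing)
                    : model_register_single_source();
}
```

With the defaults, a wildcard registration fails with -EPERM while a
single-source registration succeeds.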

Denying access to any-sender rings unless a flask XSM policy is active would
prevent many users from using a key Argo feature, so also introduce a bootparam
that can override this constraint: "argo_mac", which accepts the values
'permissive' and 'enforcing'. Although the underlying variable is boolean, the
descriptive strings make it obvious to an administrator that the setting has
potential security impact.
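
A standalone model of the string handling may help when reading the diff
below. It is a sketch, not the patch's function: model_parse_argo_mac() is a
hypothetical name and it uses strcmp() for an exact match, whereas the patch
uses strncmp() with a length that includes the terminator.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Enforcing unless the administrator explicitly relaxes it. */
static bool model_mac_enforcing = true;

/* Accept only the two descriptive strings; reject anything else. */
static int model_parse_argo_mac(const char *s)
{
    if ( !strcmp(s, "enforcing") )
        model_mac_enforcing = true;
    else if ( !strcmp(s, "permissive") )
        model_mac_enforcing = false;
    else
        return -EINVAL;
    return 0;
}
```

Rejecting unknown strings, rather than treating them as false, keeps a typo
from silently relaxing the policy.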

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c                     | 15 +++++++++++++++
 xen/include/xsm/dummy.h               | 15 +++++++++++++++
 xen/include/xsm/xsm.h                 | 17 +++++++++++++++++
 xen/xsm/dummy.c                       |  4 ++++
 xen/xsm/flask/hooks.c                 | 19 +++++++++++++++++++
 xen/xsm/flask/policy/access_vectors   | 11 +++++++++++
 xen/xsm/flask/policy/security_classes |  1 +
 7 files changed, 82 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 82fab36..2a95e09 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -32,6 +32,21 @@ DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
 static bool __read_mostly opt_argo_enabled = 0;
 boolean_param("argo", opt_argo_enabled);
 
+/* Xen command line option for conservative or relaxed access control */
+bool __read_mostly argo_mac_bootparam_enforcing = true;
+
+static int __init parse_argo_mac_param(const char *s)
+{
+    if ( !strncmp(s, "enforcing", 10) )
+        argo_mac_bootparam_enforcing = true;
+    else if ( !strncmp(s, "permissive", 11) )
+        argo_mac_bootparam_enforcing = false;
+    else
+        return -EINVAL;
+    return 0;
+}
+custom_param("argo_mac", parse_argo_mac_param);
+
 struct argo_pending_ent
 {
     struct hlist_node node;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index a29d1ef..55113c3 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -720,6 +720,21 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 
 #endif /* CONFIG_X86 */
 
+#ifdef CONFIG_ARGO
+static XSM_INLINE int xsm_argo_register_single_source(struct domain *d,
+                                                      struct domain *t)
+{
+    return 0;
+}
+
+static XSM_INLINE int xsm_argo_register_any_source(struct domain *d,
+                                                   bool strict)
+{
+    return strict ? -EPERM : 0;
+}
+
+#endif /* CONFIG_ARGO */
+
 #include <public/version.h>
 static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3b192b5..65577fd 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -181,6 +181,10 @@ struct xsm_operations {
 #endif
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
+#ifdef CONFIG_ARGO
+    int (*argo_register_single_source) (struct domain *d, struct domain *t);
+    int (*argo_register_any_source) (struct domain *d);
+#endif
 };
 
 #ifdef CONFIG_XSM
@@ -698,6 +702,19 @@ static inline int xsm_domain_resource_map(xsm_default_t def, struct domain *d)
     return xsm_ops->domain_resource_map(d);
 }
 
+#ifdef CONFIG_ARGO
+static inline int xsm_argo_register_single_source(struct domain *d, struct domain *t)
+{
+    return xsm_ops->argo_register_single_source(d, t);
+}
+
+static inline int xsm_argo_register_any_source(struct domain *d, bool strict)
+{
+    return xsm_ops->argo_register_any_source(d);
+}
+
+#endif /* CONFIG_ARGO */
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 5701047..ed236b0 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -152,4 +152,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
 #endif
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
+#ifdef CONFIG_ARGO
+    set_to_dummy_if_null(ops, argo_register_single_source);
+    set_to_dummy_if_null(ops, argo_register_any_source);
+#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 96d31aa..3166561 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1717,6 +1717,21 @@ static int flask_domain_resource_map(struct domain *d)
     return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__RESOURCE_MAP);
 }
 
+#ifdef CONFIG_ARGO
+static int flask_argo_register_single_source(struct domain *d,
+                                             struct domain *t)
+{
+    return domain_has_perm(d, t, SECCLASS_ARGO,
+                           ARGO__REGISTER_SINGLE_SOURCE);
+}
+
+static int flask_argo_register_any_source(struct domain *d)
+{
+    return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
+                        ARGO__REGISTER_ANY_SOURCE, NULL);
+}
+#endif
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1851,6 +1866,10 @@ static struct xsm_operations flask_ops = {
 #endif
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
+#ifdef CONFIG_ARGO
+    .argo_register_single_source = flask_argo_register_single_source,
+    .argo_register_any_source = flask_argo_register_any_source,
+#endif
 };
 
 void __init flask_init(const void *policy_buffer, size_t policy_size)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 6fecfda..fb95c97 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -531,3 +531,14 @@ class version
 # Xen build id
     xen_build_id
 }
+
+# Class argo is used to describe the Argo interdomain communication system.
+class argo
+{
+    # Domain requesting registration of a communication ring
+    # to receive messages from a specific other domain.
+    register_single_source
+    # Domain requesting registration of a communication ring
+    # to receive messages from any other domain.
+    register_any_source
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index cde4e1a..50ecbab 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -19,5 +19,6 @@ class event
 class grant
 class security
 class version
+class argo
 
 # FLASK
-- 
2.1.4



* [PATCH 12/25] xsm, argo: XSM control for argo message send operation
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (10 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04  9:53   ` Paul Durrant
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	James McKenzie, Rich Persaud, Paul Durrant, Daniel De Graaf,
	Eric Chanudet

Default policy: allow.
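
How a default hook ends up answering for a policy module that does not
provide one can be sketched as below. Everything named model_* is
hypothetical and only imitates the pattern of an operations table whose
empty slots are filled with defaults. It is not the XSM implementation.

```c
#include <errno.h>
#include <stddef.h>

struct model_domain { int id; };

/* Operations table with a single hook, for brevity. */
struct model_xsm_ops {
    int (*argo_send)(const struct model_domain *d,
                     const struct model_domain *t);
};

/* Default hook: allow every send, matching the stated default policy. */
static int model_dummy_argo_send(const struct model_domain *d,
                                 const struct model_domain *t)
{
    (void)d;
    (void)t;
    return 0;
}

/* Hypothetical restrictive module: refuse every send. */
static int model_deny_argo_send(const struct model_domain *d,
                                const struct model_domain *t)
{
    (void)d;
    (void)t;
    return -EPERM;
}

/* Fill in any hook the policy module left unset. */
static void model_fixup_ops(struct model_xsm_ops *ops)
{
    if ( ops->argo_send == NULL )
        ops->argo_send = model_dummy_argo_send;
}

/* Wrapper used by callers; always dispatches through the table. */
static int model_xsm_argo_send(const struct model_xsm_ops *ops,
                               const struct model_domain *d,
                               const struct model_domain *t)
{
    return ops->argo_send(d, t);
}
```

Callers never test for a missing hook. The fixup step guarantees the table is
fully populated before the first dispatch.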

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/include/xsm/dummy.h             | 5 +++++
 xen/include/xsm/xsm.h               | 6 ++++++
 xen/xsm/dummy.c                     | 1 +
 xen/xsm/flask/hooks.c               | 7 +++++++
 xen/xsm/flask/policy/access_vectors | 2 ++
 5 files changed, 21 insertions(+)

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 55113c3..85965fc 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -733,6 +733,11 @@ static XSM_INLINE int xsm_argo_register_any_source(struct domain *d,
     return strict ? -EPERM : 0;
 }
 
+static XSM_INLINE int xsm_argo_send(struct domain *d, struct domain *t)
+{
+    return 0;
+}
+
 #endif /* CONFIG_ARGO */
 
 #include <public/version.h>
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 65577fd..470e7c3 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -184,6 +184,7 @@ struct xsm_operations {
 #ifdef CONFIG_ARGO
     int (*argo_register_single_source) (struct domain *d, struct domain *t);
     int (*argo_register_any_source) (struct domain *d);
+    int (*argo_send) (struct domain *d, struct domain *t);
 #endif
 };
 
@@ -713,6 +714,11 @@ static inline xsm_argo_register_any_source(struct domain *d, bool strict)
     return xsm_ops->argo_register_any_source(d);
 }
 
+static inline int xsm_argo_send(struct domain *d, struct domain *t)
+{
+    return xsm_ops->argo_send(d, t);
+}
+
 #endif /* CONFIG_ARGO */
 
 #endif /* XSM_NO_WRAPPERS */
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index ed236b0..ffac774 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -155,5 +155,6 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
 #ifdef CONFIG_ARGO
     set_to_dummy_if_null(ops, argo_register_single_source);
     set_to_dummy_if_null(ops, argo_register_any_source);
+    set_to_dummy_if_null(ops, argo_send);
 #endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 3166561..7b4e5ff 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1730,6 +1730,12 @@ static int flask_argo_register_any_source(struct domain *d)
     return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
                         ARGO__REGISTER_ANY_SOURCE, NULL);
 }
+
+static int flask_argo_send(struct domain *d, struct domain *t)
+{
+    return domain_has_perm(d, t, SECCLASS_ARGO, ARGO__SEND);
+}
+
 #endif
 
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
@@ -1869,6 +1875,7 @@ static struct xsm_operations flask_ops = {
 #ifdef CONFIG_ARGO
     .argo_register_single_source = flask_argo_register_single_source,
     .argo_register_any_source = flask_argo_register_any_source,
+    .argo_send = flask_argo_send,
 #endif
 };
 
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index fb95c97..f6c5377 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -541,4 +541,6 @@ class argo
     # Domain requesting registration of a communication ring
     # to receive messages from any other domain.
     register_any_source
+    # Domain sending a message to another domain.
+    send
 }
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 13/25] argo: implement the register op
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (11 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 12/25] xsm, argo: XSM control for argo message send operation Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-02 20:10   ` Julien Grall
                     ` (3 more replies)
  2018-12-01  1:32 ` [PATCH 14/25] argo: implement the unregister op Christopher Clark
                   ` (12 subsequent siblings)
  25 siblings, 4 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet, Roger Pau Monné

The register op is used by a domain to register a region of memory as a ring
for receiving messages from either one specified domain or, if a wildcard
(ARGO_DOMID_ANY) is given as the partner, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits, the
hypervisor will use this mapping to copy data from a sending domain into this
registered ring, making it accessible to the domain that registered the ring to
receive data.

In this code, the p2m type of the memory supplied by the guest for the ring
must be p2m_ram_rw. This is a conservative choice that defers the need to
reason about the other p2m types.

The argo_pfn_t type is introduced here as a pfn type that is 64-bit on all
architectures, which helps avoid the need for a compat ABI.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
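Illustration only, not part of the patch: a standalone sketch of the ring
length validation and tx_ptr sanitisation that the register op performs.
The names roundup16, ring_len_ok and sanitize_tx_ptr are invented for
this example, and the 16-byte header size is taken from the "128 bit
header" comment in argo.h rather than from sizeof.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_MAX_SIZE   16777216u  /* ARGO_MAX_RING_SIZE: 0x1000000 bytes */
#define MSG_HEADER_SIZE 16u        /* assumed: 128-bit message header */

/* Round up to the next multiple of 16, as ARGO_ROUNDUP does. */
uint32_t roundup16(uint32_t a)
{
    return (a + 0xfu) & ~(uint32_t)0xfu;
}

/*
 * A ring must hold at least a header plus two minimal padded slots,
 * be a multiple of 16 bytes, and not exceed the maximum ring size.
 */
bool ring_len_ok(uint32_t len)
{
    if ( len < MSG_HEADER_SIZE + roundup16(1) + roundup16(1) )
        return false;
    if ( roundup16(len) != len )
        return false;
    return len <= RING_MAX_SIZE;
}

/*
 * Keep a sane tx_ptr (it may be a re-register); otherwise restart at the
 * next aligned slot past rx_ptr, wrapping to 0 at the end of the ring.
 */
uint32_t sanitize_tx_ptr(uint32_t tx_ptr, uint32_t rx_ptr, uint32_t len)
{
    if ( tx_ptr < len && roundup16(tx_ptr) == tx_ptr )
        return tx_ptr;

    tx_ptr = roundup16(rx_ptr);
    if ( tx_ptr >= len )
        tx_ptr = 0;
    return tx_ptr;
}
```

Under these assumptions the smallest acceptable ring is 48 bytes, and a
misaligned tx_ptr on a 4096-byte ring with rx_ptr at 100 is reset to 112.
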
 xen/common/argo.c                  | 498 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/guest_access.h |   2 +
 xen/include/asm-x86/guest_access.h |   2 +
 xen/include/public/argo.h          |  64 +++++
 4 files changed, 566 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 2a95e09..f4e82cf 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -25,6 +25,7 @@
 #include <xen/guest_access.h>
 #include <xen/time.h>
 
+DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
 
@@ -98,6 +99,25 @@ struct argo_domain
 };
 
 /*
+ * Helper functions
+ */
+
+static inline uint16_t
+argo_hash_fn(const struct argo_ring_id *id)
+{
+    uint16_t ret;
+
+    ret = (uint16_t)(id->addr.port >> 16);
+    ret ^= (uint16_t)id->addr.port;
+    ret ^= id->addr.domain_id;
+    ret ^= id->partner;
+
+    ret &= (ARGO_HTABLE_SIZE - 1);
+
+    return ret;
+}
+
+/*
  * locks
  */
 
@@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
     }
 }
 
+/* caller must have L3 or W(L2) */
+static int
+argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
+                   uint8_t **page)
+{
+    if ( i >= ring_info->nmfns )
+    {
+        printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map page"
+               " %u of %u\n", ring_info->id.addr.domain_id,
+               ring_info->id.addr.port, ring_info->id.partner, ring_info,
+               i, ring_info->nmfns);
+        return -EFAULT;
+    }
+    ASSERT(ring_info->mfns);
+    ASSERT(ring_info->mfn_mapping);
+
+    if ( !ring_info->mfn_mapping[i] )
+    {
+        /*
+         * TODO:
+         * The first page of the ring contains the ring indices, so both read and
+         * write access to the page is required by the hypervisor, but read-access
+         * is not needed for this mapping for the remainder of the ring.
+         * Since this mapping will remain resident in Xen's address space for
+         * the lifetime of the ring, and following the principle of least privilege,
+         * it could be preferable to:
+         *  # add a XSM check to determine what policy is wanted here
+         *  # depending on the XSM query, optionally create this mapping as
+         *    _write-only_ on platforms that can support it.
+         *    (eg. Intel EPT/AMD NPT).
+         */
+        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
+
+        if ( !ring_info->mfn_mapping[i] )
+        {
+            printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p failed to map page"
+                   " %u of %u\n", ring_info->id.addr.domain_id,
+                   ring_info->id.addr.port, ring_info->id.partner, ring_info,
+                   i, ring_info->nmfns);
+            return -EFAULT;
+        }
+        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
+               mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
+    }
+
+    if ( page )
+        *page = ring_info->mfn_mapping[i];
+    return 0;
+}
+
+/* caller must have L3 or W(L2) */
+static int
+argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
+{
+    uint8_t *dst;
+    uint32_t *p;
+    int ret;
+
+    ret = argo_ring_map_page(ring_info, 0, &dst);
+    if ( ret )
+        return ret;
+
+    p = (uint32_t *)(dst + offsetof(argo_ring_t, tx_ptr));
+    write_atomic(p, tx_ptr);
+    mb();
+    return 0;
+}
+
 /*
  * pending
  */
@@ -231,6 +319,388 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
     xfree(ring_info);
 }
 
+/*
+ * ring
+ */
+
+static int
+argo_find_ring_mfn(struct domain *d, argo_pfn_t pfn, mfn_t *mfn)
+{
+    p2m_type_t p2mt;
+    int ret = 0;
+
+#ifdef CONFIG_X86
+    *mfn = get_gfn_unshare(d, pfn, &p2mt);
+#else
+    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
+#endif
+
+    if ( !mfn_valid(*mfn) )
+        ret = -EINVAL;
+#ifdef CONFIG_X86
+    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
+        ret = -EAGAIN;
+#endif
+    else if ( (p2mt != p2m_ram_rw) ||
+              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
+        ret = -EINVAL;
+
+#ifdef CONFIG_X86
+    put_gfn(d, pfn);
+#endif
+
+    return ret;
+}
+
+static int
+argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
+                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
+                    uint32_t len)
+{
+    int i;
+    int ret = 0;
+
+    if ( (npage << PAGE_SHIFT) < len )
+        return -EINVAL;
+
+    if ( ring_info->mfns )
+    {
+        /*
+         * Ring already existed. Check if it's the same ring,
+         * i.e. same number of pages and all translated gpfns still
+         * translating to the same mfns
+         */
+        if ( ring_info->npage != npage )
+            i = ring_info->nmfns + 1; /* forces re-register below */
+        else
+        {
+            for ( i = 0; i < ring_info->nmfns; i++ )
+            {
+                argo_pfn_t pfn;
+                mfn_t mfn;
+
+                ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
+                if ( ret )
+                    break;
+
+                ret = argo_find_ring_mfn(d, pfn, &mfn);
+                if ( ret )
+                    break;
+
+                if ( mfn_x(mfn) != mfn_x(ring_info->mfns[i]) )
+                    break;
+            }
+        }
+        if ( i != ring_info->nmfns )
+        {
+            printk(XENLOG_INFO "argo: vm%u re-registering existing argo ring"
+                   " (vm%u:%x vm%d), clearing MFN list\n",
+                   current->domain->domain_id, ring_info->id.addr.domain_id,
+                   ring_info->id.addr.port, ring_info->id.partner);
+
+            argo_ring_remove_mfns(d, ring_info);
+            ASSERT(!ring_info->mfns);
+        }
+    }
+
+    if ( !ring_info->mfns )
+    {
+        mfn_t *mfns;
+        uint8_t **mfn_mapping;
+
+        mfns = xmalloc_array(mfn_t, npage);
+        if ( !mfns )
+            return -ENOMEM;
+
+        for ( i = 0; i < npage; i++ )
+            mfns[i] = INVALID_MFN;
+
+        mfn_mapping = xmalloc_array(uint8_t *, npage);
+        if ( !mfn_mapping )
+        {
+            xfree(mfns);
+            return -ENOMEM;
+        }
+
+        ring_info->npage = npage;
+        ring_info->mfns = mfns;
+        ring_info->mfn_mapping = mfn_mapping;
+    }
+    ASSERT(ring_info->npage == npage);
+
+    if ( ring_info->nmfns == ring_info->npage )
+        return 0;
+
+    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
+    {
+        argo_pfn_t pfn;
+        mfn_t mfn;
+
+        ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
+        if ( ret )
+            break;
+
+        ret = argo_find_ring_mfn(d, pfn, &mfn);
+        if ( ret )
+        {
+            printk(XENLOG_ERR "argo: vm%u passed invalid gpfn %"PRI_xen_pfn
+                   " ring (vm%u:%x vm%d) %p seq %d of %d\n",
+                   d->domain_id, pfn, ring_info->id.addr.domain_id,
+                   ring_info->id.addr.port, ring_info->id.partner,
+                   ring_info, i, ring_info->npage);
+            break;
+        }
+
+        ring_info->mfns[i] = mfn;
+        ring_info->nmfns = i + 1;
+
+        argo_dprintk("%d: %"PRI_xen_pfn" -> %"PRI_mfn"\n",
+               i, pfn, mfn_x(ring_info->mfns[i]));
+
+        ring_info->mfn_mapping[i] = NULL;
+    }
+
+    if ( ret )
+        argo_ring_remove_mfns(d, ring_info);
+    else
+    {
+        ASSERT(ring_info->nmfns == ring_info->npage);
+
+        printk(XENLOG_INFO "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p"
+               " npage %d nmfns %d\n", current->domain->domain_id,
+               ring_info->id.addr.domain_id, ring_info->id.addr.port,
+               ring_info->id.partner, ring_info, ring_info->mfn_mapping,
+               ring_info->npage, ring_info->nmfns);
+    }
+    return ret;
+}
+
+static struct argo_ring_info *
+argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
+{
+    uint16_t hash;
+    struct hlist_node *node;
+    struct argo_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    hash = argo_hash_fn(id);
+
+    argo_dprintk("d->argo=%p, d->argo->ring_hash[%d]=%p id=%p\n",
+                 d->argo, hash, d->argo->ring_hash[hash].first, id);
+    argo_dprintk("id.addr.port=%d id.addr.domain=vm%u"
+                 " id.addr.partner=vm%d\n",
+                 id->addr.port, id->addr.domain_id, id->partner);
+
+    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[hash], node)
+    {
+        argo_ring_id_t *cmpid = &ring_info->id;
+
+        if ( cmpid->addr.port == id->addr.port &&
+             cmpid->addr.domain_id == id->addr.domain_id &&
+             cmpid->partner == id->partner )
+        {
+            argo_dprintk("ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    argo_dprintk("no ring_info found\n");
+
+    return NULL;
+}
+
+static long
+argo_register_ring(struct domain *d,
+                   XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
+                   XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
+                   bool fail_exist)
+{
+    struct argo_ring ring;
+    struct argo_ring_info *ring_info;
+    int ret = 0;
+    bool update_tx_ptr = 0;
+    uint64_t dst_domain_cookie = 0;
+
+    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
+        return -EINVAL;
+
+    read_lock (&argo_lock);
+
+    do {
+        if ( !d->argo )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        if ( copy_from_guest(&ring, ring_hnd, 1) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ring.magic != ARGO_RING_MAGIC )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( (ring.len < (sizeof(struct argo_ring_message_header)
+                          + ARGO_ROUNDUP(1) + ARGO_ROUNDUP(1)))   ||
+             (ARGO_ROUNDUP(ring.len) != ring.len) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( ring.len > ARGO_MAX_RING_SIZE )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( ring.id.partner == ARGO_DOMID_ANY )
+        {
+            ret = xsm_argo_register_any_source(d, argo_mac_bootparam_enforcing);
+            if ( ret )
+                break;
+        }
+        else
+        {
+            struct domain *dst_d = get_domain_by_id(ring.id.partner);
+            if ( !dst_d )
+            {
+                argo_dprintk("!dst_d, ECONNREFUSED\n");
+                ret = -ECONNREFUSED;
+                break;
+            }
+
+            ret = xsm_argo_register_single_source(d, dst_d);
+            if ( ret )
+            {
+                put_domain(dst_d);
+                break;
+            }
+
+            if ( !dst_d->argo )
+            {
+                argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
+                ret = -ECONNREFUSED;
+                put_domain(dst_d);
+                break;
+            }
+
+            dst_domain_cookie = dst_d->argo->domain_cookie;
+
+            put_domain(dst_d);
+        }
+
+        ring.id.addr.domain_id = d->domain_id;
+        if ( copy_field_to_guest(ring_hnd, &ring, id) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        /*
+         * No need for a lock yet, because only we know about this.
+         * Set the tx pointer if it looks bogus (we don't reset it
+         * otherwise, because this might be a re-register after S4).
+         */
+
+        if ( ring.tx_ptr >= ring.len ||
+             ARGO_ROUNDUP(ring.tx_ptr) != ring.tx_ptr )
+        {
+            /*
+             * Since the ring is a mess, attempt to flush the contents of it
+             * here by setting the tx_ptr to the next aligned message slot past
+             * the latest rx_ptr we have observed. Handle ring wrap correctly.
+             */
+            ring.tx_ptr = ARGO_ROUNDUP(ring.rx_ptr);
+
+            if ( ring.tx_ptr >= ring.len )
+                ring.tx_ptr = 0;
+
+            /* ring.tx_ptr will be written back to the guest ring below. */
+            update_tx_ptr = 1;
+        }
+
+        /* W(L2) protects all the elements of the domain's ring_info */
+        write_lock(&d->argo->lock);
+
+        do {
+            ring_info = argo_ring_find_info(d, &ring.id);
+
+            if ( !ring_info )
+            {
+                uint16_t hash;
+
+                ring_info = xmalloc(struct argo_ring_info);
+                if ( !ring_info )
+                {
+                    ret = -ENOMEM;
+                    break;
+                }
+
+                spin_lock_init(&ring_info->lock);
+
+                ring_info->mfns = NULL;
+                ring_info->npage = 0;
+                ring_info->mfn_mapping = NULL;
+                ring_info->len = 0;
+                ring_info->nmfns = 0;
+                ring_info->tx_ptr = 0;
+                ring_info->partner_cookie = dst_domain_cookie;
+
+                ring_info->id = ring.id;
+                INIT_HLIST_HEAD(&ring_info->pending);
+
+                hash = argo_hash_fn(&ring_info->id);
+                hlist_add_head(&ring_info->node, &d->argo->ring_hash[hash]);
+
+                printk(XENLOG_INFO "argo: vm%u registering ring (vm%u:%x vm%d)\n",
+                       current->domain->domain_id, ring.id.addr.domain_id,
+                       ring.id.addr.port, ring.id.partner);
+            }
+            else
+            {
+                /*
+                 * If the caller specified that the ring must not already exist,
+                 * fail an attempt to add a completed ring which already exists.
+                 */
+                if ( fail_exist && ring_info->len )
+                {
+                    ret = -EEXIST;
+                    break;
+                }
+
+                printk(XENLOG_INFO
+                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
+                     current->domain->domain_id, ring.id.addr.domain_id,
+                     ring.id.addr.port, ring.id.partner);
+            }
+
+            /* Since we hold W(L2), there is no need to take L3 here */
+            ring_info->tx_ptr = ring.tx_ptr;
+
+            ret = argo_find_ring_mfns(d, ring_info, npage, pfn_hnd, ring.len);
+            if ( !ret )
+                ret = update_tx_ptr ? argo_update_tx_ptr(ring_info, ring.tx_ptr)
+                                    : argo_ring_map_page(ring_info, 0, NULL);
+            if ( !ret )
+                ring_info->len = ring.len;
+
+        } while ( 0 );
+
+        write_unlock(&d->argo->lock);
+
+    } while ( 0 );
+
+    read_unlock(&argo_lock);
+
+    return ret;
+}
+
 long
 do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                    XEN_GUEST_HANDLE_PARAM(void) arg2,
@@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
 
     switch (cmd)
     {
+    case ARGO_MESSAGE_OP_register_ring:
+    {
+        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
+            guest_handle_cast(arg1, argo_ring_t);
+        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
+            guest_handle_cast(arg2, argo_pfn_t);
+        uint32_t npage = arg3;
+        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
+
+        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+            break;
+        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
+            break;
+        /* arg4: reserve currently-undefined bits, require zero.  */
+        if ( unlikely(arg4 & ~ARGO_REGISTER_FLAG_MASK) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 1137c54..98006f8 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -34,6 +34,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
 
+#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
+
 /* Offset the given guest handle into the array it refers to. */
 #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
 #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index 9391cd3..e9d25d6 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -50,6 +50,8 @@
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
 
+#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
+
 /* Offset the given guest handle into the array it refers to. */
 #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
 #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 20dabc0..5ad8e2b 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -21,6 +21,20 @@
 
 #include "xen.h"
 
+#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
+
+#define ARGO_DOMID_ANY           DOMID_INVALID
+
+/*
+ * The maximum size of an Argo ring is defined to be: 16MB
+ *  -- which is 0x1000000 or 16777216 bytes.
+ * A byte index into the ring is at most 24 bits.
+ */
+#define ARGO_MAX_RING_SIZE  (16777216ULL)
+
+/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
+typedef uint64_t argo_pfn_t;
+
 typedef struct argo_addr
 {
     uint32_t port;
@@ -52,4 +66,54 @@ typedef struct argo_ring
 #endif
 } argo_ring_t;
 
+/*
+ * Messages on the ring are padded to 128 bits
+ * Len here refers to the exact length of the data not including the
+ * 128 bit header. The message uses
+ * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
+ * Using typeof(a) makes clear that this does not truncate any high-order bits.
+ */
+#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
+
+struct argo_ring_message_header
+{
+    uint32_t len;
+    argo_addr_t source;
+    uint32_t message_type;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t data[];
+#elif defined(__GNUC__)
+    uint8_t data[0];
+#endif
+};
+
+/*
+ * Hypercall operations
+ */
+
+/*
+ * ARGO_MESSAGE_OP_register_ring
+ *
+ * Register a ring using the indicated memory.
+ * Also used to reregister an existing ring (eg. after resume from sleep).
+ *
+ * arg1: XEN_GUEST_HANDLE(argo_ring_t)
+ * arg2: XEN_GUEST_HANDLE(argo_pfn_t)
+ * arg3: uint32_t npages
+ * arg4: uint32_t flags
+ */
+#define ARGO_MESSAGE_OP_register_ring     1
+
+/* Register op flags */
+/*
+ * Fail exist:
+ * If set, reject attempts to (re)register an existing established ring.
+ * If clear, reregistration occurs if the ring exists, with the new ring
+ * taking the place of the old, preserving tx_ptr if it remains valid.
+ */
+#define ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
+
+/* Mask for all defined flags */
+#define ARGO_REGISTER_FLAG_MASK ARGO_REGISTER_FLAG_FAIL_EXIST
+
 #endif
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 14/25] argo: implement the unregister op
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (12 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04 11:10   ` Paul Durrant
  2018-12-12  9:51   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 15/25] argo: implement the sendv op Christopher Clark
                   ` (11 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Takes a single argument: a handle to the registered ring.

The ring's entry is removed from the hashtable of registered rings;
any entries for pending notifications are removed; and the ring is
unmapped from Xen's address space.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
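Illustration only, not part of the patch: a standalone sketch of looking
up and unlinking a ring entry by its (port, domain_id, partner) id. The
hash folding mirrors argo_hash_fn from the register patch. HTABLE_SIZE of
32 is an assumption, and a singly-linked chain is used here for brevity
where the patch uses Xen's hlist.

```c
#include <stddef.h>
#include <stdint.h>

#define HTABLE_SIZE 32u  /* assumed; must be a power of two for the mask */

struct ring_id {
    uint32_t port;
    uint16_t domain_id;
    uint16_t partner;
};

struct ring_entry {
    struct ring_id id;
    struct ring_entry *next;
};

/* Fold the id into a bucket index, as argo_hash_fn() does. */
uint16_t ring_hash(const struct ring_id *id)
{
    uint16_t ret;

    ret = (uint16_t)(id->port >> 16);
    ret ^= (uint16_t)id->port;
    ret ^= id->domain_id;
    ret ^= id->partner;

    return ret & (HTABLE_SIZE - 1);
}

/* Add an entry at the head of its bucket. */
void ring_insert(struct ring_entry **table, struct ring_entry *e)
{
    uint16_t h = ring_hash(&e->id);

    e->next = table[h];
    table[h] = e;
}

/* Unlink and return the matching entry, or NULL if it is not registered. */
struct ring_entry *ring_remove(struct ring_entry **table,
                               const struct ring_id *id)
{
    struct ring_entry **pp = &table[ring_hash(id)];

    for ( ; *pp; pp = &(*pp)->next )
    {
        struct ring_entry *e = *pp;

        if ( e->id.port == id->port &&
             e->id.domain_id == id->domain_id &&
             e->id.partner == id->partner )
        {
            *pp = e->next;
            e->next = NULL;
            return e;
        }
    }
    return NULL;
}
```

A NULL return corresponds to the -ENOENT case in argo_unregister_ring;
freeing pending notifications and unmapping pages is left out here.
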
 xen/common/argo.c         | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  9 +++++++
 2 files changed, 71 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index f4e82cf..387e650 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -510,6 +510,59 @@ argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
 }
 
 static long
+argo_unregister_ring(struct domain *d,
+                     XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd)
+{
+    struct argo_ring ring;
+    struct argo_ring_info *ring_info;
+    int ret = 0;
+
+    read_lock(&argo_lock);
+
+    do {
+        if ( !d->argo )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        ret = copy_from_guest_errno(&ring, ring_hnd, 1);
+        if ( ret )
+            break;
+
+        if ( ring.magic != ARGO_RING_MAGIC )
+        {
+            argo_dprintk(
+                "ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), EINVAL\n",
+                ring.magic, ARGO_RING_MAGIC);
+            ret = -EINVAL;
+            break;
+        }
+
+        ring.id.addr.domain_id = d->domain_id;
+
+        write_lock(&d->argo->lock);
+
+        ring_info = argo_ring_find_info(d, &ring.id);
+        if ( ring_info )
+            argo_ring_remove_info(d, ring_info);
+
+        write_unlock(&d->argo->lock);
+
+        if ( !ring_info )
+        {
+            argo_dprintk("ENOENT\n");
+            ret = -ENOENT;
+            break;
+        }
+
+    } while ( 0 );
+
+    read_unlock(&argo_lock);
+    return ret;
+}
+
+static long
 argo_register_ring(struct domain *d,
                    XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
                    XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
@@ -751,6 +804,15 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
         break;
     }
+    case ARGO_MESSAGE_OP_unregister_ring:
+    {
+        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
+            guest_handle_cast(arg1, argo_ring_t);
+        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
+            break;
+        rc = argo_unregister_ring(d, ring_hnd);
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 5ad8e2b..6cf10a8 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -116,4 +116,13 @@ struct argo_ring_message_header
 /* Mask for all defined flags */
 #define ARGO_REGISTER_FLAG_MASK ARGO_REGISTER_FLAG_FAIL_EXIST
 
+/*
+ * ARGO_MESSAGE_OP_unregister_ring
+ *
+ * Unregister a previously-registered ring, ending communication.
+ *
+ * arg1: XEN_GUEST_HANDLE(argo_ring_t)
+ */
+#define ARGO_MESSAGE_OP_unregister_ring     2
+
 #endif
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 15/25] argo: implement the sendv op
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (13 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 14/25] argo: implement the unregister op Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-04 11:22   ` Paul Durrant
  2018-12-12 11:52   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 16/25] argo: implement the notify op Christopher Clark
                   ` (10 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

The sendv operation performs a synchronous send of the buffers described by
an array of iovs to a remote domain's registered ring.

It takes:
 * A destination address (domid, port) for the ring to send to.
   A most-specific-match lookup is performed, to allow for wildcard rings.
 * A source address, used to inform the destination of where to reply.
 * The address of an array of iovs containing the data to send.
 * The length of that array of iovs.
 * A 32-bit message type, available to communicate message context
   data (eg. kernel-to-kernel, separate from the application data).

If insufficient space exists in the destination ring, it will return -EAGAIN
and Xen will notify the caller when sufficient space becomes available.

Accesses to the ring indices are appropriately atomic. The rings are
mapped into Xen's private address space to write as needed and the
mappings are retained for later use.

When locating the destination ring, a check is performed via a cookie
installed at ring registration time, to ensure that the source domain
is the same as it was when the ring was registered.

Fixed-size types are used in areas of this code where care is needed to
avoid integer overflow.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
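Illustration only, not part of the patch: a standalone sketch of copying
a buffer into a ring whose pages are mapped individually rather than
contiguously, which is the job argo_memcpy_to_guest_ring does with
hypervisor-side data. It is restructured as a single loop, takes the
page mappings as a plain array, and uses invented names
(copy_to_paged_ring, PG_SHIFT, PG_SIZE). Ring wrap and guest-handle
sources are left to the caller and are not modelled.

```c
#include <stdint.h>
#include <string.h>

#define PG_SHIFT 12u
#define PG_SIZE  (1u << PG_SHIFT)

/*
 * Copy len bytes from src to byte offset `offset` of a ring made of
 * npages separately mapped pages. Returns 0 on success, or -1 if the
 * copy would run past the last page (earlier chunks may already have
 * been written in that case).
 */
int copy_to_paged_ring(uint8_t **pages, uint32_t npages, uint32_t offset,
                       const uint8_t *src, uint32_t len)
{
    uint32_t page = offset >> PG_SHIFT;

    offset &= PG_SIZE - 1;      /* offset within the first page */

    while ( len )
    {
        uint32_t chunk = PG_SIZE - offset;

        if ( chunk > len )
            chunk = len;
        if ( page >= npages )
            return -1;

        memcpy(pages[page] + offset, src, chunk);

        src += chunk;
        len -= chunk;
        offset = 0;             /* later pages are filled from the start */
        page++;
    }
    return 0;
}
```

An 8-byte message written 3 bytes before a page boundary ends up with 3
bytes at the tail of one page and 5 at the head of the next.
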
 xen/common/argo.c         | 528 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  59 ++++++
 2 files changed, 587 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 387e650..0c3972c 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -24,10 +24,13 @@
 #include <xen/domain_page.h>
 #include <xen/guest_access.h>
 #include <xen/time.h>
+#include <xsm/xsm.h>
 
 DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(argo_send_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(uint8_t);
 
 /* Xen command line option to enable argo */
 static bool __read_mostly opt_argo_enabled = 0;
@@ -166,6 +169,21 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
 #endif
 
 /*
+ * Event channel
+ */
+
+static void
+argo_signal_domain(struct domain *d)
+{
+    argo_dprintk("signalling domid:%d\n", d->domain_id);
+
+    if ( !d->argo ) /* This can happen if the domain is being destroyed */
+        return;
+
+    evtchn_send(d, d->argo->evtchn_port);
+}
+
+/*
  * ring buffer
  */
 
@@ -259,6 +277,333 @@ argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
     return 0;
 }
 
+static int
+argo_memcpy_to_guest_ring(struct argo_ring_info *ring_info,
+                          uint32_t offset,
+                          const void *src,
+                          XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                          uint32_t len)
+{
+    int page = offset >> PAGE_SHIFT;
+    uint8_t *dst;
+    int ret;
+    unsigned int src_offset = 0;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= ~PAGE_MASK;
+
+    if ( (len > ARGO_MAX_RING_SIZE) || (offset > ARGO_MAX_RING_SIZE) )
+        return -EFAULT;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        ret = argo_ring_map_page(ring_info, page, &dst);
+        if ( ret )
+            return ret;
+
+        if ( src )
+        {
+            memcpy(dst + offset, src + src_offset, PAGE_SIZE - offset);
+            src_offset += (PAGE_SIZE - offset);
+        }
+        else
+        {
+            ret = copy_from_guest_errno(dst + offset, src_hnd,
+                                        PAGE_SIZE - offset);
+            if ( ret )
+                return ret;
+
+            guest_handle_add_offset(src_hnd, PAGE_SIZE - offset);
+        }
+
+        page++;
+        len -= PAGE_SIZE - offset;
+        offset = 0;
+    }
+
+    ret = argo_ring_map_page(ring_info, page, &dst);
+    if ( ret )
+    {
+        argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
+               " %d of %d\n", ring_info->id.addr.domain_id,
+               ring_info->id.addr.port, ring_info->id.partner, ring_info,
+               page, ring_info->nmfns);
+        return ret;
+    }
+
+    if ( src )
+        memcpy(dst + offset, src + src_offset, len);
+    else
+        ret = copy_from_guest_errno(dst + offset, src_hnd, len);
+
+    return ret;
+}
+
+static int
+argo_ringbuf_get_rx_ptr(struct argo_ring_info *ring_info, uint32_t *rx_ptr)
+{
+    uint8_t *src;
+    argo_ring_t *ringp;
+    int ret;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( !ring_info->nmfns || ring_info->nmfns < ring_info->npage )
+        return -EINVAL;
+
+    ret = argo_ring_map_page(ring_info, 0, &src);
+    if ( ret )
+        return ret;
+
+    ringp = (argo_ring_t *)src;
+
+    *rx_ptr = read_atomic(&ringp->rx_ptr);
+
+    return 0;
+}
+
+/*
+ * argo_sanitize_ring creates a modified copy of the ring pointers
+ * where the rx_ptr is rounded up to ensure it is aligned, and then
+ * ring wrap is handled. Simplifies safe use of the rx_ptr for
+ * available space calculation.
+ */
+static void
+argo_sanitize_ring(argo_ring_t *ring, const struct argo_ring_info *ring_info)
+{
+    uint32_t rx_ptr = ring->rx_ptr;
+
+    ring->tx_ptr = ring_info->tx_ptr;
+    ring->len = ring_info->len;
+
+    rx_ptr = ARGO_ROUNDUP(rx_ptr);
+    if ( rx_ptr >= ring_info->len )
+        rx_ptr = 0;
+
+    ring->rx_ptr = rx_ptr;
+}
+
+/*
+ * argo_iov_count returns its count on success via an out variable
+ * to avoid potential for a negative return value to be used incorrectly
+ * (eg. coerced into an unsigned variable resulting in a large incorrect value)
+ */
+static int
+argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
+               uint32_t *count)
+{
+    argo_iov_t iov;
+    uint32_t sum_iov_lens = 0;
+    int ret;
+
+    if ( niov > ARGO_MAXIOV )
+        return -EINVAL;
+
+    while ( niov-- )
+    {
+        ret = copy_from_guest_errno(&iov, iovs, 1);
+        if ( ret )
+            return ret;
+
+        /* check each to protect sum against integer overflow */
+        if ( iov.iov_len > ARGO_MAX_RING_SIZE )
+            return -EINVAL;
+
+        sum_iov_lens += iov.iov_len;
+
+        /*
+         * Again protect sum from integer overflow
+         * and ensure total msg size will be within bounds.
+         */
+        if ( sum_iov_lens > ARGO_MAX_MSG_SIZE )
+            return -EINVAL;
+
+        guest_handle_add_offset(iovs, 1);
+    }
+
+    *count = sum_iov_lens;
+    return 0;
+}
+
+static int
+argo_ringbuf_insert(struct domain *d,
+                    struct argo_ring_info *ring_info,
+                    const struct argo_ring_id *src_id,
+                    XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
+                    uint32_t message_type, unsigned long *out_len)
+{
+    argo_ring_t ring;
+    struct argo_ring_message_header mh = { 0 };
+    int32_t sp;
+    int32_t ret = 0;
+    uint32_t len;
+    uint32_t iov_len;
+    uint32_t sum_iov_len = 0;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( (ret = argo_iov_count(iovs, niov, &len)) )
+        return ret;
+
+    if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) ) >=
+          ring_info->len)
+         || (len > ARGO_MAX_MSG_SIZE) )
+        return -EMSGSIZE;
+
+    do {
+        ret =  argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr);
+        if ( ret )
+            break;
+
+        argo_sanitize_ring(&ring, ring_info);
+
+        argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d"
+                     " ring_info->tx_ptr=%d\n",
+                     ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
+
+        if ( ring.rx_ptr == ring.tx_ptr )
+            sp = ring_info->len;
+        else
+        {
+            sp = ring.rx_ptr - ring.tx_ptr;
+            if ( sp < 0 )
+                sp += ring.len;
+        }
+
+        if ( (ARGO_ROUNDUP(len) + sizeof(struct argo_ring_message_header)) >= sp )
+        {
+            argo_dprintk("EAGAIN\n");
+            ret = -EAGAIN;
+            break;
+        }
+
+        mh.len = len + sizeof(struct argo_ring_message_header);
+        mh.source.port = src_id->addr.port;
+        mh.source.domain_id = src_id->addr.domain_id;
+        mh.message_type = message_type;
+
+        /*
+         * For this copy to the guest ring, tx_ptr is always 16-byte aligned
+         * and the message header is 16 bytes long.
+         */
+        BUILD_BUG_ON(sizeof(struct argo_ring_message_header) != ARGO_ROUNDUP(1));
+
+        if ( (ret = argo_memcpy_to_guest_ring(ring_info,
+                                              ring.tx_ptr + sizeof(argo_ring_t),
+                                              &mh,
+                                              XEN_GUEST_HANDLE_NULL(uint8_t),
+                                              sizeof(mh))) )
+            break;
+
+        ring.tx_ptr += sizeof(mh);
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        while ( niov-- )
+        {
+            XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
+            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
+            argo_iov_t iov;
+
+            ret = copy_from_guest_errno(&iov, iovs, 1);
+            if ( ret )
+                break;
+
+            bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base, uint8_t);
+            buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t);
+            iov_len = iov.iov_len;
+
+            if ( !iov_len )
+            {
+                printk(XENLOG_ERR "argo: iov.iov_len=0 iov.iov_base=%"
+                       PRIx64" ring (vm%u:%x vm%d)\n",
+                       iov.iov_base, ring_info->id.addr.domain_id,
+                       ring_info->id.addr.port, ring_info->id.partner);
+
+                guest_handle_add_offset(iovs, 1);
+                continue;
+            }
+
+            if ( iov_len > ARGO_MAX_MSG_SIZE )
+            {
+                ret = -EINVAL;
+                break;
+            }
+
+            sum_iov_len += iov_len;
+            if ( sum_iov_len > len )
+            {
+                ret = -EINVAL;
+                break;
+            }
+
+            if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            sp = ring.len - ring.tx_ptr;
+
+            if ( iov_len > sp )
+            {
+                ret = argo_memcpy_to_guest_ring(ring_info,
+                        ring.tx_ptr + sizeof(argo_ring_t),
+                        NULL, buf_hnd, sp);
+                if ( ret )
+                    break;
+
+                ring.tx_ptr = 0;
+                iov_len -= sp;
+                guest_handle_add_offset(buf_hnd, sp);
+            }
+
+            ret = argo_memcpy_to_guest_ring(ring_info,
+                        ring.tx_ptr + sizeof(argo_ring_t),
+                        NULL, buf_hnd, iov_len);
+            if ( ret )
+                break;
+
+            ring.tx_ptr += iov_len;
+
+            if ( ring.tx_ptr == ring_info->len )
+                ring.tx_ptr = 0;
+
+            guest_handle_add_offset(iovs, 1);
+        }
+
+        if ( ret )
+            break;
+
+        ring.tx_ptr = ARGO_ROUNDUP(ring.tx_ptr);
+
+        if ( ring.tx_ptr >= ring_info->len )
+            ring.tx_ptr -= ring_info->len;
+
+        mb();
+        ring_info->tx_ptr = ring.tx_ptr;
+        if ( (ret = argo_update_tx_ptr(ring_info, ring.tx_ptr)) )
+            break;
+
+    } while ( 0 );
+
+    /*
+     * At this point it is possible to unmap the ring_info, ie:
+     *   argo_ring_unmap(ring_info);
+     * but performance should be improved by not doing so, and retaining
+     * the mapping.
+     * An XSM policy control over level of confidentiality required
+     * versus performance cost could be added to decide that here.
+     * See the similar comment in argo_ring_map_page re: write-only mappings.
+     */
+
+    if ( !ret )
+        *out_len = len;
+
+    return ret;
+}
+
 /*
  * pending
  */
@@ -282,6 +627,47 @@ argo_pending_remove_all(struct argo_ring_info *ring_info)
     }
 }
 
+static int
+argo_pending_queue(struct argo_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct argo_pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    ent = xmalloc(struct argo_pending_ent);
+
+    if ( !ent )
+        return -ENOMEM;
+
+    ent->len = len;
+    ent->id = src_id;
+
+    hlist_add_head(&ent->node, &ring_info->pending);
+
+    return 0;
+}
+
+static int
+argo_pending_requeue(struct argo_ring_info *ring_info, domid_t src_id, int len)
+{
+    struct hlist_node *node;
+    struct argo_pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    hlist_for_each_entry(ent, node, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            if ( ent->len < len )
+                ent->len = len;
+            return 0;
+        }
+    }
+
+    return argo_pending_queue(ring_info, src_id, len);
+}
+
 static void argo_ring_remove_mfns(const struct domain *d,
                                   struct argo_ring_info *ring_info)
 {
@@ -509,6 +895,28 @@ argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
     return NULL;
 }
 
+static struct argo_ring_info *
+argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
+                             domid_t partner_id, uint64_t partner_cookie)
+{
+    argo_ring_id_t id;
+    struct argo_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    id.addr.port = port;
+    id.addr.domain_id = d->domain_id;
+    id.partner = partner_id;
+
+    ring_info = argo_ring_find_info(d, &id);
+    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
+        return ring_info;
+
+    id.partner = ARGO_DOMID_ANY;
+
+    return argo_ring_find_info(d, &id);
+}
+
 static long
 argo_unregister_ring(struct domain *d,
                      XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd)
@@ -754,6 +1162,103 @@ argo_register_ring(struct domain *d,
     return ret;
 }
 
+/*
+ * io
+ */
+
+static long
+argo_sendv(struct domain *src_d, const argo_addr_t *src_addr,
+           const argo_addr_t *dst_addr,
+           XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint32_t niov,
+           uint32_t message_type)
+{
+    struct domain *dst_d = NULL;
+    struct argo_ring_id src_id;
+    struct argo_ring_info *ring_info;
+    int ret = 0;
+    unsigned long len = 0;
+
+    ASSERT(src_d->domain_id == src_addr->domain_id);
+
+    read_lock(&argo_lock);
+
+    do {
+        if ( !src_d->argo )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        src_id.addr.pad = 0;
+        src_id.addr.port = src_addr->port;
+        src_id.addr.domain_id = src_d->domain_id;
+        src_id.partner = dst_addr->domain_id;
+
+        dst_d = get_domain_by_id(dst_addr->domain_id);
+        if ( !dst_d || !dst_d->argo )
+        {
+            argo_dprintk("!dst_d, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            break;
+        }
+
+        ret = xsm_argo_send(src_d, dst_d);
+        if ( ret )
+        {
+            printk(XENLOG_ERR "argo: XSM REJECTED %i -> %i\n",
+                   src_addr->domain_id, dst_addr->domain_id);
+            break;
+        }
+
+        read_lock(&dst_d->argo->lock);
+
+        do {
+            ring_info = argo_ring_find_info_by_match(dst_d, dst_addr->port,
+                                                 src_addr->domain_id,
+                                                 src_d->argo->domain_cookie);
+            if ( !ring_info )
+            {
+                printk(XENLOG_ERR "argo: vm%u connection refused, "
+                       "src (vm%u:%x) dst (vm%u:%x)\n",
+                       current->domain->domain_id,
+                       src_id.addr.domain_id, src_id.addr.port,
+                       dst_addr->domain_id, dst_addr->port);
+
+                ret = -ECONNREFUSED;
+                break;
+            }
+
+            spin_lock(&ring_info->lock);
+
+            ret = argo_ringbuf_insert(dst_d, ring_info, &src_id,
+                                      iovs, niov, message_type, &len);
+            if ( ret == -EAGAIN )
+            {
+                argo_dprintk("argo_ringbuf_insert failed, EAGAIN\n");
+                /* requeue to issue a notification when space is there */
+                if ( argo_pending_requeue(ring_info, src_addr->domain_id, len) )
+                     ret = -ENOMEM;
+            }
+
+            spin_unlock(&ring_info->lock);
+
+            if ( ret >= 0 )
+                argo_signal_domain(dst_d);
+
+        } while ( 0 );
+
+        read_unlock(&dst_d->argo->lock);
+
+    } while ( 0 );
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    read_unlock(&argo_lock);
+
+    return ( ret < 0 ) ? ret : len;
+}
+
 long
 do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                    XEN_GUEST_HANDLE_PARAM(void) arg2,
@@ -813,6 +1318,29 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         rc = argo_unregister_ring(d, ring_hnd);
         break;
     }
+    case ARGO_MESSAGE_OP_sendv:
+    {
+        argo_send_addr_t send_addr;
+        uint32_t niov = arg3;
+        uint32_t message_type = arg4;
+
+        XEN_GUEST_HANDLE_PARAM(argo_send_addr_t) send_addr_hnd =
+            guest_handle_cast(arg1, argo_send_addr_t);
+        XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs =
+            guest_handle_cast(arg2, argo_iov_t);
+
+        if ( unlikely(!guest_handle_okay(send_addr_hnd, 1)) )
+            break;
+        rc = copy_from_guest_errno(&send_addr, send_addr_hnd, 1);
+        if ( rc )
+            break;
+
+        send_addr.src.domain_id = d->domain_id;
+
+        rc = argo_sendv(d, &send_addr.src, &send_addr.dst,
+                        iovs, niov, message_type);
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 6cf10a8..123efc5 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -32,6 +32,28 @@
  */
 #define ARGO_MAX_RING_SIZE  (16777216ULL)
 
+/*
+ * ARGO_MAXIOV : maximum number of iovs accepted in a single sendv.
+ * Rationale for the value:
+ * The Linux argo driver never passes more than two iovs.
+ * Linux defines UIO_MAXIOV as 1024.
+ * POSIX mandates at least 16 -- not that this is a POSIX API of course.
+ *
+ * Limit the total amount of data posted in a single argo operation to
+ * no more than 2^31 bytes to reduce risk of integer overflow defects.
+ * Each argo iov can hold ~ 2^24 bytes, so set ARGO_MAXIOV to 2^(31-24),
+ * minus one to enable simple efficient bounds checking via masking: 127.
+*/
+#define ARGO_MAXIOV          127U
+
+typedef struct argo_iov
+{
+    uint64_t iov_base;
+    uint32_t iov_len;
+    uint32_t pad;
+} argo_iov_t;
+DEFINE_XEN_GUEST_HANDLE(argo_iov_t);
+
 /* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
 typedef uint64_t argo_pfn_t;
 
@@ -42,6 +64,12 @@ typedef struct argo_addr
     uint16_t pad;
 } argo_addr_t;
 
+typedef struct argo_send_addr
+{
+    argo_addr_t src;
+    argo_addr_t dst;
+} argo_send_addr_t;
+
 typedef struct argo_ring_id
 {
     struct argo_addr addr;
@@ -125,4 +153,35 @@ struct argo_ring_message_header
  */
 #define ARGO_MESSAGE_OP_unregister_ring     2
 
+/*
+ * ARGO_MESSAGE_OP_sendv
+ *
+ * Send a list of buffers contained in iovs.
+ *
+ * The send address struct specifies the source and destination addresses
+ * for the message being sent, which are used to find the destination ring:
+ * Xen first looks for a most-specific match with a registered ring with
+ *  (id.addr == dst) and (id.partner == sending_domain) ;
+ * if that fails, it then looks for a wildcard match (aka multicast receiver)
+ * where (id.addr == dst) and (id.partner == DOMID_ANY).
+ *
+ * For each iov entry, send iov_len bytes from iov_base to the destination ring.
+ * If insufficient space exists in the destination ring, it will return -EAGAIN
+ * and Xen will notify the caller when sufficient space becomes available.
+ *
+ * The message type is a 32-bit data field available to communicate message
+ * context data (eg. kernel-to-kernel, rather than application layer).
+ *
+ * arg1: XEN_GUEST_HANDLE(argo_send_addr_t) source and dest addresses
+ * arg2: XEN_GUEST_HANDLE(argo_iov_t) iovs
+ * arg3: uint32_t niov
+ * arg4: uint32_t message type
+ */
+#define ARGO_MESSAGE_OP_sendv               5
+
+/* The maximum size of a guest message that may be sent on an Argo ring. */
+#define ARGO_MAX_MSG_SIZE ((ARGO_MAX_RING_SIZE) - \
+        (sizeof(struct argo_ring_message_header)) - \
+        ARGO_ROUNDUP(1))
+
 #endif
-- 
2.1.4



* [PATCH 16/25] argo: implement the notify op
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (14 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 15/25] argo: implement the sendv op Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-13 14:06   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 17/25] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

The notify op queries space availability in registered rings and causes
a notification to be sent when space becomes available.

The hypercall op populates a supplied data structure with information about
ring state, and if insufficient space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.
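
How the per-ring reply flags are derived can be sketched in standalone C as
follows. The flag values are the ones defined in argo.h by this patch, and the
logic follows argo_fill_ring_data in the diff, but the function name and its
flattened argument list are hypothetical simplifications.

```c
#include <stdint.h>

/* Flag values as defined in the public header added by this patch. */
#define SKETCH_F_EMPTY      (1U << 0)
#define SKETCH_F_EXISTS     (1U << 1)
#define SKETCH_F_PENDING    (1U << 2)
#define SKETCH_F_SUFFICIENT (1U << 3)

/*
 * Compute the flags reported for one queried ring.
 * ring_exists: non-zero if a matching destination ring was found.
 * space_avail: payload bytes currently free in that ring.
 * space_required: bytes the caller wants to send.
 * max_message_size: largest payload the ring can ever hold.
 */
static uint16_t sketch_notify_flags(int ring_exists, uint32_t space_avail,
                                    uint32_t space_required,
                                    uint32_t max_message_size)
{
    uint16_t flags = 0;

    if ( !ring_exists )
        return flags;

    flags |= SKETCH_F_EXISTS;

    if ( space_avail >= space_required )
        flags |= SKETCH_F_SUFFICIENT; /* any queued interest is cancelled */
    else
        flags |= SKETCH_F_PENDING;    /* interest recorded for later wakeup */

    if ( space_avail == max_message_size )
        flags |= SKETCH_F_EMPTY;

    return flags;
}
```

In the real op the SUFFICIENT and PENDING branches also cancel or requeue the
caller's pending entry on the ring; the sketch only shows the flag outcome.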

Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate (ie. NULL argument),
to trigger such a check and consequent notifications.
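
The free-space check that runs on every invocation can be approximated by the
standalone sketch below. It is modelled on argo_ringbuf_payload_space and
argo_pending_find in the diff; the array of waiters replaces the hlist used
by the real code, and all sketch_ names and the 16-byte constants are
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 16-byte header and one 16-byte slot of slack. */
#define SKETCH_HDR_SIZE 16
#define SKETCH_SLOT     16

struct sketch_pending {
    uint16_t domid;  /* domain waiting for space */
    uint32_t len;    /* payload bytes it asked for */
    int notified;    /* set once the waiter would be signalled */
};

/* Payload bytes that can be written right now; never negative. */
static uint32_t sketch_payload_space(uint32_t rx_ptr, uint32_t tx_ptr,
                                     uint32_t ring_len)
{
    int32_t ret;

    if ( !ring_len )
        return 0;

    if ( rx_ptr == tx_ptr )
        return ring_len - SKETCH_HDR_SIZE;

    ret = (int32_t)rx_ptr - (int32_t)tx_ptr;
    if ( ret < 0 )
        ret += (int32_t)ring_len;

    ret -= SKETCH_HDR_SIZE;
    ret -= SKETCH_SLOT;

    return (ret < 0) ? 0 : (uint32_t)ret;
}

/* Mark every waiter whose request now fits; returns how many were marked. */
static size_t sketch_wake_waiters(struct sketch_pending *p, size_t n,
                                  uint32_t payload_space)
{
    size_t i, woken = 0;

    for ( i = 0; i < n; i++ )
    {
        if ( !p[i].notified && payload_space >= p[i].len )
        {
            p[i].notified = 1;
            woken++;
        }
    }

    return woken;
}
```

A receiver that has just consumed data can therefore invoke the op with a
NULL argument purely to run this check and release any blocked senders.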

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c         | 326 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  62 +++++++++
 2 files changed, 388 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 0c3972c..a171191 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -30,6 +30,8 @@ DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_send_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(argo_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE(argo_ring_data_ent_t);
 DEFINE_XEN_GUEST_HANDLE(uint8_t);
 
 /* Xen command line option to enable argo */
@@ -120,6 +122,10 @@ argo_hash_fn(const struct argo_ring_id *id)
     return ret;
 }
 
+static struct argo_ring_info *
+argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
+                             domid_t partner_id, uint64_t partner_cookie);
+
 /*
  * locks
  */
@@ -183,6 +189,19 @@ argo_signal_domain(struct domain *d)
     evtchn_send(d, d->argo->evtchn_port);
 }
 
+static void
+argo_signal_domid(domid_t id)
+{
+    struct domain *d = get_domain_by_id(id);
+
+    if ( !d )
+        return;
+
+    argo_signal_domain(d);
+
+    put_domain(d);
+}
+
 /*
  * ring buffer
  */
@@ -363,6 +382,39 @@ argo_ringbuf_get_rx_ptr(struct argo_ring_info *ring_info, uint32_t *rx_ptr)
     return 0;
 }
 
+static uint32_t
+argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
+{
+    argo_ring_t ring;
+    int32_t ret;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    ring.len = ring_info->len;
+    if ( !ring.len )
+        return 0;
+
+    ring.tx_ptr = ring_info->tx_ptr;
+
+    if ( argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr) )
+        return 0;
+
+    argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
+                 ring.tx_ptr, ring.rx_ptr);
+
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return ring.len - sizeof(struct argo_ring_message_header);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += ring.len;
+
+    ret -= sizeof(struct argo_ring_message_header);
+    ret -= ARGO_ROUNDUP(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
 /*
  * argo_sanitize_ring creates a modified copy of the ring pointers
  * where the rx_ptr is rounded up to ensure it is aligned, and then
@@ -627,6 +679,43 @@ argo_pending_remove_all(struct argo_ring_info *ring_info)
     }
 }
 
+static void
+argo_pending_notify(struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct argo_pending_ent *pending_ent;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
+    {
+        hlist_del(&pending_ent->node);
+        argo_signal_domid(pending_ent->id);
+        xfree(pending_ent);
+    }
+}
+
+static void
+argo_pending_find(const struct domain *d, struct argo_ring_info *ring_info,
+                  uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct argo_pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            hlist_del(&ent->node);
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
 static int
 argo_pending_queue(struct argo_ring_info *ring_info, domid_t src_id, int len)
 {
@@ -668,6 +757,24 @@ argo_pending_requeue(struct argo_ring_info *ring_info, domid_t src_id, int len)
     return argo_pending_queue(ring_info, src_id, len);
 }
 
+static void
+argo_pending_cancel(struct argo_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct argo_pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->id == src_id )
+        {
+            hlist_del(&ent->node);
+            xfree(ent);
+        }
+    }
+}
+
 static void argo_ring_remove_mfns(const struct domain *d,
                                   struct argo_ring_info *ring_info)
 {
@@ -705,6 +812,107 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
     xfree(ring_info);
 }
 
+/*ring data*/
+
+static int
+argo_fill_ring_data(struct domain *src_d,
+                    XEN_GUEST_HANDLE(argo_ring_data_ent_t) data_ent_hnd)
+{
+    argo_ring_data_ent_t ent;
+    domid_t src_id;
+    struct domain *dst_d;
+    struct argo_ring_info *ring_info;
+    int ret;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    ret = copy_from_guest_errno(&ent, data_ent_hnd, 1);
+    if ( ret )
+        return ret;
+
+    argo_dprintk("argo_fill_ring_data: ent.ring.domain=%u,ent.ring.port=%u\n",
+                 ent.ring.domain_id, ent.ring.port);
+
+    src_id = src_d->domain_id;
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain_id);
+
+    if ( dst_d && dst_d->argo )
+    {
+        read_lock(&dst_d->argo->lock);
+
+        ring_info = argo_ring_find_info_by_match(dst_d, ent.ring.port, src_id,
+                                                 src_d->argo->domain_cookie);
+
+        if ( ring_info )
+        {
+            uint32_t space_avail;
+
+            ent.flags |= ARGO_RING_DATA_F_EXISTS;
+            ent.max_message_size =
+                ring_info->len - sizeof(struct argo_ring_message_header) -
+                ARGO_ROUNDUP(1);
+
+            spin_lock(&ring_info->lock);
+
+            space_avail = argo_ringbuf_payload_space(dst_d, ring_info);
+
+            argo_dprintk("argo_fill_ring_data: port=%d space_avail=%d"
+                         " space_wanted=%d\n",
+                         ring_info->id.addr.port, space_avail,
+                         ent.space_required);
+
+            if ( space_avail >= ent.space_required )
+            {
+                argo_pending_cancel(ring_info, src_id);
+                ent.flags |= ARGO_RING_DATA_F_SUFFICIENT;
+            }
+            else
+            {
+                argo_pending_requeue(ring_info, src_id, ent.space_required);
+                ent.flags |= ARGO_RING_DATA_F_PENDING;
+            }
+
+            spin_unlock(&ring_info->lock);
+
+            if ( space_avail == ent.max_message_size )
+                ent.flags |= ARGO_RING_DATA_F_EMPTY;
+
+        }
+        read_unlock(&dst_d->argo->lock);
+    }
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    ret = copy_field_to_guest_errno(data_ent_hnd, &ent, flags);
+    if ( ret )
+        return ret;
+    ret = copy_field_to_guest_errno(data_ent_hnd, &ent, max_message_size);
+    if ( ret )
+        return ret;
+
+    return 0;
+}
+
+static int
+argo_fill_ring_data_array(struct domain *d, int nent,
+                          XEN_GUEST_HANDLE(argo_ring_data_ent_t) data_ent_hnd)
+{
+    int ret = 0;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    while ( !ret && nent-- )
+    {
+        ret = argo_fill_ring_data(d, data_ent_hnd);
+        guest_handle_add_offset(data_ent_hnd, 1);
+    }
+
+    return ret;
+}
+
 /*
  * ring
  */
@@ -1166,6 +1374,116 @@ argo_register_ring(struct domain *d,
  * io
  */
 
+static void
+argo_notify_ring(struct domain *d, struct argo_ring_info *ring_info,
+                struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&argo_lock));
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    spin_lock(&ring_info->lock);
+
+    if ( ring_info->len )
+        space = argo_ringbuf_payload_space(d, ring_info);
+    else
+        space = 0;
+
+    spin_unlock(&ring_info->lock);
+
+    if ( space )
+        argo_pending_find(d, ring_info, space, to_notify);
+}
+
+static void
+argo_notify_check_pending(struct domain *d)
+{
+    int i;
+    HLIST_HEAD(to_notify);
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    read_lock(&d->argo->lock);
+
+    mb();
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
+    {
+        struct hlist_node *node, *next;
+        struct argo_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node, next,
+                                  &d->argo->ring_hash[i], node)
+        {
+            argo_notify_ring(d, ring_info, &to_notify);
+        }
+    }
+    read_unlock(&d->argo->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        argo_pending_notify(&to_notify);
+}
+
+static long
+argo_notify(struct domain *d,
+            XEN_GUEST_HANDLE_PARAM(argo_ring_data_t) ring_data_hnd)
+{
+    argo_ring_data_t ring_data;
+    int ret = 0;
+
+    read_lock(&argo_lock);
+
+    if ( !d->argo )
+    {
+        read_unlock(&argo_lock);
+        argo_dprintk("!d->argo, ENODEV\n");
+        return -ENODEV;
+    }
+
+    argo_notify_check_pending(d);
+
+    do {
+        if ( !guest_handle_is_null(ring_data_hnd) )
+        {
+            /* Quick sanity check on ring_data_hnd */
+            ret = copy_field_from_guest_errno(&ring_data, ring_data_hnd, magic);
+            if ( ret )
+                break;
+
+            if ( ring_data.magic != ARGO_RING_DATA_MAGIC )
+            {
+                argo_dprintk(
+                    "magic(%"PRIx64") != ARGO_RING_DATA_MAGIC(%llx), EINVAL\n",
+                    ring_data.magic, ARGO_RING_DATA_MAGIC);
+                ret = -EINVAL;
+                break;
+            }
+
+            ret = copy_from_guest_errno(&ring_data, ring_data_hnd, 1);
+            if ( ret )
+                break;
+
+            {
+                /*
+                 * This is a guest pointer passed as a field in a struct
+                 * so XEN_GUEST_HANDLE is used.
+                 */
+                XEN_GUEST_HANDLE(argo_ring_data_ent_t) ring_data_ent_hnd;
+                ring_data_ent_hnd = guest_handle_for_field(ring_data_hnd,
+                                                           argo_ring_data_ent_t,
+                                                           data[0]);
+                ret = argo_fill_ring_data_array(d, ring_data.nent,
+                                                ring_data_ent_hnd);
+            }
+        }
+    } while ( 0 );
+
+    read_unlock(&argo_lock);
+
+    return ret;
+}
+
 static long
 argo_sendv(struct domain *src_d, const argo_addr_t *src_addr,
            const argo_addr_t *dst_addr,
@@ -1341,6 +1659,14 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                         iovs, niov, message_type);
         break;
     }
+    case ARGO_MESSAGE_OP_notify:
+    {
+        XEN_GUEST_HANDLE_PARAM(argo_ring_data_t) ring_data_hnd =
+                   guest_handle_cast(arg1, argo_ring_data_t);
+
+        rc = argo_notify(d, ring_data_hnd);
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 123efc5..42f551f 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -22,6 +22,7 @@
 #include "xen.h"
 
 #define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
+#define ARGO_RING_DATA_MAGIC 0xcce4d30fbc82e92aULL
 
 #define ARGO_DOMID_ANY           DOMID_INVALID
 
@@ -103,6 +104,40 @@ typedef struct argo_ring
  */
 #define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
 
+/*
+ * Notify flags
+ */
+/* Ring is empty */
+#define ARGO_RING_DATA_F_EMPTY       (1U << 0)
+/* Ring exists */
+#define ARGO_RING_DATA_F_EXISTS      (1U << 1)
+/* Pending interrupt exists. Do not rely on this field - for profiling only */
+#define ARGO_RING_DATA_F_PENDING     (1U << 2)
+/* Sufficient space to queue space_required bytes exists */
+#define ARGO_RING_DATA_F_SUFFICIENT  (1U << 3)
+
+typedef struct argo_ring_data_ent
+{
+    argo_addr_t ring;
+    uint16_t flags;
+    uint16_t pad;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} argo_ring_data_ent_t;
+
+typedef struct argo_ring_data
+{
+    uint64_t magic;
+    uint32_t nent;
+    uint32_t pad;
+    uint64_t reserved[4];
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    argo_ring_data_ent_t data[];
+#elif defined(__GNUC__)
+    argo_ring_data_ent_t data[0];
+#endif
+} argo_ring_data_t;
+
 struct argo_ring_message_header
 {
     uint32_t len;
@@ -179,6 +214,33 @@ struct argo_ring_message_header
  */
 #define ARGO_MESSAGE_OP_sendv               5
 
+/*
+ * ARGO_MESSAGE_OP_notify
+ *
+ * Asks Xen for information about other rings in the system.
+ *
+ * ent->ring is the argo_addr_t of the ring you want information on.
+ * Uses the same ring matching rules as ARGO_MESSAGE_OP_sendv.
+ *
+ * ent->space_required : if this field is non-zero then Xen will check
+ * that there is space in the destination ring for this many bytes of payload.
+ * If sufficient space is available, it will set ARGO_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending notification for that ent->ring; otherwise it
+ * will schedule a notification event and the flag will not be set.
+ *
+ * These flags are set by Xen when notify replies:
+ * ARGO_RING_DATA_F_EMPTY       ring is empty
+ * ARGO_RING_DATA_F_PENDING     notify event is pending - * don't rely on this *
+ * ARGO_RING_DATA_F_SUFFICIENT  sufficient space for space_required is there
+ * ARGO_RING_DATA_F_EXISTS      ring exists
+ *
+ * arg1: XEN_GUEST_HANDLE(argo_ring_data_t) ring_data (may be NULL)
+ * arg2: NULL
+ * arg3: 0 (ZERO)
+ * arg4: 0 (ZERO)
+ */
+#define ARGO_MESSAGE_OP_notify              4
+
 /* The maximum size of a guest message that may be sent on an Argo ring. */
 #define ARGO_MAX_MSG_SIZE ((ARGO_MAX_RING_SIZE) - \
         (sizeof(struct argo_ring_message_header)) - \
-- 
2.1.4
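
For illustration only, and not part of this patch: a minimal sketch of how a
guest might interpret the flags returned for one ring_data entry, assuming the
entry was submitted with a non-zero space_required. The enum and helper names
are hypothetical; only the flag values are taken from the argo.h change above.
ARGO_RING_DATA_F_PENDING is deliberately ignored, since the header comment
says not to rely on it.

```c
#include <stdint.h>

/* Flag values copied from the argo.h hunk in this patch. */
#define ARGO_RING_DATA_F_EMPTY       (1U << 0)
#define ARGO_RING_DATA_F_EXISTS      (1U << 1)
#define ARGO_RING_DATA_F_PENDING     (1U << 2)
#define ARGO_RING_DATA_F_SUFFICIENT  (1U << 3)

/* Hypothetical guest-side classification of a notify reply. */
enum example_notify_result {
    EXAMPLE_NO_RING,   /* destination ring is not registered */
    EXAMPLE_WAIT,      /* not enough space; Xen scheduled a notification */
    EXAMPLE_SEND       /* space_required bytes are available now */
};

/*
 * Assumes the entry was submitted with space_required != 0, so that
 * ARGO_RING_DATA_F_SUFFICIENT carries meaning in the reply.
 */
static enum example_notify_result example_interpret(uint16_t flags)
{
    if ( !(flags & ARGO_RING_DATA_F_EXISTS) )
        return EXAMPLE_NO_RING;

    if ( flags & ARGO_RING_DATA_F_SUFFICIENT )
        return EXAMPLE_SEND;

    return EXAMPLE_WAIT;
}
```

A sender that gets EXAMPLE_WAIT would typically hold its message until the
notification arrives and then retry, rather than poll.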


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 17/25] xsm, argo: XSM control for any access to argo by a domain
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (15 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 16/25] argo: implement the notify op Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-01  1:32 ` [PATCH 18/25] argo: limit the max number of rings that a domain may register Christopher Clark
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, Eric Chanudet

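Add an XSM hook, xsm_argo_enable, that controls whether a domain may use
argo at all. If the hook denies access, initialization of the domain's
argo data structure is inhibited, which prevents the domain from receiving
any messages or notifications, and the domain is refused access to all of
the argo hypercall operations.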
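
For illustration only: a self-contained sketch of the gating condition that
this patch applies in both argo_init and do_argo_message_op. The types and
names prefixed with example_ are stand-ins, not Xen code; the sketch assumes
the usual XSM convention that a hook returns 0 to grant access and a negative
errno value to deny it.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for Xen's struct domain; carries only what the sketch needs. */
struct example_domain {
    bool xsm_denied;   /* hypothetical: policy outcome for this domain */
};

/* Stand-in for the boot parameter that enables argo globally. */
static bool example_opt_argo_enabled = true;

/* Stand-in XSM hook: 0 grants access, a negative errno denies it. */
static int example_xsm_argo_enable(const struct example_domain *d)
{
    return d->xsm_denied ? -EACCES : 0;
}

/* Argo is usable only if it is enabled at boot AND XSM grants access. */
static bool example_argo_permitted(const struct example_domain *d)
{
    return example_opt_argo_enabled && example_xsm_argo_enable(d) == 0;
}
```

The patch expresses the same condition in negated form,
!opt_argo_enabled || xsm_argo_enable(d), at each check site.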

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c                   | 4 ++--
 xen/include/xsm/dummy.h             | 5 +++++
 xen/include/xsm/xsm.h               | 6 ++++++
 xen/xsm/dummy.c                     | 1 +
 xen/xsm/flask/hooks.c               | 7 +++++++
 xen/xsm/flask/policy/access_vectors | 3 +++
 6 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index a171191..ca48032 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -1588,7 +1588,7 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
     argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
                  (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
 
-    if ( unlikely(!opt_argo_enabled) )
+    if ( unlikely(!opt_argo_enabled || xsm_argo_enable(d)) )
     {
         rc = -ENOSYS;
         argo_dprintk("<-do_argo_message_op()=%ld\n", rc);
@@ -1685,7 +1685,7 @@ argo_init(struct domain *d)
     int i;
     int rc;
 
-    if ( !opt_argo_enabled )
+    if ( !opt_argo_enabled || xsm_argo_enable(d) )
     {
         argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
         return 0;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 85965fc..1ad52c0 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -721,6 +721,11 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 #endif /* CONFIG_X86 */
 
 #ifdef CONFIG_ARGO
+static XSM_INLINE int xsm_argo_enable(struct domain *d)
+{
+    return 0;
+}
+
 static XSM_INLINE int xsm_argo_register_single_source(struct domain *d,
                                                       struct domain *t)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 470e7c3..70d7e86 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -182,6 +182,7 @@ struct xsm_operations {
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
+    int (*argo_enable) (struct domain *d);
     int (*argo_register_single_source) (struct domain *d, struct domain *t);
     int (*argo_register_any_source) (struct domain *d);
     int (*argo_send) (struct domain *d, struct domain *t);
@@ -704,6 +705,11 @@ static inline int xsm_domain_resource_map(xsm_default_t def, struct domain *d)
 }
 
 #ifdef CONFIG_ARGO
+static inline int xsm_argo_enable(struct domain *d)
+{
+    return xsm_ops->argo_enable(d);
+}
+
 static inline xsm_argo_register_single_source(struct domain *d, struct domain *t)
 {
     return xsm_ops->argo_register_single_source(d, t);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index ffac774..1fe0e74 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -153,6 +153,7 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
+    set_to_dummy_if_null(ops, argo_enable);
     set_to_dummy_if_null(ops, argo_register_single_source);
     set_to_dummy_if_null(ops, argo_register_any_source);
     set_to_dummy_if_null(ops, argo_send);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 7b4e5ff..897bc94 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1718,6 +1718,12 @@ static int flask_domain_resource_map(struct domain *d)
 }
 
 #ifdef CONFIG_ARGO
+static int flask_argo_enable(struct domain *d)
+{
+    return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
+                        ARGO__ENABLE, NULL);
+}
+
 static int flask_argo_register_single_source(struct domain *d,
                                              struct domain *t)
 {
@@ -1873,6 +1879,7 @@ static struct xsm_operations flask_ops = {
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO
+    .argo_enable = flask_argo_enable,
     .argo_register_single_source = flask_argo_register_single_source,
     .argo_register_any_source = flask_argo_register_any_source,
     .argo_send = flask_argo_send,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index f6c5377..e00448b 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -535,6 +535,9 @@ class version
 # Class argo is used to describe the Argo interdomain communication system.
 class argo
 {
+    # Enable initialization of a domain's argo subsystem and
+    # permission to access the argo hypercall operations.
+    enable
     # Domain requesting registration of a communication ring
     # to receive messages from a specific other domain.
     register_single_source
-- 
2.1.4



* [PATCH 18/25] argo: limit the max number of rings that a domain may register.
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (16 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 17/25] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-13 14:08   ` Jan Beulich
  2018-12-01  1:32 ` [PATCH 19/25] argo: limit the max number of notify requests in a single operation Christopher Clark
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Very basic implementation: a fixed limit of 128 rings per domain.
Registration requests beyond the limit fail with -ENOSPC.
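
For illustration only: a toy model of the per-domain counter, showing the
intended behaviour when only new rings are registered. The example_ names are
hypothetical stand-ins; in the patch itself the counter lives in struct
argo_domain and is updated under the domain's argo lock, which this sketch
omits.

```c
#include <errno.h>
#include <stdint.h>

#define ARGO_MAX_RINGS_PER_DOMAIN 128U

/* Stand-in for the relevant part of struct argo_domain. */
struct example_argo_domain {
    uint32_t ring_count;
};

/* Models registration of a NEW ring; returns 0 or -ENOSPC at the limit. */
static int example_register_ring(struct example_argo_domain *a)
{
    if ( a->ring_count >= ARGO_MAX_RINGS_PER_DOMAIN )
        return -ENOSPC;

    a->ring_count++;
    return 0;
}

/* Models removal of a registered ring; a no-op if none are registered. */
static void example_unregister_ring(struct example_argo_domain *a)
{
    if ( a->ring_count > 0 )
        a->ring_count--;
}
```

Unregistering a ring frees a slot, so a domain at the limit can register
again after giving one up.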

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index ca48032..cc908f4 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -26,6 +26,8 @@
 #include <xen/time.h>
 #include <xsm/xsm.h>
 
+#define ARGO_MAX_RINGS_PER_DOMAIN       128U
+
 DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
 DEFINE_XEN_GUEST_HANDLE(argo_send_addr_t);
@@ -101,6 +103,8 @@ struct argo_domain
     struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
     /* id cookie, written only at init, so readable with R(L1) */
     uint64_t domain_cookie;
+    /* counter of rings registered by this domain, protected by L2 */
+    uint32_t ring_count;
 };
 
 /*
@@ -1161,7 +1165,10 @@ argo_unregister_ring(struct domain *d,
 
         ring_info = argo_ring_find_info(d, &ring.id);
         if ( ring_info )
+        {
             argo_ring_remove_info(d, ring_info);
+            d->argo->ring_count--;
+        }
 
         write_unlock(&d->argo->lock);
 
@@ -1298,6 +1305,12 @@ argo_register_ring(struct domain *d,
         write_lock(&d->argo->lock);
 
         do {
+            if ( d->argo->ring_count >= ARGO_MAX_RINGS_PER_DOMAIN )
+            {
+                ret = -ENOSPC;
+                break;
+            }
+
             ring_info = argo_ring_find_info(d, &ring.id);
 
             if ( !ring_info )
@@ -1357,7 +1370,10 @@ argo_register_ring(struct domain *d,
                 ret = update_tx_ptr ? argo_update_tx_ptr(ring_info, ring.tx_ptr)
                                     : argo_ring_map_page(ring_info, 0, NULL);
             if ( !ret )
+            {
                 ring_info->len = ring.len;
+                d->argo->ring_count++;
+            }
 
         } while ( 0 );
 
@@ -1698,6 +1714,7 @@ argo_init(struct domain *d)
         return -ENOMEM;
 
     rwlock_init(&argo->lock);
+    argo->ring_count = 0;
 
     for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
         INIT_HLIST_HEAD(&argo->ring_hash[i]);
-- 
2.1.4



* [PATCH 19/25] argo: limit the max number of notify requests in a single operation.
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (17 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 18/25] argo: limit the max number of rings that a domain may register Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-01  1:32 ` [PATCH 20/25] argo, xsm: notify: don't describe rings that cannot be sent to Christopher Clark
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Very basic implementation: a fixed limit of 256 entries per notify
operation. Requests that exceed the limit fail with -EACCES.
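
For illustration only: a sketch of how a guest with more entries to query
than the limit allows might split them across several notify operations. The
helper names are hypothetical, and the constant is repeated here only for the
sketch; in this patch it is private to xen/common/argo.c, so a real guest
would have to learn the limit some other way.

```c
#include <stdint.h>

/* Mirrors the limit added by this patch; nent == 256 is still accepted. */
#define ARGO_MAX_NOTIFY_COUNT 256U

/* Number of notify operations needed to cover 'total' entries. */
static uint32_t example_notify_batches(uint32_t total)
{
    return total / ARGO_MAX_NOTIFY_COUNT +
           (total % ARGO_MAX_NOTIFY_COUNT != 0);
}

/* Value to place in ring_data.nent for the 0-based batch 'batch'. */
static uint32_t example_batch_len(uint32_t total, uint32_t batch)
{
    /* 64-bit arithmetic so that a large batch index cannot wrap. */
    uint64_t start = (uint64_t)batch * ARGO_MAX_NOTIFY_COUNT;
    uint64_t left;

    if ( start >= total )
        return 0;

    left = total - start;
    return left < ARGO_MAX_NOTIFY_COUNT ? (uint32_t)left
                                        : ARGO_MAX_NOTIFY_COUNT;
}
```

With 600 entries this gives three operations carrying 256, 256 and 88
entries.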

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index cc908f4..0858fb2 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -27,6 +27,7 @@
 #include <xsm/xsm.h>
 
 #define ARGO_MAX_RINGS_PER_DOMAIN       128U
+#define ARGO_MAX_NOTIFY_COUNT           256U
 
 DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
 DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
@@ -1480,6 +1481,12 @@ argo_notify(struct domain *d,
             if ( ret )
                 break;
 
+            if ( ring_data.nent > ARGO_MAX_NOTIFY_COUNT )
+            {
+                ret = -EACCES;
+                break;
+            }
+
             {
                 /*
                  * This is a guest pointer passed as a field in a struct
-- 
2.1.4



* [PATCH 20/25] argo, xsm: notify: don't describe rings that cannot be sent to
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (18 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 19/25] argo: limit the max number of notify requests in a single operation Christopher Clark
@ 2018-12-01  1:32 ` Christopher Clark
  2018-12-01  1:33 ` [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func Christopher Clark
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, Eric Chanudet

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 0858fb2..39778fd 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -845,6 +845,17 @@ argo_fill_ring_data(struct domain *src_d,
 
     if ( dst_d && dst_d->argo )
     {
+        /*
+         * Don't supply information about rings that a guest is not
+         * allowed to send to.
+         */
+        ret = xsm_argo_send(src_d, dst_d);
+        if ( ret )
+        {
+            put_domain(dst_d);
+            return ret;
+        }
+
         read_lock(&dst_d->argo->lock);
 
         ring_info = argo_ring_find_info_by_match(dst_d, ent.ring.port, src_id,
-- 
2.1.4



* [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (19 preceding siblings ...)
  2018-12-01  1:32 ` [PATCH 20/25] argo, xsm: notify: don't describe rings that cannot be sent to Christopher Clark
@ 2018-12-01  1:33 ` Christopher Clark
  2018-12-13 14:10   ` Jan Beulich
  2018-12-01  1:33 ` [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen Christopher Clark
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

This is out of an abundance of caution: the hash function is very basic,
chosen for bucket distribution properties that cluster related rings rather
than for cryptographic strength or uniformity of output, and it operates on
guest-supplied values immediately before the result is used as an array
index.
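
For illustration only: a sketch of the branch-free clamping idea behind
array_index_nospec, written as plain C. This is not Xen's implementation,
which additionally hides the values from the optimizer and may use
architecture-specific code; the example_ names are hypothetical. The sketch
assumes that size is non-zero and no larger than LONG_MAX, and that the
compiler uses two's complement conversion and an arithmetic right shift for
negative signed values, as GCC and Clang do.

```c
#include <limits.h>

/*
 * Returns all-ones if index < size, otherwise zero, without a branch.
 * For index < size, neither index nor (size - 1 - index) has its top bit
 * set, so the complement is negative and the shift smears the sign bit.
 * For index >= size, the subtraction wraps and sets the top bit.
 */
static unsigned long example_index_mask(unsigned long index,
                                        unsigned long size)
{
    long v = (long)(index | (size - 1UL - index));

    return (unsigned long)(~v >> (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indexes pass through unchanged; out-of-range ones become 0. */
static unsigned long example_index_nospec(unsigned long index,
                                          unsigned long size)
{
    return index & example_index_mask(index, size);
}
```

Because the clamp is computed with data dependencies rather than a
conditional branch, a mispredicted bounds check cannot steer a speculative
load to an out-of-range table slot.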

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 39778fd..fa969ab 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -23,6 +23,7 @@
 #include <xen/event.h>
 #include <xen/domain_page.h>
 #include <xen/guest_access.h>
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <xsm/xsm.h>
 
@@ -1094,7 +1095,7 @@ argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
 
     ASSERT(rw_is_locked(&d->argo->lock));
 
-    hash = argo_hash_fn(id);
+    hash = array_index_nospec(argo_hash_fn(id), ARGO_HTABLE_SIZE);
 
     argo_dprintk("d->argo=%p, d->argo->ring_hash[%d]=%p id=%p\n",
                  d->argo, hash, d->argo->ring_hash[hash].first, id);
@@ -1349,7 +1350,8 @@ argo_register_ring(struct domain *d,
                 ring_info->id = ring.id;
                 INIT_HLIST_HEAD(&ring_info->pending);
 
-                hash = argo_hash_fn(&ring_info->id);
+                hash = array_index_nospec(argo_hash_fn(&ring_info->id),
+                                          ARGO_HTABLE_SIZE);
                 hlist_add_head(&ring_info->node, &d->argo->ring_hash[hash]);
 
                 printk(XENLOG_INFO "argo: vm%u registering ring (vm%u:%x vm%d)\n",
-- 
2.1.4



* [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (20 preceding siblings ...)
  2018-12-01  1:33 ` [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func Christopher Clark
@ 2018-12-01  1:33 ` Christopher Clark
  2018-12-13 14:12   ` Jan Beulich
  2018-12-01  1:33 ` [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ Christopher Clark
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

To be used by Argo for delivery of notifications to some guests.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/event_channel.c | 2 +-
 xen/include/xen/event.h    | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 3dfde83..eec3acf 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -769,7 +769,7 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
     spin_unlock_irqrestore(&v->virq_lock, flags);
 }
 
-static void send_guest_global_virq(struct domain *d, uint32_t virq)
+void send_guest_global_virq(struct domain *d, uint32_t virq)
 {
     unsigned long flags;
     int port;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 18c3738..7463927 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -29,6 +29,13 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq);
 void send_global_virq(uint32_t virq);
 
 /*
+ * send_guest_global_virq:
+ *  @d:        Domain to which VIRQ should be sent
+ *  @virq:     Virtual IRQ number (VIRQ_*), must be global
+ */
+void send_guest_global_virq(struct domain *d, uint32_t virq);
+
+/*
  * sent_global_virq_handler: Set a global VIRQ handler.
  *  @d:        New target domain for this VIRQ
  *  @virq:     Virtual IRQ number (VIRQ_*), must be global
-- 
2.1.4



* [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (21 preceding siblings ...)
  2018-12-01  1:33 ` [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen Christopher Clark
@ 2018-12-01  1:33 ` Christopher Clark
  2018-12-02 19:55   ` Julien Grall
  2018-12-13 14:16   ` Jan Beulich
  2018-12-01  1:33 ` [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume Christopher Clark
                   ` (2 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

* x86 PV domains are notified via event channel.

PV guests are known to have the event channel software present in the guest
kernel, so it is fine to depend on and use it.

* x86 HVM domains and all ARM domains are notified via VIRQ.

The intent is to remove the requirement for event channel software to be
installed within these guests in order to use Argo. VIRQ signalling is also
the method that has been in use for the longest period with this hypercall
in both XenClient and OpenXT.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c         | 39 ++++++++++++++++++++++++++++++++++++---
 xen/include/public/argo.h |  3 +++
 xen/include/public/xen.h  |  2 +-
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index fa969ab..9b12e6b 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -181,18 +181,51 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
 #endif
 
 /*
- * Event channel
+ * Signalling
  */
 
+static unsigned int argo_signal_method(const struct domain *d)
+{
+    unsigned int method;
+#ifdef CONFIG_X86
+    if ( is_hvm_domain(d) )
+        method = ARGO_SIGNAL_METHOD_VIRQ;
+    else
+        method = ARGO_SIGNAL_METHOD_EVTCHN;
+#else
+    method = ARGO_SIGNAL_METHOD_VIRQ;
+#endif
+    return method;
+}
+
 static void
 argo_signal_domain(struct domain *d)
 {
-    argo_dprintk("signalling domid:%d\n", d->domain_id);
+    unsigned int method = argo_signal_method(d);
 
     if ( !d->argo ) /* This can happen if the domain is being destroyed */
         return;
 
-    evtchn_send(d, d->argo->evtchn_port);
+    argo_dprintk("signalling domid:%d via method:%u\n", d->domain_id, method);
+
+    switch ( method )
+    {
+        case ARGO_SIGNAL_METHOD_EVTCHN:
+        {
+            evtchn_send(d, d->argo->evtchn_port);
+            break;
+        }
+        case ARGO_SIGNAL_METHOD_VIRQ:
+        {
+            send_guest_global_virq(d, VIRQ_ARGO);
+            break;
+        }
+        default:
+        {
+            BUG();
+            break;
+        }
+    }
 }
 
 static void
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 42f551f..710baa6 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -150,6 +150,9 @@ struct argo_ring_message_header
 #endif
 };
 
+#define ARGO_SIGNAL_METHOD_EVTCHN      1
+#define ARGO_SIGNAL_METHOD_VIRQ        2
+
 /*
  * Hypercall operations
  */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 8dc032b..8a64875 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -178,7 +178,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_CON_RING   8  /* G. (DOM0) Bytes received on console            */
 #define VIRQ_PCPU_STATE 9  /* G. (DOM0) PCPU state changed                   */
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occurred          */
-#define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
+#define VIRQ_ARGO       11 /* G. Argo interdomain message notification       */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
 #define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
-- 
2.1.4



* [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (22 preceding siblings ...)
  2018-12-01  1:33 ` [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ Christopher Clark
@ 2018-12-01  1:33 ` Christopher Clark
  2018-12-13 14:26   ` Jan Beulich
  2018-12-01  1:33 ` [PATCH 25/25] argo: implement the get_config op to query notification config Christopher Clark
  2018-12-03 16:49 ` [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Chris Patterson
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

Unmap the rings on suspend so that the guest may re-register them on resume
with current mappings. On resume, signal the domain if it has any rings
registered.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c      | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++
 xen/common/domain.c    |  9 +++++++
 xen/include/xen/argo.h |  2 ++
 3 files changed, 80 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 9b12e6b..98de9a9 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -840,6 +840,16 @@ static void argo_ring_remove_mfns(const struct domain *d,
 }
 
 static void
+argo_ring_reset(struct domain *d, struct argo_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->argo->lock));
+
+    argo_ring_remove_mfns(d, ring_info);
+    ring_info->len = 0;
+    ring_info->tx_ptr = 0;
+}
+
+static void
 argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
 {
     ASSERT(rw_is_write_locked(&d->argo->lock));
@@ -1832,3 +1842,62 @@ argo_destroy(struct domain *d)
      * domain.
      */
 }
+
+void
+argo_shutdown_for_suspend(struct domain *d)
+{
+    int i;
+
+    if ( !d )
+        return;
+
+    if ( get_domain(d) )
+    {
+        read_lock(&argo_lock);
+
+        if ( d->argo )
+        {
+            write_lock(&d->argo->lock);
+
+            for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
+            {
+                struct hlist_node *node, *next;
+                struct argo_ring_info *ring_info;
+                hlist_for_each_entry_safe(ring_info, node,
+                                          next, &d->argo->ring_hash[i], node)
+                    argo_ring_reset(d, ring_info);
+            }
+
+            write_unlock(&d->argo->lock);
+        }
+
+        read_unlock(&argo_lock);
+
+        put_domain(d);
+    }
+}
+
+void
+argo_resume(struct domain *d)
+{
+    bool send_wakeup;
+
+    if ( !d )
+        return;
+
+    if ( !get_domain(d) )
+        return;
+
+    read_lock(&argo_lock);
+
+    read_lock(&d->argo->lock);
+    send_wakeup = ( d->argo->ring_count > 0 );
+    read_unlock(&d->argo->lock);
+
+    if ( send_wakeup )
+        argo_signal_domain(d);
+
+    read_unlock(&argo_lock);
+
+    put_domain(d);
+}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index eadea4d..176dc74 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -88,6 +88,11 @@ static void __domain_finalise_shutdown(struct domain *d)
         if ( !v->paused_for_shutdown )
             return;
 
+#ifdef CONFIG_ARGO
+    if ( d->shutdown_code == SHUTDOWN_suspend )
+        argo_shutdown_for_suspend(d);
+#endif
+
     d->is_shut_down = 1;
     if ( (d->shutdown_code == SHUTDOWN_suspend) && d->suspend_evtchn )
         evtchn_send(d, d->suspend_evtchn);
@@ -856,6 +861,10 @@ void domain_resume(struct domain *d)
     spin_unlock(&d->shutdown_lock);
 
     domain_unpause(d);
+
+#ifdef CONFIG_ARGO
+    argo_resume(d);
+#endif
 }
 
 int vcpu_start_shutdown_deferral(struct vcpu *v)
diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
index c037de6..b466158 100644
--- a/xen/include/xen/argo.h
+++ b/xen/include/xen/argo.h
@@ -26,5 +26,7 @@ struct argo_domain;
 
 int argo_init(struct domain *d);
 void argo_destroy(struct domain *d);
+void argo_shutdown_for_suspend(struct domain *d);
+void argo_resume(struct domain *d);
 
 #endif
-- 
2.1.4



* [PATCH 25/25] argo: implement the get_config op to query notification config
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (23 preceding siblings ...)
  2018-12-01  1:33 ` [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume Christopher Clark
@ 2018-12-01  1:33 ` Christopher Clark
  2018-12-13 14:32   ` Jan Beulich
  2018-12-03 16:49 ` [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Chris Patterson
  25 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-01  1:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Eric Chanudet

A guest needs this operation to obtain the evtchn port to use when
notifications are delivered via event channel. The operation returns the
notification method currently active for the domain, along with
method-specific configuration data:

    * event channel: port number
    * VIRQ: virq number

The return structure intentionally has reserved space so that a future
alternative notification mechanism can return data about both an IRQ number
and the bound VCPU.
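
For illustration only: a sketch of how a guest might consume the structure
returned by this operation. The structure and method constants mirror the
argo.h changes in this series; the evtchn_port_t typedef is repeated on the
assumption that it matches the public event channel header, and
example_signal_source is a hypothetical guest-side helper.

```c
#include <stdint.h>

#define ARGO_SIGNAL_METHOD_EVTCHN      1
#define ARGO_SIGNAL_METHOD_VIRQ        2

/* Assumption: same underlying type as in Xen's public event_channel.h. */
typedef uint32_t evtchn_port_t;

/* Mirrors the structure added to xen/include/public/argo.h. */
typedef struct argo_get_config
{
    uint32_t signal_method;
    union
    {
        evtchn_port_t evtchn;
        uint32_t virq;
    } signal;
    uint32_t reserved;
} argo_get_config_t;

/*
 * Hypothetical guest helper: extracts the port or VIRQ number that the
 * guest should bind a handler to. Returns 0 on success, or -1 if the
 * hypervisor reported a method this guest does not understand.
 */
static int example_signal_source(const argo_get_config_t *cfg,
                                 uint32_t *id_out)
{
    switch ( cfg->signal_method )
    {
    case ARGO_SIGNAL_METHOD_EVTCHN:
        *id_out = cfg->signal.evtchn;
        return 0;
    case ARGO_SIGNAL_METHOD_VIRQ:
        *id_out = cfg->signal.virq;
        return 0;
    default:
        return -1;
    }
}
```

Rejecting unknown methods, rather than guessing, leaves room for the future
mechanisms that the reserved field anticipates.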

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c         | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h | 28 +++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 98de9a9..f6cc764 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -1656,6 +1656,46 @@ argo_sendv(struct domain *src_d, const argo_addr_t *src_addr,
     return ( ret < 0 ) ? ret : len;
 }
 
+static void
+argo_get_config(struct domain *d, argo_get_config_t *get_config)
+{
+    unsigned int method = argo_signal_method(d);
+
+    get_config->signal_method = method;
+
+    switch ( method )
+    {
+        case ARGO_SIGNAL_METHOD_EVTCHN:
+        {
+            read_lock(&argo_lock);
+            read_lock(&d->argo->lock);
+
+            get_config->signal.evtchn = d->argo->evtchn_port;
+
+            read_unlock(&d->argo->lock);
+            read_unlock(&argo_lock);
+
+            argo_dprintk("signal for dom:%d evtchn %u\n", d->domain_id,
+                         get_config->signal.evtchn);
+
+            break;
+        }
+        case ARGO_SIGNAL_METHOD_VIRQ:
+        {
+            get_config->signal.virq = VIRQ_ARGO;
+
+            argo_dprintk("signal for dom:%d virq %u\n", d->domain_id,
+                         get_config->signal.virq);
+            break;
+        }
+        default:
+        {
+            BUG();
+            break;
+        }
+    }
+}
+
 long
 do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
                    XEN_GUEST_HANDLE_PARAM(void) arg2,
@@ -1746,6 +1786,23 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         rc = argo_notify(d, ring_data_hnd);
         break;
     }
+    case ARGO_MESSAGE_OP_get_config:
+    {
+        XEN_GUEST_HANDLE_PARAM(argo_get_config_t) get_config_hnd =
+                   guest_handle_cast(arg1, argo_get_config_t);
+        argo_get_config_t get_config;
+
+        if ( unlikely(!guest_handle_okay(get_config_hnd, 1)) )
+            break;
+
+        argo_get_config(d, &get_config);
+
+        if ( __copy_to_guest(get_config_hnd, &get_config, 1) )
+            break;
+
+        rc = 0;
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 710baa6..1e78ea0 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -20,6 +20,7 @@
 #define __XEN_PUBLIC_ARGO_H__
 
 #include "xen.h"
+#include "event_channel.h"
 
 #define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
 #define ARGO_RING_DATA_MAGIC 0xcce4d30fbc82e92aULL
@@ -153,6 +154,18 @@ struct argo_ring_message_header
 #define ARGO_SIGNAL_METHOD_EVTCHN      1
 #define ARGO_SIGNAL_METHOD_VIRQ        2
 
+typedef struct argo_get_config
+{
+    uint32_t signal_method;
+    union
+    {
+        evtchn_port_t evtchn;
+        uint32_t virq;
+    } signal;
+    uint32_t reserved;
+} argo_get_config_t;
+DEFINE_XEN_GUEST_HANDLE(argo_get_config_t);
+
 /*
  * Hypercall operations
  */
@@ -244,6 +257,21 @@ struct argo_ring_message_header
  */
 #define ARGO_MESSAGE_OP_notify              4
 
+/*
+ * ARGO_MESSAGE_OP_get_config
+ *
+ * Queries Xen for argo configuration values.
+ *
+ * Used by a guest to obtain the signal method in use for Argo notifications
+ * and the event channel port or VIRQ number in use.
+ *
+ * arg1: XEN_GUEST_HANDLE(argo_get_config_t)
+ * arg2: NULL
+ * arg3: 0 (ZERO)
+ * arg4: 0 (ZERO)
+ */
+#define ARGO_MESSAGE_OP_get_config          6
+
 /* The maximum size of a guest message that may be sent on an Argo ring. */
 #define ARGO_MAX_MSG_SIZE ((ARGO_MAX_RING_SIZE) - \
         (sizeof(struct argo_ring_message_header)) - \
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-01  1:33 ` [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ Christopher Clark
@ 2018-12-02 19:55   ` Julien Grall
  2018-12-04  9:03     ` Christopher Clark
  2018-12-13 14:16   ` Jan Beulich
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2018-12-02 19:55 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, nd,
	Eric Chanudet

Hi,

On 01/12/2018 01:33, Christopher Clark wrote:
> * x86 PV domains are notified via event channel.
> 
> PV guests are known to have the event channel software present in the guest
> kernel, so it is fine to depend on and use it.
> 
> * x86 HVM domains and all ARM domains are notified via VIRQ.
> 
> The intent is to remove the requirement for event channel software to be
> installed within these guests in order to use Argo. VIRQ signalling is also
> the method that has been in use for the longest period with this hypercall
> in both XenClient and OpenXT.

I am a bit confused. vIRQs are based on event channel, so how do you 
remove the requirement on event channel?

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
@ 2018-12-02 20:10   ` Julien Grall
  2018-12-04  9:08     ` Christopher Clark
  2018-12-04 10:57   ` Paul Durrant
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2018-12-02 20:10 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, nd,
	Eric Chanudet, Roger Pau Monné



On 01/12/2018 01:32, Christopher Clark wrote:
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 20dabc0..5ad8e2b 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -21,6 +21,20 @@
>   
>   #include "xen.h"
>   
> +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
> +
> +#define ARGO_DOMID_ANY           DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16MB
> + *  -- which is 0x1000000 or 16777216 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> +
> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> +typedef uint64_t argo_pfn_t;

As you always use 64-bit, can we just use an address? This would make 
the ABI agnostic to the hypervisor page granularity.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  2018-12-01  1:32 ` [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
@ 2018-12-03 15:42   ` Jan Beulich
  2018-12-04  9:10     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-03 15:42 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html 
> describes these codes thus:
>     EMSGSIZE     : "Message too large"
>     ECONNREFUSED : "Connection refused".

If you were to go solely by what POSIX mandates to have, more
additions would be necessary afaict. We had limited ourselves to
some basic set, so selective additions need further rationale put
here. All the more so since, for both added error codes, the use case in
the hypervisor isn't immediately obvious.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo
  2018-12-01  1:32 ` [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
@ 2018-12-03 15:51   ` Jan Beulich
  2018-12-04  9:12     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-03 15:51 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -200,6 +200,26 @@ config LATE_HWDOM
>  
>  	  If unsure, say N.
>  
> +config ARGO
> +    bool "Argo: hypervisor-mediated interdomain communication"
> +    default y

Until our policy changes as to wider configurability, options not
depending on EXPERT should be accompanied by a reason. I
also don't think that we want this to default to enabled from
the very beginning. Finally please correct indentation.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 04/25] argo: define argo_dprintk for subsystem debugging
  2018-12-01  1:32 ` [PATCH 04/25] argo: define argo_dprintk for subsystem debugging Christopher Clark
@ 2018-12-03 15:59   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-03 15:59 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> A convenience for working on development of the argo subsystem:
> toggling a local #define variable turns on just the debug messages
> in this subsystem.

I'm afraid I don't see the #define variable to toggle. I assume it's
ARGO_DEBUG, but there's no #define line for it anywhere here.

>   printk("argo: " format, ## args )

Was this line misplaced here? It doesn't look related to the rest of the
description.

> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -19,6 +19,19 @@
>  #include <xen/errno.h>
>  #include <xen/guest_access.h>
>  
> +/*
> + * Debugs
> + */
> +
> +#ifdef ARGO_DEBUG
> +#define argo_dprintk(format, args...)            \
> +    do {                                         \
> +        printk("argo: " format, ## args );       \
> +    } while ( 1 == 0 )

What's wrong with

#define argo_dprintk(format, args...) printk("argo: " format, ## args )

?

> +#else
> +#define argo_dprintk(format, ... ) (void)0

Please fully parenthesize macro expansions.
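
A small user-space sketch of the two points above, under stated assumptions:
printf stands in for printk, report_ring is a hypothetical caller, and the
standard C99 __VA_ARGS__ form is used instead of the GNU named-variadic form
from the patch. Both variants expand to a single expression, so the macro is
safe in an unbraced if/else.

```c
#include <assert.h>
#include <stdio.h>

/* Build with -DARGO_DEBUG to turn on the subsystem's debug messages. */
#ifdef ARGO_DEBUG
/* Single expression; the format string is the first variadic argument. */
#define argo_dprintk(...) printf("argo: " __VA_ARGS__)
#else
/* Fully parenthesized no-op, so it behaves as one expression anywhere. */
#define argo_dprintk(...) ((void)0)
#endif

/* Hypothetical caller: the macro sits in an unbraced if/else on purpose. */
static int report_ring(const char *what, int dom)
{
    assert(what != NULL);

    if (dom < 0)
        argo_dprintk("%s: bad domain %d\n", what, dom);
    else
        argo_dprintk("%s: domain %d\n", what, dom);

    return dom >= 0;
}
```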

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen
  2018-12-01  1:32 ` [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen Christopher Clark
@ 2018-12-03 16:20   ` Jan Beulich
  2018-12-04  9:17     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-03 16:20 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> Allocates an IPI-bound event channel on vcpu0 for specified domain.

Please can such changes to general code be done at the point where
they're needed? 

> Is able to bypass the existence check on vcpu number since vcpu 0
> should always exist. Bypass is required at the point of use by Argo.

"Should" is not a sufficient criterion. And you leave open why such a
bypass may be needed.

As an aside, I question anyway any new interface special casing
vCPU 0.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 00/25] Argo: hypervisor-mediated interdomain communication
  2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (24 preceding siblings ...)
  2018-12-01  1:33 ` [PATCH 25/25] argo: implement the get_config op to query notification config Christopher Clark
@ 2018-12-03 16:49 ` Chris Patterson
  2018-12-04  9:00   ` Christopher Clark
  25 siblings, 1 reply; 111+ messages in thread
From: Chris Patterson @ 2018-12-03 16:49 UTC (permalink / raw)
  To: christopher.w.clark
  Cc: jandryuk, dpsmith, Julien Grall, Jan Beulich, Stefano Stabellini,
	Tim Deegan, xen-devel, jean.guyader, lars.kurth, Ross Philipson,
	Konrad Rzeszutek Wilk, paul.durrant, Juergen Gross, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, voreekf, Rich Persaud,
	dgdegra, eric chanudet, roger.pau

> == Future items
>
> The Linux device driver used to test this software is derived from the
> OpenXT v4v Linux device driver, available at:
>     https://github.com/OpenXT/v4v
> The Argo implementation is not yet ready to publish (focus has been on
> the hypervisor code to this point). A Linux device driver suitable for
> inclusion in Xen will be submitted for a future Xen release and
> incorporation into OpenXT.
>

Hey Christopher, I am glad you are tackling this.  While the Linux
driver is not ready to publish, is there a version you can share for
someone who wants to test this series?  Or is the v4v driver
compatible as-is?

Cheers,
-Chris


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 00/25] Argo: hypervisor-mediated interdomain communication
  2018-12-03 16:49 ` [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Chris Patterson
@ 2018-12-04  9:00   ` Christopher Clark
  2018-12-11 22:13     ` Chris Patterson
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:00 UTC (permalink / raw)
  To: Chris Patterson
  Cc: Jason Andryuk, Daniel Smith, Julien Grall, Jan Beulich,
	Stefano Stabellini, Tim Deegan, xen-devel, Jean Guyader,
	Lars Kurth, Ross Philipson, Konrad Rzeszutek Wilk, Paul Durrant,
	Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, James McKenzie, Rich Persaud, dgdegra,
	eric chanudet, Roger Pau Monné

On Mon, Dec 3, 2018 at 8:49 AM Chris Patterson <cjp256@gmail.com> wrote:
>
> > == Future items
> >
> > The Linux device driver used to test this software is derived from the
> > OpenXT v4v Linux device driver, available at:
> >     https://github.com/OpenXT/v4v
> > The Argo implementation is not yet ready to publish (focus has been on
> > the hypervisor code to this point). A Linux device driver suitable for
> > inclusion in Xen will be submitted for a future Xen release and
> > incorporation into OpenXT.
> >
>
> Hey Christopher, I am glad you are tackling this.  While the Linux
> driver is not ready to publish, is there a version you can share for
> someone who wants to test this series?  Or is the v4v driver
> compatible as-is?

Hi Chris,

Thanks for the interest. So that you can take a look, and to enable
testing by anyone who would like to, I've just pushed a copy of the
Argo-ported Linux driver and userspace interposer, etc., with some
OpenEmbedded build integration and instructions, to my github account
here:

https://github.com/dozylynx/meta-argo-linux

This is a pretty fast port of the v4v Linux software to use the argo
interfaces -- the existing OpenXT v4v interface is not quite the same
-- plus metadata in there to turn it into a new OpenEmbedded layer in
the same repo with recipes to work with meta-virtualization. I've been
building with the rocko release, just to pick a stable reference
point, so it's the rocko branch in meta-argo-linux that you'll want to
look at, and there are instructions in the README.md in that branch.

If you build that per the instructions, just a heads up that the Xen
recipe in there will pull from a recent snapshot of Xen's staging
branch, with the posted Argo series applied, from a copy on my github
account.

If you give it a spin, let me know how it goes.

thanks,

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-02 19:55   ` Julien Grall
@ 2018-12-04  9:03     ` Christopher Clark
  2018-12-04  9:16       ` Paul Durrant
  2018-12-11 14:15       ` Julien Grall
  0 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet

On Sun, Dec 2, 2018 at 11:55 AM Julien Grall <Julien.Grall@arm.com> wrote:
>
> Hi,
>
> On 01/12/2018 01:33, Christopher Clark wrote:
> > * x86 PV domains are notified via event channel.
> >
> > PV guests are known to have the event channel software present in the guest
> > kernel, so it is fine to depend on and use it.
> >
> > * x86 HVM domains and all ARM domains are notified via VIRQ.
> >
> > The intent is to remove the requirement for event channel software to be
> > installed within these guests in order to use Argo. VIRQ signalling is also
> > the method that has been in use for the longest period with this hypercall
> > in both XenClient and OpenXT.
>
> I am a bit confused. vIRQs are based on event channel, so how do you
> remove the requirement on event channel?

Are VIRQs always delivered via events in all cases? I was under the
impression that was not necessarily so with HVM guests but I haven't
checked and could well be incorrect.

A bit of context might help with how this multiple-method logic (as
submitted) was arrived at:

1) Both XenClient's original version of v4v, and that used in OpenXT,
deliver notifications to guests via VIRQ.
This logic has been performing fine for our use cases, so there
hasn't really been a push to switch away from it.

2) The last version of v4v that was submitted to xen-devel for
iteration with the Xen community was intended to use event channels
instead, in response to a request from Jan at the time. Given that
expressed preference, I've added that, plumbing it in via the IPI
event method exposed in patch #01, and then used in patch #05, of
the submitted series.

3) Bromium's uxen uses different logic for delivery of events to
non-PV guests: an edge-triggered, ISA IRQ, along these lines:

    #define ARGO_SIGNAL_ISA_IRQ 8
    hvm_isa_irq_assert(d, ARGO_SIGNAL_ISA_IRQ, NULL);
    hvm_isa_irq_deassert(d, ARGO_SIGNAL_ISA_IRQ);

I'm told that this avoids the need to EOI in the guest, reducing the
VMEXIT load, and using an ISA IRQ avoids some logic in Windows that
requires that a device be detected. I briefly looked into adding this
to Argo, but Linux wasn't immediately happy and I haven't had time to
look into it further given the proximity of the 4.12 release, with
other work still to complete.

Anyway: since method 3 isn't ready to submit, and if VIRQs don't have
an advantage over using event channels directly with respect to needing
in-guest support to function, then I can drop this patch (#23) and
simplify the get_config op (#25), which will leave all notifications
being delivered as events.

Alternatively, if this is about which is the right delivery method for
ARM, with some valid reason to retain use of VIRQ for HVM x86, then
I'm happy to switch ARM over to deliver by the event method rather
than VIRQ if that makes more sense.

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-02 20:10   ` Julien Grall
@ 2018-12-04  9:08     ` Christopher Clark
  2018-12-05 17:20       ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet, Roger Pau Monné

On Sun, Dec 2, 2018 at 12:11 PM Julien Grall <Julien.Grall@arm.com> wrote:
>
>
>
> On 01/12/2018 01:32, Christopher Clark wrote:
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > index 20dabc0..5ad8e2b 100644
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -21,6 +21,20 @@
> >
> >   #include "xen.h"
> >
> > +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
> > +
> > +#define ARGO_DOMID_ANY           DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16MB
> > + *  -- which is 0x1000000 or 16777216 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> > +
> > +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> > +typedef uint64_t argo_pfn_t;
>
> As you always use 64-bit, can we just use an address? This would make
> the ABI agnostic to the hypervisor page granularity.

Thanks for reviewing this series.

I'm not sure yet that switching to using addresses instead would be
for the best, so have been working through some reasoning about your
suggestion. This interface is for the guest to identify to the
hypervisor the list of frames of memory to use as the ring, and the
purpose of a frame number is to uniquely identify a frame. Frame
numbers, as opposed to addresses, are going to remain the same across
all processors, independent of the page tables that happen to
currently be in use.

Where possible, translation should be performed by the guest rather
than the hypervisor, minimizing the hypervisor logic (good for several
reasons) - so it would be better to avoid adding the
address-to-page-number walk and granularity handling in the hypervisor
here. In this case, the guest has the incentive to do that work, given
that it wants to register the ring.
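
To make the granularity point concrete, here is a rough sketch of the
translation a guest would do before registering a ring. The helper names and
the page_shift parameter are illustrative assumptions, not part of the Argo
ABI; only ARGO_MAX_RING_SIZE and argo_pfn_t come from the quoted header.

```c
#include <assert.h>
#include <stdint.h>

#define ARGO_MAX_RING_SIZE (16777216ULL)   /* 0x1000000 bytes, i.e. 16 MiB */

typedef uint64_t argo_pfn_t;

/*
 * Hypothetical helper: frame number containing addr.
 * page_shift is 12 for 4 KiB pages and 16 for 64 KiB pages.
 */
static argo_pfn_t addr_to_pfn(uint64_t addr, unsigned int page_shift)
{
    assert(page_shift < 64);
    return addr >> page_shift;
}

/* Hypothetical helper: number of frames needed to back a ring of ring_len bytes. */
static uint64_t ring_frames(uint64_t ring_len, unsigned int page_shift)
{
    uint64_t page_size = 1ULL << page_shift;

    assert(ring_len <= ARGO_MAX_RING_SIZE);
    return (ring_len + page_size - 1) >> page_shift;   /* round up */
}
```

The same address maps to a different frame number once the page size
changes, so a frame-number ABI depends on both sides agreeing on the
granularity, whereas a plain address does not carry that dependency.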

(Slightly out of scope, but hopefully not for long: We have a
near-term interest in using argo to communicate between VMs at
different levels of nesting in L0/L1 nested hypervisors, and I suspect
that frame number translation will end up being easier to handle
across L0/L1 than translation of guest addresses in a VM running at
the other level.)

Could you give a specific scenario you have in mind that is prompting a concern?

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  2018-12-03 15:42   ` Jan Beulich
@ 2018-12-04  9:10     ` Christopher Clark
  2018-12-04 10:04       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Mon, Dec 3, 2018 at 7:42 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> > http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
> > describes these codes thus:
> >     EMSGSIZE     : "Message too large"
> >     ECONNREFUSED : "Connection refused".
>
> If you were to go solely by what POSIX mandates to have, more
> additions would be necessary afaict. We had limited ourselves to
> some basic set, so selective additions need further rationale put
> here. All the more so since, for both added error codes, the use case in
> the hypervisor isn't immediately obvious.

Thanks for reviewing the series and the previous iterations of this work.

I note your other message indicating a preference for including these
changes at point of first use and I will do so in the next revision.

An aside before the rationales below: part of the motivation for
selection of these error codes is to continue alignment with the
modern v4v implementation in uxen where possible.

EMSGSIZE:

This series proposes to return EMSGSIZE for a sendv operation (patch
#15) where an excess amount of data, across all iovs, has been
supplied, exceeding either the statically configured maximum size of a
transmittable message, or the (variable) size of the ring registered
by another domain.

If the new code EMSGSIZE is not wanted, an alternative error code
could be EINVAL, though that is returned for other errors in the same
operation, such as supplying incorrectly sized individual iovs.

ECONNREFUSED:

This series proposes to return ECONNREFUSED whenever a remote domain
is specified that either does not exist or is not argo-enabled.
This affects both the ring registration and sending data operations.
(register op, patch #13; sendv op, patch #15)

ECONNREFUSED seems plausible for this use as it is determined by the
remote domain state within the hypervisor.

Elsewhere, ENODEV is already used to indicate that the local sending
domain cannot perform the operation due to its own state.
ENOENT is used in the unregister operation to indicate that the
domain's own ring that it is attempting to unregister is not present.

ENXIO could work; ECONNREFUSED just seems more descriptive.
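
As a rough illustration of the mapping described above (not the code from
patches #13 or #15), the checks could be sketched as follows. The function
name and parameters are hypothetical, and the ring-size comparison ignores
the message header and alignment space that a real check would account for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical pre-check for a sendv-style operation.
 *   remote_argo_enabled: nonzero if the destination domain exists and has Argo enabled
 *   max_msg_size:        statically configured maximum message size
 *   ring_len:            size of the ring registered by the destination
 * Returns 0 if the send may proceed, or a negative errno value.
 */
static int sendv_precheck(int remote_argo_enabled, uint64_t max_msg_size,
                          uint64_t ring_len, const uint32_t *iov_lens,
                          size_t niov)
{
    uint64_t total = 0;
    size_t i;

    assert(iov_lens != NULL || niov == 0);

    /* Outcome determined by the remote domain's state. */
    if (!remote_argo_enabled)
        return -ECONNREFUSED;

    /* Sum the data supplied across all iovs. */
    for (i = 0; i < niov; i++)
        total += iov_lens[i];

    /* Too much data for either the static limit or the destination ring. */
    if (total > max_msg_size || total > ring_len)
        return -EMSGSIZE;

    return 0;
}
```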

Are you OK with these new codes if they are used as described here,
and melded into the patches which first use them?

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo
  2018-12-03 15:51   ` Jan Beulich
@ 2018-12-04  9:12     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Mon, Dec 3, 2018 at 7:51 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -200,6 +200,26 @@ config LATE_HWDOM
> >
> >         If unsure, say N.
> >
> > +config ARGO
> > +    bool "Argo: hypervisor-mediated interdomain communication"
> > +    default y
>
> Until our policy changes as to wider configurability, options not
> depending on EXPERT should be accompanied by a reason. I
> also don't think that we want this to default to enabled from
> the very beginning. Finally please correct indentation.

ack: will toggle, add dependency, describe and re-indent.

thanks.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 05/25] argo: Add initial argo_init and argo_destroy
  2018-12-01  1:32 ` [PATCH 05/25] argo: Add initial argo_init and argo_destroy Christopher Clark
@ 2018-12-04  9:12   ` Paul Durrant
  2018-12-13 13:16   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:12 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Tim (Xen.org) <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>; Rich Persaud
> <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>; Eric
> Chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: [PATCH 05/25] argo: Add initial argo_init and argo_destroy
> 
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
> 
> Introduces headers:
>   <public/argo.h> with definions of addresses and ring structure,
> including
>   indexes for atomic update for communication between domain and
> hypervisor,
>   and <xen/argo.h> to support hooking init and destroy into domain
> lifecycle.
> 
> If CONFIG_ARGO is enabled:
> 
> Adds per-domain init of argo data structures to domain_create by calling
> argo_init, and similarly adds teardown via argo_destroy into domain_destroy
> and the error exit path of domain_create.
> 
> argo_init allocates an event channel for use for signalling to the domain.
> The event channel is of type IPI since that behaves in the required way;
> unbound event channels are unsuitable since they silently drop events.
> The only disadvantage of the IPI type is that the channel cannot be rebound
> to any other VCPU; that seems to be tolerable and avoids introducing any
> further changes to add another channel type.
> 
> In accordance with recent work on _domain_destroy, argo_destroy is idempotent.
> 
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c         | 277 +++++++++++++++++++++++++++++++++++++++++++++-
>  xen/common/domain.c       |  15 +++
>  xen/include/public/argo.h |  55 +++++++++
>  xen/include/xen/argo.h    |  30 +++++
>  xen/include/xen/sched.h   |   7 ++
>  5 files changed, 383 insertions(+), 1 deletion(-)
>  create mode 100644 xen/include/public/argo.h
>  create mode 100644 xen/include/xen/argo.h
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6917f98..1872d37 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -17,7 +17,101 @@
>   */
> 
>  #include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/argo.h>
> +#include <xen/event.h>
> +#include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/time.h>
> +
> +DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> +
> +struct argo_pending_ent
> +{
> +    struct hlist_node node;
> +    domid_t id;

I think you want 16 bits of padding here.
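
To illustrate the padding point: with a 16-bit domid_t followed by a 32-bit len, compilers on common ABIs insert two bytes of implicit padding between the two fields. The sketch below uses stand-in definitions for hlist_node and domid_t (assumptions for illustration, not the Xen headers) and hypothetical struct names, and checks that naming the padding leaves the layout unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Xen types, for illustration only. */
typedef uint16_t domid_t;
struct hlist_node { struct hlist_node *next, **pprev; };

/* As posted: the compiler pads silently after 'id'. */
struct pending_ent_implicit {
    struct hlist_node node;
    domid_t id;
    uint32_t len;
};

/* With the 16 bits of padding made explicit. */
struct pending_ent_explicit {
    struct hlist_node node;
    domid_t id;
    uint16_t pad;
    uint32_t len;
};

/* Returns 1 when both layouts put 'len' at the same offset and have equal size. */
static int layouts_match(void)
{
    return offsetof(struct pending_ent_implicit, len) ==
           offsetof(struct pending_ent_explicit, len) &&
           sizeof(struct pending_ent_implicit) ==
           sizeof(struct pending_ent_explicit);
}
```

On ABIs where uint32_t is 4-byte aligned, layouts_match() should return 1, so the explicit field only documents what the compiler already does.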

> +    uint32_t len;
> +};
> +
> +struct argo_ring_info
> +{
> +    /* next node in the hash, protected by L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by L2 */
> +    argo_ring_id_t id;
> +    /* used to confirm sender id, protected by L2 */
> +    uint64_t partner_cookie;
> +    /* L3 */
> +    spinlock_t lock;
> +    /* cached length of the ring (from ring->len), protected by L3 */
> +    uint32_t len;
> +    /* number of pages in the ring, protected by L3 */
> +    uint32_t npage;
> +    /* number of pages translated into mfns, protected by L3 */
> +    uint32_t nmfns;
> +    /* cached tx pointer location, protected by L3 */
> +    uint32_t tx_ptr;
> +    /* mapped ring pages protected by L3 */
> +    uint8_t **mfn_mapping;
> +    /* list of mfns of guest ring, protected by L3 */
> +    mfn_t *mfns;
> +    /* list of struct argo_pending_ent for this ring, protected by L3 */
> +    struct hlist_head pending;
> +};
> +
> +/*
> + * The value of the argo element in a struct domain is
> + * protected by the global lock argo_lock: L1
> + */
> +#define ARGO_HTABLE_SIZE 32
> +struct argo_domain
> +{
> +    /* L2 */
> +    rwlock_t lock;
> +    /* event channel */
> +    evtchn_port_t evtchn_port;
> +    /* protected by L2 */
> +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> +    /* id cookie, written only at init, so readable with R(L1) */
> +    uint64_t domain_cookie;
> +};
> +
> +/*
> + * locks
> + */
> +
> +/*
> + * locking is organized as follows:
> + *
> + * L1 : The global lock: argo_lock
> + * Protects the argo elements of all struct domain *d in the system.
> + * It does not protect any of the elements of d->argo, only their
> + * addresses.
> + * By extension since the destruction of a domain with a non-NULL
> + * d->argo will need to free the d->argo pointer, holding this lock
> + * guarantees that no domain pointers that argo is interested in
> + * become invalid whilst this lock is held.
> + */
> +
> +static DEFINE_RWLOCK(argo_lock); /* L1 */
> +
> +/*
> + * L2 : The per-domain lock: d->argo->lock
> + * Holding a read lock on L2 protects the hash table and
> + * the elements in the hash_table d->argo->ring_hash, and
> + * the node and id fields in struct argo_ring_info in the
> + * hash table.
> + * Holding a write lock on L2 protects all of the elements of
> + * struct argo_ring_info.
> + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> + *
> + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> + * Protects len, tx_ptr, the guest ring, the guest ring_data and
> + * the pending list.
> + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> + */
> 
>  /*
>   * Debugs
> @@ -32,10 +126,191 @@
>  #define argo_dprintk(format, ... ) (void)0
>  #endif
> 
> +/*
> + * ring buffer
> + */
> +
> +/* caller must have L3 or W(L2) */
> +static void
> +argo_ring_unmap(struct argo_ring_info *ring_info)
> +{
> +    int i;

unsigned?

> +
> +    if ( !ring_info->mfn_mapping )
> +        return;
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +    {
> +        if ( !ring_info->mfn_mapping[i] )
> +            continue;
> +        if ( ring_info->mfns )
> +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> +                         mfn_x(ring_info->mfns[i]),
> +                         ring_info->mfn_mapping[i]);
> +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> +        ring_info->mfn_mapping[i] = NULL;
> +    }
> +}
> +
> +/*
> + * pending
> + */
> +static void
> +argo_pending_remove_ent(struct argo_pending_ent *ent)
> +{
> +    hlist_del(&ent->node);
> +    xfree(ent);
> +}
> +
> +static void
> +argo_pending_remove_all(struct argo_ring_info *ring_info)
> +{
> +    struct hlist_node *node, *next;
> +    struct argo_pending_ent *pending_ent;
> +
> +    hlist_for_each_entry_safe(pending_ent, node, next,
> +                              &ring_info->pending, node)
> +    {

Unnecessary braces, I think.

> +        argo_pending_remove_ent(pending_ent);
> +    }
> +}
> +
> +static void argo_ring_remove_mfns(const struct domain *d,
> +                                  struct argo_ring_info *ring_info)
> +{
> +    int i;

unsigned?

> +
> +    ASSERT(rw_is_write_locked(&d->argo->lock));
> +
> +    if ( !ring_info->mfns )
> +        return;
> +    ASSERT(ring_info->mfn_mapping);
> +
> +    argo_ring_unmap(ring_info);
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +        if ( mfn_x(ring_info->mfns[i]) != mfn_x(INVALID_MFN) )

How about "!mfn_eq(ring_info->mfns[i], INVALID_MFN)"? I think it's a bit neater.
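
As a rough sketch of what the mfn_eq() form looks like in a loop of this shape, the following uses simplified stand-ins modelled on Xen's typesafe MFN helpers (the definitions here are assumptions written for this example, not copied from the tree; count_valid_mfns is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins modelled on Xen's typesafe MFN helpers. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long n) { mfn_t m = { n }; return m; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }
static inline bool mfn_eq(mfn_t x, mfn_t y) { return mfn_x(x) == mfn_x(y); }

#define INVALID_MFN _mfn(~0UL)

/* Counts the entries that hold a valid frame, i.e. those needing a page put. */
static unsigned int count_valid_mfns(const mfn_t *mfns, unsigned int nmfns)
{
    unsigned int i, valid = 0;

    for ( i = 0; i < nmfns; i++ )
        if ( !mfn_eq(mfns[i], INVALID_MFN) )
            valid++;

    return valid;
}
```

The comparison stays within the wrapper type, so the loop body never has to unwrap both sides by hand.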

> +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> +
> +    xfree(ring_info->mfns);
> +    ring_info->mfns = NULL;
> +    ring_info->npage = 0;
> +    xfree(ring_info->mfn_mapping);
> +    ring_info->mfn_mapping = NULL;
> +    ring_info->nmfns = 0;
> +}
> +
> +static void
> +argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    ASSERT(rw_is_write_locked(&d->argo->lock));
> +
> +    /* Holding W(L2) so do not need to acquire L3 */
> +    argo_pending_remove_all(ring_info);
> +    hlist_del(&ring_info->node);
> +    argo_ring_remove_mfns(d, ring_info);
> +    xfree(ring_info);
> +}
> +
>  long
>  do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>                     XEN_GUEST_HANDLE_PARAM(void) arg2,
>                     uint32_t arg3, uint32_t arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *d = current->domain;

The general preference these days is to use 'currd' to refer to the current domain (and d for an arbitrary one).

> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> +
> +    domain_lock(d);
> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -ENOSYS;
> +        break;
> +    }
> +
> +    domain_unlock(d);
> +    argo_dprintk("<-do_argo_message_op()=%ld\n", rc);

Blank line here.

> +    return rc;
> +}
> +
> +int
> +argo_init(struct domain *d)
> +{
> +    struct argo_domain *argo;
> +    evtchn_port_t port;
> +    int i;
> +    int rc;
> +
> +    argo = xmalloc(struct argo_domain);
> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    rwlock_init(&argo->lock);
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> +
> +    rc = evtchn_bind_ipi_vcpu0_domain(d, &port);
> +    if ( rc )
> +    {
> +        xfree(argo);
> +        return rc;
> +    }
> +    argo->evtchn_port = port;
> +    argo->domain_cookie = (uint64_t)NOW();
> +
> +    write_lock(&argo_lock);
> +    d->argo = argo;
> +    write_unlock(&argo_lock);
> +
> +    return 0;
> +}
> +
> +void
> +argo_destroy(struct domain *d)
> +{
> +    int i;
> +
> +    BUG_ON(!d->is_dying);
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("d->v=%p\n", d->argo);
> +
> +    if ( d->argo )
> +    {
> +        for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +        {
> +            struct hlist_node *node, *next;
> +            struct argo_ring_info *ring_info;
> +
> +            hlist_for_each_entry_safe(ring_info, node,
> +                                      next, &d->argo->ring_hash[i],
> +                                      node)
> +            {

Unnecessary braces I think.

> +                argo_ring_remove_info(d, ring_info);
> +            }
> +        }
> +        /*
> +         * Since this function is only called during domain destruction,
> +         * argo->evtchn_port need not be closed here. ref: evtchn_destroy
> +         */
> +        d->argo->domain_cookie = 0;
> +        xfree(d->argo);
> +        d->argo = NULL;
> +    }
> +    write_unlock(&argo_lock);
> +
> +    /*
> +     * This (dying) domain's domid may be recorded as the authorized sender
> +     * to rings registered by other domains, and those rings are not
> +     * unregistered here.
> +     * If a later domain is created that has the same domid as this one, the
> +     * domain_cookie will differ, which ensures that the new domain cannot
> +     * use the inherited authorizations to transmit that were issued to this
> +     */
>  }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 78cc524..eadea4d 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -277,6 +277,10 @@ static void _domain_destroy(struct domain *d)
> 
>      xfree(d->pbuf);
> 
> +#ifdef CONFIG_ARGO
> +    argo_destroy(d);
> +#endif
> +
>      rangeset_domain_destroy(d);
> 
>      free_cpumask_var(d->dirty_cpumask);
> @@ -376,6 +380,9 @@ struct domain *domain_create(domid_t domid,
>      spin_lock_init(&d->hypercall_deadlock_mutex);
>      INIT_PAGE_LIST_HEAD(&d->page_list);
>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
> +#ifdef CONFIG_ARGO
> +    rwlock_init(&d->argo_lock);
> +#endif
> 
>      spin_lock_init(&d->node_affinity_lock);
>      d->node_affinity = NODE_MASK_ALL;
> @@ -445,6 +452,11 @@ struct domain *domain_create(domid_t domid,
>              goto fail;
>          init_status |= INIT_gnttab;
> 
> +#ifdef CONFIG_ARGO
> +        if ( (err = argo_init(d)) != 0 )
> +            goto fail;
> +#endif
> +
>          err = -ENOMEM;
> 
>          d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
> @@ -717,6 +729,9 @@ int domain_kill(struct domain *d)
>          if ( d->is_dying != DOMDYING_alive )
>              return domain_kill(d);
>          d->is_dying = DOMDYING_dying;
> +#ifdef CONFIG_ARGO
> +        argo_destroy(d);
> +#endif
>          evtchn_destroy(d);
>          gnttab_release_mappings(d);
>          tmem_destroy(d->tmem_client);
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> new file mode 100644
> index 0000000..20dabc0
> --- /dev/null
> +++ b/xen/include/public/argo.h
> @@ -0,0 +1,55 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + */

Do you really want to license a public header under the GPL?

  Paul

> +
> +#ifndef __XEN_PUBLIC_ARGO_H__
> +#define __XEN_PUBLIC_ARGO_H__
> +
> +#include "xen.h"
> +
> +typedef struct argo_addr
> +{
> +    uint32_t port;
> +    domid_t domain_id;
> +    uint16_t pad;
> +} argo_addr_t;
> +
> +typedef struct argo_ring_id
> +{
> +    struct argo_addr addr;
> +    domid_t partner;
> +    uint16_t pad;
> +} argo_ring_id_t;
> +
> +typedef struct argo_ring
> +{
> +    uint64_t magic;
> +    argo_ring_id_t id;
> +    uint32_t len;
> +    /* Guests should use atomic operations to access rx_ptr */
> +    uint32_t rx_ptr;
> +    /* Guests should use atomic operations to access tx_ptr */
> +    uint32_t tx_ptr;
> +    uint8_t reserved[32];
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t ring[];
> +#elif defined(__GNUC__)
> +    uint8_t ring[0];
> +#endif
> +} argo_ring_t;
> +
> +#endif
> diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
> new file mode 100644
> index 0000000..c037de6
> --- /dev/null
> +++ b/xen/include/xen/argo.h
> @@ -0,0 +1,30 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + */
> +
> +#ifndef __XEN_ARGO_H__
> +#define __XEN_ARGO_H__
> +
> +#include <xen/types.h>
> +#include <public/argo.h>
> +
> +struct argo_domain;
> +
> +int argo_init(struct domain *d);
> +void argo_destroy(struct domain *d);
> +
> +#endif
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 0309c1f..4a19b55 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -22,6 +22,7 @@
>  #include <asm/atomic.h>
>  #include <xen/vpci.h>
>  #include <xen/wait.h>
> +#include <xen/argo.h>
>  #include <public/xen.h>
>  #include <public/domctl.h>
>  #include <public/sysctl.h>
> @@ -490,6 +491,12 @@ struct domain
>          unsigned int guest_request_enabled       : 1;
>          unsigned int guest_request_sync          : 1;
>      } monitor;
> +
> +#ifdef CONFIG_ARGO
> +    /* Argo interdomain communication support */
> +    rwlock_t argo_lock;
> +    struct argo_domain *argo;
> +#endif
>  };
> 
>  /* Protect updates/reads (resp.) of domain_list and domain_hash. */
> --
> 2.1.4
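
The patch calls argo_destroy from both domain_kill and _domain_destroy and describes it as idempotent. A minimal user-space sketch of that property, with hypothetical names (struct dom, dom_argo_init, dom_argo_destroy) and malloc-family calls standing in for the Xen allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the per-domain state. */
struct argo_state { unsigned long cookie; };
struct dom { struct argo_state *argo; };

/* Allocates the per-domain state; returns 0 on success, -1 on failure. */
static int dom_argo_init(struct dom *d)
{
    d->argo = calloc(1, sizeof(*d->argo));
    return d->argo ? 0 : -1;
}

/* Returns 1 if this call tore the state down, 0 if there was nothing to do. */
static int dom_argo_destroy(struct dom *d)
{
    if ( !d->argo )
        return 0;

    free(d->argo);
    d->argo = NULL;   /* makes a repeat call a no-op */
    return 1;
}
```

Clearing the pointer after freeing it is what lets the second call return without touching freed memory; the real function does the equivalent under the global argo lock.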


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-04  9:03     ` Christopher Clark
@ 2018-12-04  9:16       ` Paul Durrant
  2018-12-12 14:49         ` James
  2018-12-11 14:15       ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:16 UTC (permalink / raw)
  To: 'Christopher Clark', Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, James McKenzie, Rich Persaud, Jan Beulich,
	Ian Jackson, xen-devel, nd, eric chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 04 December 2018 09:03
> To: Julien Grall <Julien.Grall@arm.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>; nd@arm.com; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Jan Beulich <jbeulich@suse.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; Wei Liu
> <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; Rich
> Persaud <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>;
> eric chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
> 
> On Sun, Dec 2, 2018 at 11:55 AM Julien Grall <Julien.Grall@arm.com> wrote:
> >
> > Hi,
> >
> > On 01/12/2018 01:33, Christopher Clark wrote:
> > > * x86 PV domains are notified via event channel.
> > >
> > > > PV guests are known to have the event channel software present in the guest
> > > > kernel, so it is fine to depend on and use it.
> > >
> > > * x86 HVM domains and all ARM domains are notified via VIRQ.
> > >
> > > > The intent is to remove the requirement for event channel software to be
> > > > installed within these guests in order to use Argo. VIRQ signalling is also
> > > > the method that has been in use for the longest period with this hypercall
> > > > in both XenClient and OpenXT.
> >
> > I am a bit confused. vIRQs are based on event channel, so how do you
> > remove the requirement on event channel?
> 
> Are VIRQs always delivered via events in all cases? I was under the
> impression that was not necessarily so with HVM guests but I haven't
> checked and could well be incorrect.
> 
> A bit of context might help with how this multiple-method logic (as
> submitted) was arrived at:
> 
> 1) Both XenClient's original version of v4v, and that used in OpenXT,
> deliver notifications to guests via VIRQ.
> This logic has been performing fine for our use cases, so there
> hasn't really been a push to switch away from it.

I'm not aware of any way to map VIRQs to vectors directly, so I think they have to be handled like any other event channel.

  Paul

> 
> 2) The last version of v4v that was submitted to xen-devel for
> iteration with the Xen community was intended to use event channels
> instead, in response to a request from Jan at the time. Given that
> expressed preference, I've added that, plumbing it in via the
> IPI event method exposed in patch #01, and then used in patch #05, of
> the submitted series.
> 
> 3) Bromium's uxen uses different logic for delivery of events to
> non-PV guests: an edge-triggered, ISA IRQ, along these lines:
> 
>     #define ARGO_SIGNAL_ISA_IRQ 8
>     hvm_isa_irq_assert(d, ARGO_SIGNAL_ISA_IRQ, NULL);
>     hvm_isa_irq_deassert(d, ARGO_SIGNAL_ISA_IRQ);
> 
> I'm told that this avoids the need to EOI in the guest, reducing the
> VMEXIT load, and using an ISA IRQ avoids some logic in Windows that
> requires that a device be detected. I briefly looked into adding this
> to Argo, but Linux wasn't immediately happy and I haven't had time to
> look into it further given the proximity of the 4.12 release, with
> other work still to complete.
> 
> Anyway: since method 3 isn't ready to submit, and if VIRQs don't have
> an advantage over using event channels directly with respect to needing
> in-guest support to function, then I can drop this patch (#23) and
> simplify the get_config op (#25), which will leave all notifications
> being delivered as events.
> 
> Alternatively, if this is about which is the right delivery method for
> ARM, with some valid reason to retain use of VIRQ for HVM x86, then
> I'm happy to switch ARM over to deliver by the event method rather
> than VIRQ if that makes more sense.
> 
> Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen
  2018-12-03 16:20   ` Jan Beulich
@ 2018-12-04  9:17     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-04  9:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Mon, Dec 3, 2018 at 8:20 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> > Allocates an IPI-bound event channel on vcpu0 for specified domain.
>
> Please can such changes to general code be done at the point where
> they're needed?
>
> > Is able to bypass the existence check on vcpu number since vcpu 0
> > should always exist. Bypass is required at the point of use by Argo.
>
> "Should" is not a sufficient criterion. And you leave open why such a
> bypass may be needed.
>
> As an aside, I question anyway any new interface special casing
> vCPU 0.

Ack. I'll take a look at how this initialization interacts with
domains with no VCPUs. Agree that the vcpu0 special-case deserves
questioning. Thanks.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable
  2018-12-01  1:32 ` [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable Christopher Clark
@ 2018-12-04  9:18   ` Paul Durrant
  2018-12-04 11:35   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:18 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Tim (Xen.org) <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>; Rich Persaud
> <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>; Eric
> Chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: [PATCH 06/25] argo: Xen command line parameter 'argo': bool to
> enable/disable
> 
> Default to disabled.

Any particular reason not to fold this into patch #5?

  Paul

> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 1872d37..82fab36 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -28,6 +28,10 @@
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> 
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled = 0;
> +boolean_param("argo", opt_argo_enabled);
> +
>  struct argo_pending_ent
>  {
>      struct hlist_node node;
> @@ -223,6 +227,13 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>      argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
>                   (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> 
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -ENOSYS;
> +        argo_dprintk("<-do_argo_message_op()=%ld\n", rc);
> +        return rc;
> +    }
> +
>      domain_lock(d);
> 
>      switch (cmd)
> @@ -245,6 +256,14 @@ argo_init(struct domain *d)
>      int i;
>      int rc;
> 
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("argo init: domid: %d\n", d->domain_id);
> +
>      argo = xmalloc(struct argo_domain);
>      if ( !argo )
>          return -ENOMEM;
> --
> 2.1.4



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-01  1:32 ` [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy Christopher Clark
@ 2018-12-04  9:35   ` Paul Durrant
  2018-12-12 16:01   ` Roger Pau Monné
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:35 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet, Roger Pau Monne



> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien.grall@arm.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; George
> Dunlap <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
> Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Paul Durrant <Paul.Durrant@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Rich Persaud <persaur@gmail.com>; Ross Philipson
> <ross.philipson@gmail.com>; Eric Chanudet <eric.chanudet@gmail.com>; James
> McKenzie <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>;
> Daniel Smith <dpsmith@apertussolutions.com>
> Subject: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for
> copy
> 
> Applied to both x86 and ARM headers.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
>  xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
>  xen/include/xen/guest_access.h     |  3 +++
>  3 files changed, 57 insertions(+)
> 
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 224d2a0..7b6f89c 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  #define __raw_copy_from_guest raw_copy_from_guest
>  #define __raw_clear_guest raw_clear_guest
> 
> +#define raw_copy_from_guest_errno(dst, src, len)             \
> +    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
> +#define raw_copy_to_guest_errno(dst, src, len)               \
> +    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)
> +
>  /* Remainder copied from x86 -- could be common? */
> 
>  /* Is the guest handle a NULL reference? */
> @@ -113,6 +118,26 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>      raw_copy_from_guest(_d, _s, sizeof(*_d));           \
>  })
> 
> +/* errno returning copy functions */
> +#define copy_from_guest_offset_errno(ptr, hnd, off, nr) ({              \
> +            const typeof(*(ptr)) *_s = (hnd).p;                         \
> +            typeof(*(ptr)) *_d = (ptr);                                 \
> +            raw_copy_from_guest_errno(_d, _s + (off), sizeof(*_d) * (nr)); \
> +        })
> +
> +#define copy_field_to_guest_errno(hnd, ptr, field) ({           \
> +            const typeof(&(ptr)->field) _s = &(ptr)->field;     \
> +            void *_d = &(hnd).p->field;                         \
> +            ((void)(&(hnd).p->field == &(ptr)->field));         \
> +            raw_copy_to_guest_errno(_d, _s, sizeof(*_s));       \
> +        })
> +
> +#define copy_field_from_guest_errno(ptr, hnd, field) ({         \
> +            const typeof(&(ptr)->field) _s = &(hnd).p->field;   \
> +            typeof(&(ptr)->field) _d = &(ptr)->field;           \
> +            raw_copy_from_guest_errno(_d, _s, sizeof(*_d));     \
> +        })
> +
>  /*
>   * Pre-validate a guest handle.
>   * Allows use of faster __copy_* functions.
> diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> index ca700c9..9391cd3 100644
> --- a/xen/include/asm-x86/guest_access.h
> +++ b/xen/include/asm-x86/guest_access.h
> @@ -38,6 +38,15 @@
>       clear_user_hvm((dst), (len)) :             \
>       clear_user((dst), (len)))
> 
> +#define raw_copy_from_guest_errno(dst, src, len)                        \
> +    (is_hvm_vcpu(current) ?                                             \
> +     copy_from_user_hvm((dst), (src), (len)) :                         \
> +     (copy_from_user((dst), (src), (len)) ? -EFAULT : 0))

AFAICT copy_from_user_hvm() doesn't return a negative errno (it has the comment "/* fake a copy_to_user() return code */" on the return line), so I think your bracketing is wrong here: the "? -EFAULT : 0" conversion is only applied to the PV branch...

> +#define raw_copy_to_guest_errno(dst, src, len)          \
> +    (is_hvm_vcpu(current) ?                             \
> +     copy_to_user_hvm((dst), (src), (len)) :           \
> +     (copy_to_user((dst), (src), (len)) ? -EFAULT : 0))
> +

...and similarly here.
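
One way the bracketing could be arranged, sketched outside the Xen tree: the stub functions and the macro name below are hypothetical and exist only for this example. The stubs assume both copy routines report the number of bytes not copied (0 on success) rather than a negative errno, which is my reading above but worth double-checking:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stubs: like copy_from_user(), each returns the number of
 * bytes NOT copied (0 on success), never a negative errno.
 */
static unsigned long stub_copy_hvm(void *dst, const void *src, size_t len)
{
    if ( !src )
        return len;
    memcpy(dst, src, len);
    return 0;
}

static unsigned long stub_copy_pv(void *dst, const void *src, size_t len)
{
    if ( !src )
        return len;
    memcpy(dst, src, len);
    return 0;
}

/* The errno conversion wraps the whole conditional, so both paths are covered. */
#define copy_from_guest_errno_sketch(is_hvm, dst, src, len)  \
    (((is_hvm) ? stub_copy_hvm((dst), (src), (len))          \
               : stub_copy_pv((dst), (src), (len))) ? -EFAULT : 0)
```

With the outer parentheses placed this way, a failed copy on either path yields -EFAULT and a successful one yields 0.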

>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> 
> @@ -121,6 +130,26 @@
>      raw_copy_from_guest(_d, _s, sizeof(*_d));           \
>  })
> 
> +/* errno returning copy functions */
> +#define copy_from_guest_offset_errno(ptr, hnd, off, nr) ({              \
> +            const typeof(*(ptr)) *_s = (hnd).p;                         \
> +            typeof(*(ptr)) *_d = (ptr);                                 \
> +            raw_copy_from_guest_errno(_d, _s + (off), sizeof(*_d) * (nr)); \
> +        })
> +
> +#define copy_field_to_guest_errno(hnd, ptr, field) ({           \
> +            const typeof(&(ptr)->field) _s = &(ptr)->field;     \
> +            void *_d = &(hnd).p->field;                         \
> +            ((void)(&(hnd).p->field == &(ptr)->field));         \
> +            raw_copy_to_guest_errno(_d, _s, sizeof(*_s));       \
> +        })
> +
> +#define copy_field_from_guest_errno(ptr, hnd, field) ({         \
> +            const typeof(&(ptr)->field) _s = &(hnd).p->field;   \
> +            typeof(&(ptr)->field) _d = &(ptr)->field;           \
> +            raw_copy_from_guest_errno(_d, _s, sizeof(*_d));     \
> +        })
> +
>  /*
>   * Pre-validate a guest handle.
>   * Allows use of faster __copy_* functions.
> diff --git a/xen/include/xen/guest_access.h b/xen/include/xen/guest_access.h
> index 09989df..3494c5f 100644
> --- a/xen/include/xen/guest_access.h
> +++ b/xen/include/xen/guest_access.h
> @@ -26,6 +26,9 @@
>  #define __copy_from_guest(ptr, hnd, nr)                 \
>      __copy_from_guest_offset(ptr, hnd, 0, nr)
> 
> +#define copy_from_guest_errno(ptr, hnd, nr)             \
> +    copy_from_guest_offset_errno(ptr, hnd, 0, nr)
> +
>  #define __clear_guest(hnd, nr)                          \
>      __clear_guest_offset(hnd, 0, nr)
> 

Given that the only errno possible seems to be EFAULT, I do have to question why you need these changes.

  Paul

> --
> 2.1.4



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate
  2018-12-01  1:32 ` [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate Christopher Clark
@ 2018-12-04  9:44   ` Paul Durrant
  2018-12-20  5:13     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:44 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet, Roger Pau Monne

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Roger Pau
> Monne <roger.pau@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Ian Jackson <Ian.Jackson@citrix.com>; Julien Grall <julien.grall@arm.com>;
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Tim (Xen.org) <tim@xen.org>; Rich Persaud <persaur@gmail.com>; Ross
> Philipson <ross.philipson@gmail.com>; Eric Chanudet
> <eric.chanudet@gmail.com>; James McKenzie <voreekf@madingley.org>; Jason
> Andryuk <jandryuk@gmail.com>; Daniel Smith <dpsmith@apertussolutions.com>
> Subject: [PATCH 03/25] argo: introduce the argo_message_op hypercall
> boilerplate
> 
> Presence is gated upon CONFIG_ARGO.
> 
> Registers the hypercall previously reserved for this.
> Takes 5 arguments, does nothing and returns -ENOSYS.
> 
> Will be avoiding a compat ABI by using fixed-size types in hypercall ops.

You appear to be using handles, so will you not need compat code to deal with those?

  Paul

> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/arch/x86/guest/hypercall_page.S |  2 +-
>  xen/arch/x86/hvm/hypercall.c        |  3 +++
>  xen/arch/x86/hypercall.c            |  3 +++
>  xen/arch/x86/pv/hypercall.c         |  3 +++
>  xen/common/Makefile                 |  1 +
>  xen/common/argo.c                   | 28 ++++++++++++++++++++++++++++
>  xen/include/public/xen.h            |  2 +-
>  xen/include/xen/hypercall.h         |  9 +++++++++
>  8 files changed, 49 insertions(+), 2 deletions(-)
>  create mode 100644 xen/common/argo.c
> 
> diff --git a/xen/arch/x86/guest/hypercall_page.S
> b/xen/arch/x86/guest/hypercall_page.S
> index fdd2e72..6c56d66 100644
> --- a/xen/arch/x86/guest/hypercall_page.S
> +++ b/xen/arch/x86/guest/hypercall_page.S
> @@ -59,7 +59,7 @@ DECLARE_HYPERCALL(sysctl)
>  DECLARE_HYPERCALL(domctl)
>  DECLARE_HYPERCALL(kexec_op)
>  DECLARE_HYPERCALL(tmem_op)
> -DECLARE_HYPERCALL(xc_reserved_op)
> +DECLARE_HYPERCALL(argo_message_op)
>  DECLARE_HYPERCALL(xenpmu_op)
> 
>  DECLARE_HYPERCALL(arch_0)
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 19d1263..ee3c9f1 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -134,6 +134,9 @@ static const hypercall_table_t hvm_hypercall_table[] =
> {
>  #ifdef CONFIG_TMEM
>      HYPERCALL(tmem_op),
>  #endif
> +#ifdef CONFIG_ARGO
> +    HYPERCALL(argo_message_op),
> +#endif
>      COMPAT_CALL(platform_op),
>  #ifdef CONFIG_PV
>      COMPAT_CALL(mmuext_op),
> diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
> index 032de8f..7da7e89 100644
> --- a/xen/arch/x86/hypercall.c
> +++ b/xen/arch/x86/hypercall.c
> @@ -64,6 +64,9 @@ const hypercall_args_t
> hypercall_args_table[NR_hypercalls] =
>      ARGS(domctl, 1),
>      ARGS(kexec_op, 2),
>      ARGS(tmem_op, 1),
> +#ifdef CONFIG_ARGO
> +    ARGS(argo_message_op, 5),
> +#endif
>      ARGS(xenpmu_op, 2),
>  #ifdef CONFIG_HVM
>      ARGS(hvm_op, 2),
> diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
> index 5d11911..c3fd555 100644
> --- a/xen/arch/x86/pv/hypercall.c
> +++ b/xen/arch/x86/pv/hypercall.c
> @@ -77,6 +77,9 @@ const hypercall_table_t pv_hypercall_table[] = {
>  #ifdef CONFIG_TMEM
>      HYPERCALL(tmem_op),
>  #endif
> +#ifdef CONFIG_ARGO
> +    HYPERCALL(argo_message_op),
> +#endif
>      HYPERCALL(xenpmu_op),
>  #ifdef CONFIG_HVM
>      HYPERCALL(hvm_op),
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index ffdfb74..8c65c6f 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -1,3 +1,4 @@
> +obj-$(CONFIG_ARGO) += argo.o
>  obj-y += bitmap.o
>  obj-y += bsearch.o
>  obj-$(CONFIG_CORE_PARKING) += core_parking.o
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> new file mode 100644
> index 0000000..76017d4
> --- /dev/null
> +++ b/xen/common/argo.c
> @@ -0,0 +1,28 @@
> +/************************************************************************
> ******
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
> USA
> + */
> +
> +#include <xen/errno.h>
> +#include <xen/guest_access.h>
> +
> +long
> +do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> +                   XEN_GUEST_HANDLE_PARAM(void) arg2,
> +                   uint32_t arg3, uint32_t arg4)
> +{
> +    return -ENOSYS;
> +}
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> index 68ee098..0a27546 100644
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -118,7 +118,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  #define __HYPERVISOR_domctl               36
>  #define __HYPERVISOR_kexec_op             37
>  #define __HYPERVISOR_tmem_op              38
> -#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
> +#define __HYPERVISOR_argo_message_op      39
>  #define __HYPERVISOR_xenpmu_op            40
>  #define __HYPERVISOR_dm_op                41
> 
> diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
> index cc99aea..112514c 100644
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -136,6 +136,15 @@ do_tmem_op(
>      XEN_GUEST_HANDLE_PARAM(tmem_op_t) uops);
>  #endif
> 
> +#ifdef CONFIG_ARGO
> +extern long do_argo_message_op(
> +    int cmd,
> +    XEN_GUEST_HANDLE_PARAM(void) arg1,
> +    XEN_GUEST_HANDLE_PARAM(void) arg2,
> +    uint32_t arg3,
> +    uint32_t arg4);
> +#endif
> +
>  extern long
>  do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
> 
> --
> 2.1.4
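
To make the compat concern concrete, here is a minimal standalone sketch, not taken from the series: both struct names are hypothetical. A layout that embeds a pointer-sized field changes size with the guest ABI, while a layout built only from fixed-size fields does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op layout using a native pointer: size follows the ABI. */
struct demo_native_op {
    void *buf;
    uint32_t len;
};

/* Hypothetical op layout using only fixed-size fields. */
struct demo_fixed_op {
    uint64_t buf;   /* guest address carried as a 64-bit integer */
    uint32_t len;
    uint32_t pad;   /* explicit padding keeps the size a multiple of 8 */
};

/* These hold regardless of whether the compiler targets 32 or 64 bits. */
_Static_assert(sizeof(struct demo_fixed_op) == 16,
               "fixed layout is 16 bytes");
_Static_assert(offsetof(struct demo_fixed_op, len) == 8,
               "len follows the 64-bit address");
```

Whether the handles used by this hypercall fall into the first or the second category is exactly what the question above asks.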



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 10/25] arm: introduce guest_handle_for_field()
  2018-12-01  1:32 ` [PATCH 10/25] arm: introduce guest_handle_for_field() Christopher Clark
@ 2018-12-04  9:46   ` Paul Durrant
  0 siblings, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:46 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	James McKenzie, Rich Persaud, Julien Grall, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien.grall@arm.com>; Paul Durrant <Paul.Durrant@citrix.com>; Rich
> Persaud <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>;
> Eric Chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: [PATCH 10/25] arm: introduce guest_handle_for_field()
> 
> arm port of commit bb544585137259545d4adc9afe6eed8dc7c7376d

Capitalize 'arm'? Also maybe shorten the quoted hash, and you should include the commit title here.

> 
> This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

The code looks fine, so with a revised comment...

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  xen/include/asm-arm/guest_access.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-
> arm/guest_access.h
> index 7b6f89c..1137c54 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -68,6 +68,9 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t
> ipa, void *buf,
>      _y;                                                     \
>  })
> 
> +#define guest_handle_for_field(hnd, type, fld)          \
> +    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
> +
>  #define guest_handle_from_ptr(ptr, type)        \
>      ((XEN_GUEST_HANDLE_PARAM(type)) { (type *)ptr })
>  #define const_guest_handle_from_ptr(ptr, type)  \
> --
> 2.1.4
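
For readers unfamiliar with the helper, its effect can be shown with simplified stand-ins for the handle types. The DEMO_* macros and demo_ring_t below are assumptions made for illustration only; the real XEN_GUEST_HANDLE definitions are architecture-specific and differ from this sketch.

```c
#include <stdint.h>

/* Simplified stand-ins for the handle machinery (not the real macros). */
#define DEMO_HANDLE(type) demo_handle_##type
#define DEFINE_DEMO_HANDLE(type) typedef struct { type *p; } DEMO_HANDLE(type)

/* Same shape as the quoted macro: build a handle that points at one field. */
#define demo_handle_for_field(hnd, type, fld) \
    ((DEMO_HANDLE(type)) { &(hnd).p->fld })

/* Hypothetical structure with two fields of the same type. */
typedef struct demo_ring {
    uint32_t rx_ptr;
    uint32_t tx_ptr;
} demo_ring_t;

DEFINE_DEMO_HANDLE(uint32_t);
DEFINE_DEMO_HANDLE(demo_ring_t);
```

Given a DEMO_HANDLE(demo_ring_t) h, the expression demo_handle_for_field(h, uint32_t, tx_ptr) yields a DEMO_HANDLE(uint32_t) whose pointer is &h.p->tx_ptr.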



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam
  2018-12-01  1:32 ` [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam Christopher Clark
@ 2018-12-04  9:52   ` Paul Durrant
  2018-12-20  5:19     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:52 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Daniel De Graaf, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Tim (Xen.org) <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>; Daniel De
> Graaf <dgdegra@tycho.nsa.gov>; Rich Persaud <persaur@gmail.com>; Ross
> Philipson <ross.philipson@gmail.com>; Eric Chanudet
> <eric.chanudet@gmail.com>; James McKenzie <voreekf@madingley.org>; Jason
> Andryuk <jandryuk@gmail.com>; Daniel Smith <dpsmith@apertussolutions.com>
> Subject: [PATCH 11/25] xsm, argo: XSM control for argo register operation,
> argo_mac bootparam
> 
> XSM hooks implement distinct permissions for these two distinct cases of
> Argo ring registration:
> 
> * Single source:  registering a ring for communication to receive messages
>                   from a specified single other domain.
>   Default policy: allow.
> 
> * Any source:     registering a ring for communication to receive messages
>                   from any, or all, other domains (ie. wildcard).
>   Default policy: deny, with runtime policy configuration via new
> bootparam.
> 
> The reason why the default for wildcard rings is 'deny' is that there is
> currently no means other than XSM to protect the ring from DoS by a noisy
> domain spamming the ring, reducing the ability of other domains to send to
> it.
> Using XSM at least allows per-domain control over access to the send
> permission, to limit communication to domains that can be trusted.
> 
> Since denying access to any-sender rings unless a flask XSM policy is
> active
> will prevent many users from using a key Argo feature, also introduce a
> bootparam
> that can override this constraint:
>  "argo_mac" variable has allowed values: 'permissive' and 'enforcing'.
> Even though this is a boolean variable, use these descriptive strings in
> order
> to make it obvious to an administrator that this has potential security
> impact.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c                     | 15 +++++++++++++++
>  xen/include/xsm/dummy.h               | 15 +++++++++++++++
>  xen/include/xsm/xsm.h                 | 17 +++++++++++++++++
>  xen/xsm/dummy.c                       |  4 ++++
>  xen/xsm/flask/hooks.c                 | 19 +++++++++++++++++++
>  xen/xsm/flask/policy/access_vectors   | 11 +++++++++++
>  xen/xsm/flask/policy/security_classes |  1 +
>  7 files changed, 82 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 82fab36..2a95e09 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -32,6 +32,21 @@ DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
>  static bool __read_mostly opt_argo_enabled = 0;
>  boolean_param("argo", opt_argo_enabled);
> 
> +/* Xen command line option for conservative or relaxed access control */
> +bool __read_mostly argo_mac_bootparam_enforcing = true;
> +
> +static int __init parse_argo_mac_param(const char *s)
> +{
> +    if ( !strncmp(s, "enforcing", 10) )
> +        argo_mac_bootparam_enforcing = true;
> +    else if ( !strncmp(s, "permissive", 11) )
> +        argo_mac_bootparam_enforcing = false;
> +    else

Do you really want to parse e.g. 'enforcingfoobar' as 'enforcing'?

  Paul

> +        return -EINVAL;
> +    return 0;
> +}
> +custom_param("argo_mac", parse_argo_mac_param);
> +
>  struct argo_pending_ent
>  {
>      struct hlist_node node;
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index a29d1ef..55113c3 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -720,6 +720,21 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG
> struct domain *d)
> 
>  #endif /* CONFIG_X86 */
> 
> +#ifdef CONFIG_ARGO
> +static XSM_INLINE int xsm_argo_register_single_source(struct domain *d,
> +                                                      struct domain *t)
> +{
> +    return 0;
> +}
> +
> +static XSM_INLINE int xsm_argo_register_any_source(struct domain *d,
> +                                                   bool strict)
> +{
> +    return strict ? -EPERM : 0;
> +}
> +
> +#endif /* CONFIG_ARGO */
> +
>  #include <public/version.h>
>  static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
>  {
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 3b192b5..65577fd 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -181,6 +181,10 @@ struct xsm_operations {
>  #endif
>      int (*xen_version) (uint32_t cmd);
>      int (*domain_resource_map) (struct domain *d);
> +#ifdef CONFIG_ARGO
> +    int (*argo_register_single_source) (struct domain *d, struct domain
> *t);
> +    int (*argo_register_any_source) (struct domain *d);
> +#endif
>  };
> 
>  #ifdef CONFIG_XSM
> @@ -698,6 +702,19 @@ static inline int
> xsm_domain_resource_map(xsm_default_t def, struct domain *d)
>      return xsm_ops->domain_resource_map(d);
>  }
> 
> +#ifdef CONFIG_ARGO
> +static inline xsm_argo_register_single_source(struct domain *d, struct
> domain *t)
> +{
> +    return xsm_ops->argo_register_single_source(d, t);
> +}
> +
> +static inline xsm_argo_register_any_source(struct domain *d, bool strict)
> +{
> +    return xsm_ops->argo_register_any_source(d);
> +}
> +
> +#endif /* CONFIG_ARGO */
> +
>  #endif /* XSM_NO_WRAPPERS */
> 
>  #ifdef CONFIG_MULTIBOOT
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index 5701047..ed236b0 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -152,4 +152,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
>  #endif
>      set_to_dummy_if_null(ops, xen_version);
>      set_to_dummy_if_null(ops, domain_resource_map);
> +#ifdef CONFIG_ARGO
> +    set_to_dummy_if_null(ops, argo_register_single_source);
> +    set_to_dummy_if_null(ops, argo_register_any_source);
> +#endif
>  }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 96d31aa..3166561 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1717,6 +1717,21 @@ static int flask_domain_resource_map(struct domain
> *d)
>      return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__RESOURCE_MAP);
>  }
> 
> +#ifdef CONFIG_ARGO
> +static int flask_argo_register_single_source(struct domain *d,
> +                                             struct domain *t)
> +{
> +    return domain_has_perm(d, t, SECCLASS_ARGO,
> +                           ARGO__REGISTER_SINGLE_SOURCE);
> +}
> +
> +static int flask_argo_register_any_source(struct domain *d)
> +{
> +    return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
> +                        ARGO__REGISTER_ANY_SOURCE, NULL);
> +}
> +#endif
> +
>  long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>  int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
> 
> @@ -1851,6 +1866,10 @@ static struct xsm_operations flask_ops = {
>  #endif
>      .xen_version = flask_xen_version,
>      .domain_resource_map = flask_domain_resource_map,
> +#ifdef CONFIG_ARGO
> +    .argo_register_single_source = flask_argo_register_single_source,
> +    .argo_register_any_source = flask_argo_register_any_source,
> +#endif
>  };
> 
>  void __init flask_init(const void *policy_buffer, size_t policy_size)
> diff --git a/xen/xsm/flask/policy/access_vectors
> b/xen/xsm/flask/policy/access_vectors
> index 6fecfda..fb95c97 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -531,3 +531,14 @@ class version
>  # Xen build id
>      xen_build_id
>  }
> +
> +# Class argo is used to describe the Argo interdomain communication
> system.
> +class argo
> +{
> +    # Domain requesting registration of a communication ring
> +    # to receive messages from a specific other domain.
> +    register_single_source
> +    # Domain requesting registration of a communication ring
> +    # to receive messages from any other domain.
> +    register_any_source
> +}
> diff --git a/xen/xsm/flask/policy/security_classes
> b/xen/xsm/flask/policy/security_classes
> index cde4e1a..50ecbab 100644
> --- a/xen/xsm/flask/policy/security_classes
> +++ b/xen/xsm/flask/policy/security_classes
> @@ -19,5 +19,6 @@ class event
>  class grant
>  class security
>  class version
> +class argo
> 
>  # FLASK
> --
> 2.1.4
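
On the parsing question above: whether a trailing suffix is accepted depends on the length passed to strncmp(). The following standalone sketch uses hypothetical helper names and is not part of the series. When the length includes the terminating NUL (10 for "enforcing"), a longer string does not compare equal; with a length of 9 it would. strcmp() gives an exact match without having to count characters.

```c
#include <string.h>

/* Hypothetical helper: the comparison style used in the quoted patch. */
int demo_match_n(const char *s, const char *word, size_t n)
{
    return strncmp(s, word, n) == 0;
}

/* Exact-match alternative: strcmp() always compares through the NUL. */
int demo_match_exact(const char *s, const char *word)
{
    return strcmp(s, word) == 0;
}
```

For example, demo_match_n("enforcingfoobar", "enforcing", 10) is 0, while the same call with a length of 9 is 1.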



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 12/25] xsm, argo: XSM control for argo message send operation
  2018-12-01  1:32 ` [PATCH 12/25] xsm, argo: XSM control for argo message send operation Christopher Clark
@ 2018-12-04  9:53   ` Paul Durrant
  0 siblings, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04  9:53 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	James McKenzie, Rich Persaud, Daniel De Graaf, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>; Paul Durrant
> <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Rich Persaud <persaur@gmail.com>; Ross Philipson
> <ross.philipson@gmail.com>; Eric Chanudet <eric.chanudet@gmail.com>; James
> McKenzie <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>;
> Daniel Smith <dpsmith@apertussolutions.com>
> Subject: [PATCH 12/25] xsm, argo: XSM control for argo message send
> operation
> 
> Default policy: allow.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  xen/include/xsm/dummy.h             | 5 +++++
>  xen/include/xsm/xsm.h               | 6 ++++++
>  xen/xsm/dummy.c                     | 1 +
>  xen/xsm/flask/hooks.c               | 7 +++++++
>  xen/xsm/flask/policy/access_vectors | 2 ++
>  5 files changed, 21 insertions(+)
> 
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 55113c3..85965fc 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -733,6 +733,11 @@ static XSM_INLINE int
> xsm_argo_register_any_source(struct domain *d,
>      return strict ? -EPERM : 0;
>  }
> 
> +static XSM_INLINE int xsm_argo_send(struct domain *d, struct domain *t)
> +{
> +    return 0;
> +}
> +
>  #endif /* CONFIG_ARGO */
> 
>  #include <public/version.h>
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 65577fd..470e7c3 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -184,6 +184,7 @@ struct xsm_operations {
>  #ifdef CONFIG_ARGO
>      int (*argo_register_single_source) (struct domain *d, struct domain
> *t);
>      int (*argo_register_any_source) (struct domain *d);
> +    int (*argo_send) (struct domain *d, struct domain *t);
>  #endif
>  };
> 
> @@ -713,6 +714,11 @@ static inline xsm_argo_register_any_source(struct
> domain *d, bool strict)
>      return xsm_ops->argo_register_any_source(d);
>  }
> 
> +static inline int xsm_argo_send(struct domain *d, struct domain *t)
> +{
> +    return xsm_ops->argo_send(d, t);
> +}
> +
>  #endif /* CONFIG_ARGO */
> 
>  #endif /* XSM_NO_WRAPPERS */
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index ed236b0..ffac774 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -155,5 +155,6 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
>  #ifdef CONFIG_ARGO
>      set_to_dummy_if_null(ops, argo_register_single_source);
>      set_to_dummy_if_null(ops, argo_register_any_source);
> +    set_to_dummy_if_null(ops, argo_send);
>  #endif
>  }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 3166561..7b4e5ff 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1730,6 +1730,12 @@ static int flask_argo_register_any_source(struct
> domain *d)
>      return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
>                          ARGO__REGISTER_ANY_SOURCE, NULL);
>  }
> +
> +static int flask_argo_send(struct domain *d, struct domain *t)
> +{
> +    return domain_has_perm(d, t, SECCLASS_ARGO, ARGO__SEND);
> +}
> +
>  #endif
> 
>  long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
> @@ -1869,6 +1875,7 @@ static struct xsm_operations flask_ops = {
>  #ifdef CONFIG_ARGO
>      .argo_register_single_source = flask_argo_register_single_source,
>      .argo_register_any_source = flask_argo_register_any_source,
> +    .argo_send = flask_argo_send,
>  #endif
>  };
> 
> diff --git a/xen/xsm/flask/policy/access_vectors
> b/xen/xsm/flask/policy/access_vectors
> index fb95c97..f6c5377 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -541,4 +541,6 @@ class argo
>      # Domain requesting registration of a communication ring
>      # to receive messages from any other domain.
>      register_any_source
> +    # Domain sending a message to another domain.
> +    send
>  }
> --
> 2.1.4



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  2018-12-04  9:10     ` Christopher Clark
@ 2018-12-04 10:04       ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-04 10:04 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 04.12.18 at 10:10, <christopher.w.clark@gmail.com> wrote:
> On Mon, Dec 3, 2018 at 7:42 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
>> > http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html 
>> > describes these codes thus:
>> >     EMSGSIZE     : "Message too large"
>> >     ECONNREFUSED : "Connection refused".
>>
>> If you were to go solely by what POSIX mandates to have, more
>> additions would be necessary afaict. We had limited ourselves to
>> some basic set, so selective additions need further rationale put
>> here. The more that for both added error codes the use case in
>> the hypervisor isn't immediately obvious.
> 
> Thanks for reviewing the series and the previous iterations of this work.
> 
> I note your other message indicating a preference for including these
> changes at point of first use and I will do so in the next revision.

Actually, for the error code additions here I wouldn't insist on
them getting folded into patches using them, as long as it is
explained well here why the additions are desired.

> An aside before the rationales below: part of the motivation for
> selection of these error codes is to continue alignment with the
> modern v4v implementation in uxen where possible.
> 
> EMSGSIZE:
> 
> This series proposes to return EMSGSIZE for a sendv operation (patch
> #15) where an excess amount of data, across all iovs, has been
> supplied, exceeding either the statically configured maximum size of a
> transmittable message, or the (variable) size of the ring registered
> by another domain.

Ah yes, for a send-like operation I can see its use.

> ECONNREFUSED:
> 
> This series proposes to return ECONNREFUSED whenever a remote domain
> is specified that either does not exist or is not argo-enabled.
> This affects both the ring registration and sending data operations.
> (register op, patch #13; sendv op, patch #15)
> 
> ECONNREFUSED seems plausible for this use as it is determined by the
> remote domain state within the hypervisor.

Makes sense for the not argo-enabled case. The domain not
existing case, however, is nothing argo-specific, and we use a
pretty consistent -ESRCH in such cases, I think.

Jan
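
The error code split discussed above could be sketched as follows. This is a standalone illustration under assumptions: demo_send_errno() and its parameters are hypothetical and are not code from the series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: picks the error for a send attempt.
 * Returns 0 when the send may proceed.
 */
int demo_send_errno(bool dst_exists, bool dst_argo_enabled,
                    size_t msg_len, size_t max_msg_len, size_t ring_len)
{
    if ( !dst_exists )
        return -ESRCH;          /* no such domain: not Argo-specific */
    if ( !dst_argo_enabled )
        return -ECONNREFUSED;   /* domain exists but is not argo-enabled */
    if ( msg_len > max_msg_len || msg_len > ring_len )
        return -EMSGSIZE;       /* too large for the limit or the ring */
    return 0;
}
```

The ordering of the checks here is an assumption; the point is only that each condition maps to a distinct code.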




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
  2018-12-02 20:10   ` Julien Grall
@ 2018-12-04 10:57   ` Paul Durrant
  2018-12-12  9:48   ` Jan Beulich
  2018-12-12 16:47   ` Roger Pau Monné
  3 siblings, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04 10:57 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet, Roger Pau Monne

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; Wei Liu
> <wei.liu2@citrix.com>; Roger Pau Monne <roger.pau@citrix.com>; Paul
> Durrant <Paul.Durrant@citrix.com>; Rich Persaud <persaur@gmail.com>; Ross
> Philipson <ross.philipson@gmail.com>; Eric Chanudet
> <eric.chanudet@gmail.com>; James McKenzie <voreekf@madingley.org>; Jason
> Andryuk <jandryuk@gmail.com>; Daniel Smith <dpsmith@apertussolutions.com>
> Subject: [PATCH 13/25] argo: implement the register op
> 
> Used by a domain to register a region of memory for receiving messages
> from
> either a specified other domain, or, if specifying a wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits,
> the
> hypervisor will use this mapping to copy data from a sending domain into
> this
> registered ring, making it accessible to the domain that registered the
> ring to
> receive data.
> 
> In this code, the p2m type of the memory supplied by the guest for the
> ring
> must be p2m_ram_rw, which is a conservative choice made to defer the need
> to
> reason about the other p2m types with this commit.
> 
> argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit
> on
> all architectures, to assist with avoiding the need to add a compat ABI.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c                  | 498
> +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h          |  64 +++++
>  4 files changed, 566 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 2a95e09..f4e82cf 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -25,6 +25,7 @@
>  #include <xen/guest_access.h>
>  #include <xen/time.h>
> 
> +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> 
> @@ -98,6 +99,25 @@ struct argo_domain
>  };
> 
>  /*
> + * Helper functions
> + */
> +
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)
> +{
> +    uint16_t ret;
> +
> +    ret = (uint16_t)(id->addr.port >> 16);
> +    ret ^= (uint16_t)id->addr.port;
> +    ret ^= id->addr.domain_id;
> +    ret ^= id->partner;
> +
> +    ret &= (ARGO_HTABLE_SIZE - 1);
> +
> +    return ret;
> +}
> +
> +/*
>   * locks
>   */
> 
> @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
> 
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +                   uint8_t **page)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map
> page"
> +               " %u of %u\n", ring_info->id.addr.domain_id,
> +               ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +               i, ring_info->nmfns);
> +        return -EFAULT;

-ENOMEM? That seems to be the conventional errno to use when a global mapping fails.

> +    }
> +    ASSERT(ring_info->mfns);
> +    ASSERT(ring_info->mfn_mapping);
> +
> +    if ( !ring_info->mfn_mapping[i] )
> +    {
> +        /*
> +         * TODO:
> +         * The first page of the ring contains the ring indices, so both
> read and
> +         * write access to the page is required by the hypervisor, but
> read-access
> +         * is not needed for this mapping for the remainder of the ring.
> +         * Since this mapping will remain resident in Xen's address space
> for
> +         * the lifetime of the ring, and following the principle of least
> privilege,
> +         * it could be preferable to:
> +         *  # add a XSM check to determine what policy is wanted here
> +         *  # depending on the XSM query, optionally create this mapping
> as
> +         *    _write-only_ on platforms that can support it.
> +         *    (eg. Intel EPT/AMD NPT).
> +         */
> +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info-
> >mfns[i]);
> +
> +        if ( !ring_info->mfn_mapping[i] )
> +        {
> +            printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to
> map page"
> +                   " %u of %u\n", ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner,
> ring_info,
> +                   i, ring_info->nmfns);
> +            return -EFAULT;

Same here.
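
A minimal standalone sketch of the suggested convention for the mapping-failure case, assuming a mapper that returns NULL on failure. demo_map_slot() and the two stub mappers are hypothetical and only stand in for the lazy map_domain_page_global() call quoted above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mapper type: returns NULL when no mapping can be made. */
typedef void *(*demo_map_fn)(unsigned long mfn);

static char demo_backing[16];

/* Stub mapper that always succeeds. */
void *demo_map_ok(unsigned long mfn)
{
    (void)mfn;
    return demo_backing;
}

/* Stub mapper that always fails. */
void *demo_map_fail(unsigned long mfn)
{
    (void)mfn;
    return NULL;
}

/* Map on first use; report a failed mapping as -ENOMEM. */
int demo_map_slot(void **slot, unsigned long mfn, demo_map_fn map)
{
    if ( *slot == NULL )
    {
        *slot = map(mfn);
        if ( *slot == NULL )
            return -ENOMEM;
    }

    return 0;
}
```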

> +        }
> +        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +               mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +    }
> +
> +    if ( page )
> +        *page = ring_info->mfn_mapping[i];

Blank line here.

> +    return 0;
> +}
> +
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    uint8_t *dst;
> +    uint32_t *p;
> +    int ret;
> +
> +    ret = argo_ring_map_page(ring_info, 0, &dst);
> +    if ( ret )
> +        return ret;
> +
> +    p = (uint32_t *)(dst + offsetof(argo_ring_t, tx_ptr));
> +    write_atomic(p, tx_ptr);
> +    mb();

Blank line here.

> +    return 0;
> +}
> +
>  /*
>   * pending
>   */
> @@ -231,6 +319,388 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>      xfree(ring_info);
>  }
> 
> +/*
> + * ring
> + */
> +
> +static int
> +argo_find_ring_mfn(struct domain *d, argo_pfn_t pfn, mfn_t *mfn)
> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, pfn, &p2mt);
> +#else
> +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, pfn);
> +#endif
> +
> +    return ret;
> +}
> +
> +static int
> +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> +                    uint32_t len)
> +{
> +    int i;

unsigned?

> +    int ret = 0;
> +
> +    if ( (npage << PAGE_SHIFT) < len )

Overflow check?
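
A minimal sketch of the sort of check I have in mind, assuming 4KiB pages. The helper name ring_pages_cover_len is made up for illustration and is not in the patch. Widening before the shift keeps the byte count exact for any 32-bit npage. The hypercall entry point already bounds npage by ARGO_MAX_RING_SIZE >> PAGE_SHIFT, so this may only be belt and braces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KiB pages, for illustration only */

/*
 * Hypothetical helper: true if npage pages are enough to hold len bytes.
 * The shift is done in 64 bits, so the byte count cannot wrap for any
 * 32-bit npage and the comparison is against the true size.
 */
static bool ring_pages_cover_len(uint32_t npage, uint32_t len)
{
    return ((uint64_t)npage << PAGE_SHIFT) >= len;
}
```

The existing test would then read "if ( !ring_pages_cover_len(npage, len) ) return -EINVAL;".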

> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /*
> +         * Ring already existed. Check if it's the same ring,
> +         * i.e. same number of pages and all translated gpfns still
> +         * translating to the same mfns
> +         */
> +        if ( ring_info->npage != npage )
> +            i = ring_info->nmfns + 1; /* forces re-register below */
> +        else
> +        {
> +            for ( i = 0; i < ring_info->nmfns; i++ )
> +            {
> +                argo_pfn_t pfn;
> +                mfn_t mfn;
> +
> +                ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +                if ( ret )
> +                    break;
> +
> +                ret = argo_find_ring_mfn(d, pfn, &mfn);
> +                if ( ret )
> +                    break;
> +
> +                if ( mfn_x(mfn) != mfn_x(ring_info->mfns[i]) )

Use mfn_eq()
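
For reference, a sketch of the comparison written with mfn_eq(). The mfn_t, mfn_x() and mfn_eq() definitions below are simplified stand-ins so that the fragment compiles on its own; the real ones come from Xen's headers. first_mfn_mismatch is a hypothetical helper, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for Xen's typesafe MFN wrappers (illustration only). */
typedef struct { unsigned long mfn; } mfn_t;

static inline unsigned long mfn_x(mfn_t m)
{
    return m.mfn;
}

static inline bool mfn_eq(mfn_t x, mfn_t y)
{
    return mfn_x(x) == mfn_x(y);
}

/*
 * Hypothetical helper: index of the first entry where the two lists
 * differ, or n if they match throughout.
 */
static unsigned int first_mfn_mismatch(const mfn_t *a, const mfn_t *b,
                                       unsigned int n)
{
    unsigned int i;

    for ( i = 0; i < n; i++ )
        if ( !mfn_eq(a[i], b[i]) )
            break;

    return i;
}
```

The point is only that the comparison stays inside the typesafe wrapper rather than unwrapping both sides with mfn_x().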

> +                    break;
> +            }
> +        }
> +        if ( i != ring_info->nmfns )
> +        {
> +            printk(XENLOG_INFO "argo: vm%u re-registering existing argo ring"

Would XENLOG_G_WARNING not be more appropriate (or using gprintk with XENLOG_WARNING)?

> +                   " (vm%u:%x vm%d), clearing MFN list\n",
> +                   current->domain->domain_id, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner);
> +
> +            argo_ring_remove_mfns(d, ring_info);
> +            ASSERT(!ring_info->mfns);
> +        }
> +    }
> +
> +    if ( !ring_info->mfns )
> +    {
> +        mfn_t *mfns;
> +        uint8_t **mfn_mapping;
> +
> +        mfns = xmalloc_array(mfn_t, npage);
> +        if ( !mfns )
> +            return -ENOMEM;
> +
> +        for ( i = 0; i < npage; i++ )
> +            mfns[i] = INVALID_MFN;
> +
> +        mfn_mapping = xmalloc_array(uint8_t *, npage);
> +        if ( !mfn_mapping )
> +        {
> +            xfree(mfns);
> +            return -ENOMEM;
> +        }
> +
> +        ring_info->npage = npage;
> +        ring_info->mfns = mfns;
> +        ring_info->mfn_mapping = mfn_mapping;
> +    }
> +    ASSERT(ring_info->npage == npage);
> +
> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;
> +
> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
> +    {
> +        argo_pfn_t pfn;
> +        mfn_t mfn;
> +
> +        ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +        if ( ret )
> +            break;
> +
> +        ret = argo_find_ring_mfn(d, pfn, &mfn);
> +        if ( ret )
> +        {
> +            printk(XENLOG_ERR "argo: vm%u passed invalid gpfn %"PRI_xen_pfn
> +                   " ring (vm%u:%x vm%d) %p seq %d of %d\n",
> +                   d->domain_id, pfn, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner,
> +                   ring_info, i, ring_info->npage);
> +            break;

gprintk()?

> +        }
> +
> +        ring_info->mfns[i] = mfn;
> +        ring_info->nmfns = i + 1;

Since you don't return from within this loop (only break out), can you not set 'nmfns' to 'i' after the loop terminates rather than (re)setting it on every iteration?
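
Roughly what I mean, as a self-contained sketch. struct ring_state, lookup_mfn and fill_mfns are invented stand-ins for the patch's ring_info, argo_find_ring_mfn and this loop; only the shape of the loop matters:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PAGES 8 /* arbitrary bound for the sketch */

struct ring_state {
    uint32_t npage;            /* pages the guest says the ring has */
    uint32_t nmfns;            /* entries of mfns[] that are valid */
    uint64_t mfns[MAX_PAGES];
};

/* Hypothetical stand-in for the per-page lookup: pfn 0 counts as invalid. */
static int lookup_mfn(uint64_t pfn, uint64_t *mfn)
{
    if ( pfn == 0 )
        return -EINVAL;

    *mfn = pfn + 0x1000;
    return 0;
}

static int fill_mfns(struct ring_state *r, const uint64_t *pfns)
{
    unsigned int i;
    int ret = 0;

    for ( i = r->nmfns; i < r->npage; i++ )
    {
        uint64_t mfn;

        ret = lookup_mfn(pfns[i], &mfn);
        if ( ret )
            break;

        r->mfns[i] = mfn;
    }

    /* i counts the valid entries whether we broke out early or finished. */
    r->nmfns = i;

    return ret;
}
```

On an early break, i is the index of the page that failed, which is the same value the per-iteration "i + 1" assignment would have left behind after the last success.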

> +
> +        argo_dprintk("%d: %"PRI_xen_pfn" -> %"PRI_mfn"\n",
> +               i, pfn, mfn_x(ring_info->mfns[i]));
> +
> +        ring_info->mfn_mapping[i] = NULL;
> +    }
> +
> +    if ( ret )
> +        argo_ring_remove_mfns(d, ring_info);
> +    else
> +    {
> +        ASSERT(ring_info->nmfns == ring_info->npage);
> +
> +        printk(XENLOG_ERR "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p"

gprintk()?

> +               " npage %d nmfns %d\n", current->domain->domain_id,
> +               ring_info->id.addr.domain_id, ring_info->id.addr.port,
> +               ring_info->id.partner, ring_info, ring_info->mfn_mapping,
> +               ring_info->npage, ring_info->nmfns);
> +    }

Blank line here.

> +    return ret;
> +}
> +
> +static struct argo_ring_info *
> +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    uint16_t hash;
> +    struct hlist_node *node;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    hash = argo_hash_fn(id);
> +
> +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%d]=%p id=%p\n",
> +                 d->argo, hash, d->argo->ring_hash[hash].first, id);
> +    argo_dprintk("id.addr.port=%d id.addr.domain=vm%u"
> +                 " id.addr.partner=vm%d\n",
> +                 id->addr.port, id->addr.domain_id, id->partner);
> +
> +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[hash], node)
> +    {
> +        argo_ring_id_t *cmpid = &ring_info->id;
> +
> +        if ( cmpid->addr.port == id->addr.port &&
> +             cmpid->addr.domain_id == id->addr.domain_id &&
> +             cmpid->partner == id->partner )
> +        {
> +            argo_dprintk("ring_info=%p\n", ring_info);
> +            return ring_info;
> +        }
> +    }
> +    argo_dprintk("no ring_info found\n");
> +
> +    return NULL;
> +}
> +
> +static long
> +argo_register_ring(struct domain *d,
> +                   XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
> +                   XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
> +                   bool fail_exist)
> +{
> +    struct argo_ring ring;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +    bool update_tx_ptr = 0;
> +    uint64_t dst_domain_cookie = 0;
> +
> +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> +        return -EINVAL;
> +
> +    read_lock (&argo_lock);

Stray space.

> +
> +    do {
> +        if ( !d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        if ( copy_from_guest(&ring, ring_hnd, 1) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        if ( ring.magic != ARGO_RING_MAGIC )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( (ring.len < (sizeof(struct argo_ring_message_header)
> +                          + ARGO_ROUNDUP(1) + ARGO_ROUNDUP(1)))   ||
> +             (ARGO_ROUNDUP(ring.len) != ring.len) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.len > ARGO_MAX_RING_SIZE )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.id.partner == ARGO_DOMID_ANY )
> +        {
> +            ret = xsm_argo_register_any_source(d, argo_mac_bootparam_enforcing);
> +            if ( ret )
> +                break;
> +        }
> +        else
> +        {
> +            struct domain *dst_d = get_domain_by_id(ring.id.partner);

Blank line here.

> +            if ( !dst_d )
> +            {
> +                argo_dprintk("!dst_d, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                break;
> +            }
> +
> +            ret = xsm_argo_register_single_source(d, dst_d);
> +            if ( ret )
> +            {
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            if ( !dst_d->argo )
> +            {
> +                argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            dst_domain_cookie = dst_d->argo->domain_cookie;
> +
> +            put_domain(dst_d);
> +        }
> +
> +        ring.id.addr.domain_id = d->domain_id;
> +        if ( copy_field_to_guest(ring_hnd, &ring, id) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        /*
> +         * no need for a lock yet, because only we know about this
> +         * set the tx pointer if it looks bogus (we don't reset it
> +         * because this might be a re-register after S4)
> +         */
> +
> +        if ( ring.tx_ptr >= ring.len ||
> +             ARGO_ROUNDUP(ring.tx_ptr) != ring.tx_ptr )
> +        {
> +            /*
> +             * Since the ring is a mess, attempt to flush the contents of it
> +             * here by setting the tx_ptr to the next aligned message slot past
> +             * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +             */
> +            ring.tx_ptr = ARGO_ROUNDUP(ring.rx_ptr);
> +
> +            if ( ring.tx_ptr >= ring.len )
> +                ring.tx_ptr = 0;
> +
> +            /* ring.tx_ptr will be written back to the guest ring below. */
> +            update_tx_ptr = 1;
> +        }
> +
> +        /* W(L2) protects all the elements of the domain's ring_info */
> +        write_lock(&d->argo->lock);
> +
> +        do {
> +            ring_info = argo_ring_find_info(d, &ring.id);
> +
> +            if ( !ring_info )
> +            {
> +                uint16_t hash;
> +
> +                ring_info = xmalloc(struct argo_ring_info);
> +                if ( !ring_info )
> +                {
> +                    ret = -ENOMEM;
> +                    break;
> +                }
> +
> +                spin_lock_init(&ring_info->lock);
> +
> +                ring_info->mfns = NULL;
> +                ring_info->npage = 0;
> +                ring_info->mfn_mapping = NULL;
> +                ring_info->len = 0;
> +                ring_info->nmfns = 0;
> +                ring_info->tx_ptr = 0;
> +                ring_info->partner_cookie = dst_domain_cookie;
> +
> +                ring_info->id = ring.id;
> +                INIT_HLIST_HEAD(&ring_info->pending);
> +
> +                hash = argo_hash_fn(&ring_info->id);
> +                hlist_add_head(&ring_info->node, &d->argo->ring_hash[hash]);
> +
> +                printk(XENLOG_INFO "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                       current->domain->domain_id, ring.id.addr.domain_id,
> +                       ring.id.addr.port, ring.id.partner);

Unqualified XENLOG_INFO printk()... do we really want this?

> +            }
> +            else
> +            {
> +                /*
> +                 * If the caller specified that the ring must not already exist,
> +                 * fail at attempt to add a completed ring which already exists.
> +                 */
> +                if ( fail_exist && ring_info->len )
> +                {
> +                    ret = -EEXIST;
> +                    break;
> +                }
> +
> +                printk(XENLOG_INFO
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                     current->domain->domain_id, ring.id.addr.domain_id,
> +                     ring.id.addr.port, ring.id.partner);

Same here.

> +            }
> +
> +            /* Since we hold W(L2), there is no need to take L3 here */
> +            ring_info->tx_ptr = ring.tx_ptr;
> +
> +            ret = argo_find_ring_mfns(d, ring_info, npage, pfn_hnd, ring.len);
> +            if ( !ret )
> +                ret = update_tx_ptr ? argo_update_tx_ptr(ring_info, ring.tx_ptr)
> +                                    : argo_ring_map_page(ring_info, 0, NULL);
> +            if ( !ret )
> +                ring_info->len = ring.len;
> +
> +        } while ( 0 );
> +
> +        write_unlock(&d->argo->lock);
> +
> +    } while ( 0 );
> +
> +    read_unlock(&argo_lock);

Why the use of do-while-zero constructs here? Use of a goto-error-label is far more conventional in the rest of Xen, and it would reduce indentation.
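
To illustrate the shape, here is a cut-down sketch. The lock()/unlock() counter, struct ring and register_ring_sketch are invented stand-ins so that it compiles on its own; the comments say which real lock each call is meant to represent:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_MAGIC 0xbd67e163e7777f2fULL

struct ring {
    uint64_t magic;
    uint32_t len;
};

/* Stand-in for the real locks: just tracks that acquire/release balance. */
static int lock_depth;
static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

static int register_ring_sketch(int argo_enabled, const struct ring *ring)
{
    int ret = 0;

    lock(); /* stands for read_lock(&argo_lock) */

    if ( !argo_enabled )
    {
        ret = -ENODEV;
        goto out;
    }

    if ( ring->magic != RING_MAGIC )
    {
        ret = -EINVAL;
        goto out;
    }

    lock(); /* stands for write_lock(&d->argo->lock) */

    if ( !ring->len )
    {
        ret = -EINVAL;
        goto out_unlock;
    }

    /* ... the rest of the registration work would go here ... */

 out_unlock:
    unlock(); /* stands for write_unlock(&d->argo->lock) */
 out:
    unlock(); /* stands for read_unlock(&argo_lock) */

    return ret;
}
```

Each failure path jumps forward to the label that releases exactly what has been taken so far, and the body loses two levels of indentation compared with the nested do-while form.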

> +
> +    return ret;
> +}
> +
>  long
>  do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>                     XEN_GUEST_HANDLE_PARAM(void) arg2,
> @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> 
>      switch (cmd)
>      {
> +    case ARGO_MESSAGE_OP_register_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> +            guest_handle_cast(arg1, argo_ring_t);
> +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> +            guest_handle_cast(arg2, argo_pfn_t);
> +        uint32_t npage = arg3;
> +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> +
> +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> +            break;
> +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> +            break;
> +        /* arg4: reserve currently-undefined bits, require zero.  */
> +        if ( unlikely(arg4 & ~ARGO_REGISTER_FLAG_MASK) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
> +        break;
> +    }
>      default:
>          rc = -ENOSYS;
>          break;
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 1137c54..98006f8 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -34,6 +34,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> 
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> index 9391cd3..e9d25d6 100644
> --- a/xen/include/asm-x86/guest_access.h
> +++ b/xen/include/asm-x86/guest_access.h
> @@ -50,6 +50,8 @@
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> 
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 20dabc0..5ad8e2b 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -21,6 +21,20 @@
> 
>  #include "xen.h"
> 
> +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
> +
> +#define ARGO_DOMID_ANY           DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16GB
> + *  -- which is 0x1000000 or 16777216 bytes.

Why not just define as a hex value?
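
Something like the sketch below, where the two static assertions are only there to show the values agree and are not proposed for the header. Note that 0x1000000 (16777216) bytes is 16MiB, so the "16GB" in the comment looks like a typo:

```c
#include <assert.h>

/* Hex form: one followed by six hex zeroes is 2^24 bytes, i.e. 16MiB. */
#define ARGO_MAX_RING_SIZE  (0x1000000ULL)

/* Illustration only: the hex value matches the decimal one in the patch. */
static_assert(ARGO_MAX_RING_SIZE == 16777216ULL, "same value as the decimal");
static_assert(ARGO_MAX_RING_SIZE == (1ULL << 24), "byte index fits in 24 bits");
```

Written this way, the "at most 24 bits" remark in the comment can be read straight off the constant.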

  Paul

> + * A byte index into the ring is at most 24 bits.
> + */
> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> +
> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> +typedef uint64_t argo_pfn_t;
> +
>  typedef struct argo_addr
>  {
>      uint32_t port;
> @@ -52,4 +66,54 @@ typedef struct argo_ring
>  #endif
>  } argo_ring_t;
> 
> +/*
> + * Messages on the ring are padded to 128 bits
> + * Len here refers to the exact length of the data not including the
> + * 128 bit header. The message uses
> + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> + * Using typeof(a) make clear that this does not truncate any high-order bits.
> + */
> +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
> +
> +struct argo_ring_message_header
> +{
> +    uint32_t len;
> +    argo_addr_t source;
> +    uint32_t message_type;
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t data[];
> +#elif defined(__GNUC__)
> +    uint8_t data[0];
> +#endif
> +};
> +
> +/*
> + * Hypercall operations
> + */
> +
> +/*
> + * ARGO_MESSAGE_OP_register_ring
> + *
> + * Register a ring using the indicated memory.
> + * Also used to reregister an existing ring (eg. after resume from sleep).
> + *
> + * arg1: XEN_GUEST_HANDLE(argo_ring_t)
> + * arg2: XEN_GUEST_HANDLE(argo_pfn_t)
> + * arg3: uint32_t npages
> + * arg4: uint32_t flags
> + */
> +#define ARGO_MESSAGE_OP_register_ring     1
> +
> +/* Register op flags */
> +/*
> + * Fail exist:
> + * If set, reject attempts to (re)register an existing established ring.
> + * If clear, reregistration occurs if the ring exists, with the new ring
> + * taking the place of the old, preserving tx_ptr if it remains valid.
> + */
> +#define ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
> +
> +/* Mask for all defined flags */
> +#define ARGO_REGISTER_FLAG_MASK ARGO_REGISTER_FLAG_FAIL_EXIST
> +
>  #endif
> --
> 2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 14/25] argo: implement the unregister op
  2018-12-01  1:32 ` [PATCH 14/25] argo: implement the unregister op Christopher Clark
@ 2018-12-04 11:10   ` Paul Durrant
  2018-12-12  9:51   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04 11:10 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; Wei Liu
> <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; Rich
> Persaud <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>;
> Eric Chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: [PATCH 14/25] argo: implement the unregister op
> 
> Takes a single argument: a handle to the registered ring.
> 
> The ring's entry is removed from the hashtable of registered rings;
> any entries for pending notifications are removed; and the ring is
> unmapped from Xen's address space.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c         | 62
> +++++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/argo.h |  9 +++++++
>  2 files changed, 71 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index f4e82cf..387e650 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -510,6 +510,59 @@ argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>  }
> 
>  static long
> +argo_unregister_ring(struct domain *d,
> +                     XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd)
> +{
> +    struct argo_ring ring;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +
> +    read_lock(&argo_lock);
> +
> +    do {
> +        if ( !d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        ret = copy_from_guest_errno(&ring, ring_hnd, 1);
> +        if ( ret )
> +            break;
> +
> +        if ( ring.magic != ARGO_RING_MAGIC )
> +        {
> +            argo_dprintk(
> +                "ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), EINVAL\n",
> +                ring.magic, ARGO_RING_MAGIC);
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        ring.id.addr.domain_id = d->domain_id;
> +
> +        write_lock(&d->argo->lock);
> +
> +        ring_info = argo_ring_find_info(d, &ring.id);
> +        if ( ring_info )
> +            argo_ring_remove_info(d, ring_info);
> +
> +        write_unlock(&d->argo->lock);
> +
> +        if ( !ring_info )
> +        {
> +            argo_dprintk("ENOENT\n");
> +            ret = -ENOENT;
> +            break;
> +        }
> +

Stray blank line?

> +    } while ( 0 );
> +

Again, forward goto style is more conventional.

> +    read_unlock(&argo_lock);
> +    return ret;
> +}
> +
> +static long
>  argo_register_ring(struct domain *d,
>                     XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
>                     XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
> @@ -751,6 +804,15 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>          rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
>          break;
>      }
> +    case ARGO_MESSAGE_OP_unregister_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> +            guest_handle_cast(arg1, argo_ring_t);
> +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> +            break;
> +        rc = argo_unregister_ring(d, ring_hnd);
> +        break;
> +    }
>      default:
>          rc = -ENOSYS;
>          break;
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 5ad8e2b..6cf10a8 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -116,4 +116,13 @@ struct argo_ring_message_header
>  /* Mask for all defined flags */
>  #define ARGO_REGISTER_FLAG_MASK ARGO_REGISTER_FLAG_FAIL_EXIST
> 
> +/*
> + * ARGO_MESSAGE_OP_unregister_ring
> + *
> + * Unregister a previously-registered ring, ending communication.
> + *
> + * arg1: XEN_GUEST_HANDLE(argo_ring_t)
> + */
> +#define ARGO_MESSAGE_OP_unregister_ring     2
> +
>  #endif
> --
> 2.1.4



* Re: [PATCH 15/25] argo: implement the sendv op
  2018-12-01  1:32 ` [PATCH 15/25] argo: implement the sendv op Christopher Clark
@ 2018-12-04 11:22   ` Paul Durrant
  2018-12-12 11:52   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2018-12-04 11:22 UTC (permalink / raw)
  To: 'Christopher Clark', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, Eric Chanudet

> -----Original Message-----
> From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; Wei Liu
> <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; Rich
> Persaud <persaur@gmail.com>; Ross Philipson <ross.philipson@gmail.com>;
> Eric Chanudet <eric.chanudet@gmail.com>; James McKenzie
> <voreekf@madingley.org>; Jason Andryuk <jandryuk@gmail.com>; Daniel Smith
> <dpsmith@apertussolutions.com>
> Subject: [PATCH 15/25] argo: implement the sendv op
> 
> sendv operation is invoked to perform a synchronous send of buffers
> contained in iovs to a remote domain's registered ring.
> 
> It takes:
>  * A destination address (domid, port) for the ring to send to.
>    It performs a most-specific match lookup, to allow for wildcard.
>  * A source address, used to inform the destination of where to reply.
>  * The address of an array of iovs containing the data to send
>  * .. and the length of that array of iovs
>  * and a 32-bit message type, available to communicate message context
>    data (eg. kernel-to-kernel, separate from the application data).
> 
> If insufficient space exists in the destination ring, it will return -EAGAIN
> and Xen will notify the caller when sufficient space becomes available.
> 
> Accesses to the ring indices are appropriately atomic. The rings are
> mapped into Xen's private address space to write as needed and the
> mappings are retained for later use.
> 
> When locating the destination ring, a check is performed via a cookie
> installed at ring registration time, to ensure that the source domain
> is the same as it was when the ring was registered.
> 
> Fixed-size types are used in some areas within this code where caution
> around avoiding integer overflow is important.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c         | 528
> ++++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/argo.h |  59 ++++++
>  2 files changed, 587 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 387e650..0c3972c 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -24,10 +24,13 @@
>  #include <xen/domain_page.h>
>  #include <xen/guest_access.h>
>  #include <xen/time.h>
> +#include <xsm/xsm.h>
> 
>  DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(argo_send_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> +DEFINE_XEN_GUEST_HANDLE(uint8_t);
> 
>  /* Xen command line option to enable argo */
>  static bool __read_mostly opt_argo_enabled = 0;
> @@ -166,6 +169,21 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #endif
> 
>  /*
> + * Event channel
> + */
> +
> +static void
> +argo_signal_domain(struct domain *d)
> +{
> +    argo_dprintk("signalling domid:%d\n", d->domain_id);
> +
> +    if ( !d->argo ) /* This can happen if the domain is being destroyed */
> +        return;
> +
> +    evtchn_send(d, d->argo->evtchn_port);
> +}
> +
> +/*
>   * ring buffer
>   */
> 
> @@ -259,6 +277,333 @@ argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
>      return 0;
>  }
> 
> +static int
> +argo_memcpy_to_guest_ring(struct argo_ring_info *ring_info,
> +                          uint32_t offset,
> +                          const void *src,
> +                          XEN_GUEST_HANDLE(uint8_t) src_hnd,
> +                          uint32_t len)
> +{
> +    int page = offset >> PAGE_SHIFT;

unsigned?

> +    uint8_t *dst;
> +    int ret;
> +    unsigned int src_offset = 0;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    offset &= ~PAGE_MASK;
> +
> +    if ( (len > ARGO_MAX_RING_SIZE) || (offset > ARGO_MAX_RING_SIZE) )
> +        return -EFAULT;
> +
> +    while ( (offset + len) > PAGE_SIZE )
> +    {
> +        ret = argo_ring_map_page(ring_info, page, &dst);
> +        if ( ret )
> +            return ret;
> +
> +        if ( src )
> +        {
> +            memcpy(dst + offset, src + src_offset, PAGE_SIZE - offset);
> +            src_offset += (PAGE_SIZE - offset);
> +        }
> +        else
> +        {
> +            ret = copy_from_guest_errno(dst + offset, src_hnd,
> +                                        PAGE_SIZE - offset);
> +            if ( ret )
> +                return ret;
> +
> +            guest_handle_add_offset(src_hnd, PAGE_SIZE - offset);
> +        }
> +
> +        page++;
> +        len -= PAGE_SIZE - offset;

A lot of "PAGE_SIZE - offset" here... maybe worth a local variable?
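
As a sketch of what that could look like, using a plain array of page pointers in place of argo_ring_map_page() and memcpy in place of the guest copy. copy_to_pages and the 4KiB PAGE_SIZE here are stand-ins for illustration, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12                 /* assumption: 4KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/*
 * Hypothetical helper: copy len bytes from src into a ring made of
 * separately-mapped pages, starting at byte offset 'offset' in the ring.
 */
static void copy_to_pages(uint8_t **pages, uint32_t offset,
                          const uint8_t *src, uint32_t len)
{
    unsigned int page = offset >> PAGE_SHIFT;

    offset &= PAGE_SIZE - 1;

    while ( (offset + len) > PAGE_SIZE )
    {
        /* Bytes left in the current page: computed once, used throughout. */
        unsigned int chunk = PAGE_SIZE - offset;

        memcpy(pages[page] + offset, src, chunk);

        src += chunk;
        len -= chunk;
        page++;
        offset = 0;
    }

    /* Whatever remains fits inside the current page. */
    memcpy(pages[page] + offset, src, len);
}
```

With the local variable, the source advance, the length decrement and the copy size all visibly use the same quantity.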

> +        offset = 0;
> +    }
> +
> +    ret = argo_ring_map_page(ring_info, page, &dst);
> +    if ( ret )
> +    {
> +        argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +               " %d of %d\n", ring_info->id.addr.domain_id,
> +               ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +               page, ring_info->nmfns);
> +        return ret;
> +    }
> +
> +    if ( src )
> +        memcpy(dst + offset, src + src_offset, len);
> +    else
> +        ret = copy_from_guest_errno(dst + offset, src_hnd, len);
> +
> +    return ret;
> +}
> +
> +static int
> +argo_ringbuf_get_rx_ptr(struct argo_ring_info *ring_info, uint32_t *rx_ptr)
> +{
> +    uint8_t *src;
> +    argo_ring_t *ringp;
> +    int ret;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    if ( !ring_info->nmfns || ring_info->nmfns < ring_info->npage )
> +        return -EINVAL;
> +
> +    ret = argo_ring_map_page(ring_info, 0, &src);
> +    if ( ret )
> +        return ret;
> +
> +    ringp = (argo_ring_t *)src;
> +
> +    *rx_ptr = read_atomic(&ringp->rx_ptr);
> +
> +    return 0;
> +}
> +
> +/*
> + * argo_sanitize_ring creates a modified copy of the ring pointers
> + * where the rx_ptr is rounded up to ensure it is aligned, and then
> + * ring wrap is handled. Simplifies safe use of the rx_ptr for
> + * available space calculation.
> + */
> +static void
> +argo_sanitize_ring(argo_ring_t *ring, const struct argo_ring_info *ring_info)
> +{
> +    uint32_t rx_ptr = ring->rx_ptr;
> +
> +    ring->tx_ptr = ring_info->tx_ptr;
> +    ring->len = ring_info->len;
> +
> +    rx_ptr = ARGO_ROUNDUP(rx_ptr);
> +    if ( rx_ptr >= ring_info->len )
> +        rx_ptr = 0;
> +
> +    ring->rx_ptr = rx_ptr;
> +}
> +
> +/*
> + * argo_iov_count returns its count on success via an out variable
> + * to avoid potential for a negative return value to be used incorrectly
> + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> + */
> +static int
> +argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +               uint32_t *count)
> +{
> +    argo_iov_t iov;
> +    uint32_t sum_iov_lens = 0;
> +    int ret;
> +
> +    if ( niov > ARGO_MAXIOV )
> +        return -EINVAL;
> +
> +    while ( niov-- )
> +    {
> +        ret = copy_from_guest_errno(&iov, iovs, 1);
> +        if ( ret )
> +            return ret;
> +
> +        /* check each to protect sum against integer overflow */
> +        if ( iov.iov_len > ARGO_MAX_RING_SIZE )
> +            return -EINVAL;
> +
> +        sum_iov_lens += iov.iov_len;
> +
> +        /*
> +         * Again protect sum from integer overflow
> +         * and ensure total msg size will be within bounds.
> +         */
> +        if ( sum_iov_lens > ARGO_MAX_MSG_SIZE )
> +            return -EINVAL;
> +
> +        guest_handle_add_offset(iovs, 1);
> +    }
> +
> +    *count = sum_iov_lens;
> +    return 0;
> +}
> +
> +static int
> +argo_ringbuf_insert(struct domain *d,
> +                    struct argo_ring_info *ring_info,
> +                    const struct argo_ring_id *src_id,
> +                    XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +                    uint32_t message_type, unsigned long *out_len)
> +{
> +    argo_ring_t ring;
> +    struct argo_ring_message_header mh = { 0 };
> +    int32_t sp;
> +    int32_t ret = 0;
> +    uint32_t len;
> +    uint32_t iov_len;
> +    uint32_t sum_iov_len = 0;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    if ( (ret = argo_iov_count(iovs, niov, &len)) )
> +        return ret;
> +
> +    if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) )
> >=
> +          ring_info->len)
> +         || (len > ARGO_MAX_MSG_SIZE) )
> +        return -EMSGSIZE;
> +
> +    do {
> +        ret =  argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr);
> +        if ( ret )
> +            break;
> +
> +        argo_sanitize_ring(&ring, ring_info);
> +
> +        argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d"
> +                     " ring_info->tx_ptr=%d\n",
> +                     ring.tx_ptr, ring.rx_ptr, ring.len, ring_info-
> >tx_ptr);
> +
> +        if ( ring.rx_ptr == ring.tx_ptr )
> +            sp = ring_info->len;
> +        else
> +        {
> +            sp = ring.rx_ptr - ring.tx_ptr;
> +            if ( sp < 0 )
> +                sp += ring.len;
> +        }
> +
> +        if ( (ARGO_ROUNDUP(len) + sizeof(struct
> argo_ring_message_header)) >= sp )
> +        {
> +            argo_dprintk("EAGAIN\n");
> +            ret = -EAGAIN;
> +            break;
> +        }
> +
> +        mh.len = len + sizeof(struct argo_ring_message_header);
> +        mh.source.port = src_id->addr.port;
> +        mh.source.domain_id = src_id->addr.domain_id;
> +        mh.message_type = message_type;
> +
> +        /*
> +         * For this copy to the guest ring, tx_ptr is always 16-byte
> aligned
> +         * and the message header is 16 bytes long.
> +         */
> +        BUILD_BUG_ON(sizeof(struct argo_ring_message_header) !=
> ARGO_ROUNDUP(1));
> +
> +        if ( (ret = argo_memcpy_to_guest_ring(ring_info,
> +                                              ring.tx_ptr +
> sizeof(argo_ring_t),
> +                                              &mh,
> +
> XEN_GUEST_HANDLE_NULL(uint8_t),
> +                                              sizeof(mh))) )
> +            break;
> +
> +        ring.tx_ptr += sizeof(mh);
> +        if ( ring.tx_ptr == ring_info->len )
> +            ring.tx_ptr = 0;
> +
> +        while ( niov-- )
> +        {
> +            XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
> +            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
> +            argo_iov_t iov;
> +
> +            ret = copy_from_guest_errno(&iov, iovs, 1);
> +            if ( ret )
> +                break;
> +
> +            bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base,
> uint8_t);
> +            buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t);
> +            iov_len = iov.iov_len;
> +
> +            if ( !iov_len )
> +            {
> +                printk(XENLOG_ERR "argo: iov.iov_len=0 iov.iov_base=%"
> +                       PRIx64" ring (vm%u:%x vm%d)\n",
> +                       iov.iov_base, ring_info->id.addr.domain_id,
> +                       ring_info->id.addr.port, ring_info->id.partner);
> +
> +                guest_handle_add_offset(iovs, 1);
> +                continue;
> +            }
> +
> +            if ( iov_len > ARGO_MAX_MSG_SIZE )
> +            {
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            sum_iov_len += iov_len;
> +            if ( sum_iov_len > len )
> +            {
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            sp = ring.len - ring.tx_ptr;
> +
> +            if ( iov_len > sp )
> +            {
> +                ret = argo_memcpy_to_guest_ring(ring_info,
> +                        ring.tx_ptr + sizeof(argo_ring_t),
> +                        NULL, buf_hnd, sp);
> +                if ( ret )
> +                    break;
> +
> +                ring.tx_ptr = 0;
> +                iov_len -= sp;
> +                guest_handle_add_offset(buf_hnd, sp);
> +            }
> +
> +            ret = argo_memcpy_to_guest_ring(ring_info,
> +                        ring.tx_ptr + sizeof(argo_ring_t),
> +                        NULL, buf_hnd, iov_len);
> +            if ( ret )
> +                break;
> +
> +            ring.tx_ptr += iov_len;
> +
> +            if ( ring.tx_ptr == ring_info->len )
> +                ring.tx_ptr = 0;
> +
> +            guest_handle_add_offset(iovs, 1);
> +        }
> +
> +        if ( ret )
> +            break;
> +
> +        ring.tx_ptr = ARGO_ROUNDUP(ring.tx_ptr);
> +
> +        if ( ring.tx_ptr >= ring_info->len )
> +            ring.tx_ptr -= ring_info->len;
> +
> +        mb();
> +        ring_info->tx_ptr = ring.tx_ptr;
> +        if ( (ret = argo_update_tx_ptr(ring_info, ring.tx_ptr)) )
> +            break;
> +
> +    } while ( 0 );
> +

Again. Odd do-while-zero style... I won't mention this again.
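
For comparison, a minimal sketch of the two styles side by side. The quoted function uses `break` inside `do { } while ( 0 )` as a single-exit error path; the alternative is a plain `goto out`. The helper names here (`step_a`, `step_b`, `insert_do_while_style`, `insert_goto_style`) are hypothetical stand-ins and do not appear in the patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the fallible steps in the quoted function. */
static int step_a(int fail) { return fail ? -EAGAIN : 0; }
static int step_b(int fail) { return fail ? -EFAULT : 0; }

/* Style used in the patch: break out of a do-while(0) on error. */
static int insert_do_while_style(int fail_a, int fail_b, int *done)
{
    int ret;

    do {
        ret = step_a(fail_a);
        if ( ret )
            break;

        ret = step_b(fail_b);
        if ( ret )
            break;
    } while ( 0 );

    if ( !ret )
        *done = 1;

    return ret;
}

/* Same control flow expressed with a single exit label. */
static int insert_goto_style(int fail_a, int fail_b, int *done)
{
    int ret;

    ret = step_a(fail_a);
    if ( ret )
        goto out;

    ret = step_b(fail_b);
    if ( ret )
        goto out;

    *done = 1;

 out:
    return ret;
}
```

Both variants return the first error encountered and only set `*done` on success; the difference is purely stylistic.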

  Paul

> +    /*
> +     * At this point it is possible to unmap the ring_info, ie:
> +     *   argo_ring_unmap(ring_info);
> +     * but performance should be improved by not doing so, and retaining
> +     * the mapping.
> +     * An XSM policy control over level of confidentiality required
> +     * versus performance cost could be added to decide that here.
> +     * See the similar comment in argo_ring_map_page re: write-only
> mappings.
> +     */
> +
> +    if ( !ret )
> +        *out_len = len;
> +
> +    return ret;
> +}
> +
>  /*
>   * pending
>   */
> @@ -282,6 +627,47 @@ argo_pending_remove_all(struct argo_ring_info
> *ring_info)
>      }
>  }
> 
> +static int
> +argo_pending_queue(struct argo_ring_info *ring_info, domid_t src_id, int
> len)
> +{
> +    struct argo_pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    ent = xmalloc(struct argo_pending_ent);
> +
> +    if ( !ent )
> +        return -ENOMEM;
> +
> +    ent->len = len;
> +    ent->id = src_id;
> +
> +    hlist_add_head(&ent->node, &ring_info->pending);
> +
> +    return 0;
> +}
> +
> +static int
> +argo_pending_requeue(struct argo_ring_info *ring_info, domid_t src_id,
> int len)
> +{
> +    struct hlist_node *node;
> +    struct argo_pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    hlist_for_each_entry(ent, node, &ring_info->pending, node)
> +    {
> +        if ( ent->id == src_id )
> +        {
> +            if ( ent->len < len )
> +                ent->len = len;
> +            return 0;
> +        }
> +    }
> +
> +    return argo_pending_queue(ring_info, src_id, len);
> +}
> +
>  static void argo_ring_remove_mfns(const struct domain *d,
>                                    struct argo_ring_info *ring_info)
>  {
> @@ -509,6 +895,28 @@ argo_ring_find_info(const struct domain *d, const
> struct argo_ring_id *id)
>      return NULL;
>  }
> 
> +static struct argo_ring_info *
> +argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
> +                             domid_t partner_id, uint64_t partner_cookie)
> +{
> +    argo_ring_id_t id;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    id.addr.port = port;
> +    id.addr.domain_id = d->domain_id;
> +    id.partner = partner_id;
> +
> +    ring_info = argo_ring_find_info(d, &id);
> +    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
> +        return ring_info;
> +
> +    id.partner = ARGO_DOMID_ANY;
> +
> +    return argo_ring_find_info(d, &id);
> +}
> +
>  static long
>  argo_unregister_ring(struct domain *d,
>                       XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd)
> @@ -754,6 +1162,103 @@ argo_register_ring(struct domain *d,
>      return ret;
>  }
> 
> +/*
> + * io
> + */
> +
> +static long
> +argo_sendv(struct domain *src_d, const argo_addr_t *src_addr,
> +           const argo_addr_t *dst_addr,
> +           XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint32_t niov,
> +           uint32_t message_type)
> +{
> +    struct domain *dst_d = NULL;
> +    struct argo_ring_id src_id;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +    unsigned long len = 0;
> +
> +    ASSERT(src_d->domain_id == src_addr->domain_id);
> +
> +    read_lock(&argo_lock);
> +
> +    do {
> +        if ( !src_d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        src_id.addr.pad = 0;
> +        src_id.addr.port = src_addr->port;
> +        src_id.addr.domain_id = src_d->domain_id;
> +        src_id.partner = dst_addr->domain_id;
> +
> +        dst_d = get_domain_by_id(dst_addr->domain_id);
> +        if ( !dst_d || !dst_d->argo )
> +        {
> +            argo_dprintk("!dst_d, ECONNREFUSED\n");
> +            ret = -ECONNREFUSED;
> +            break;
> +        }
> +
> +        ret = xsm_argo_send(src_d, dst_d);
> +        if ( ret )
> +        {
> +            printk(XENLOG_ERR "argo: XSM REJECTED %i -> %i\n",
> +                   src_addr->domain_id, dst_addr->domain_id);
> +            break;
> +        }
> +
> +        read_lock(&dst_d->argo->lock);
> +
> +        do {
> +            ring_info = argo_ring_find_info_by_match(dst_d, dst_addr-
> >port,
> +                                                 src_addr->domain_id,
> +                                                 src_d->argo-
> >domain_cookie);
> +            if ( !ring_info )
> +            {
> +                printk(XENLOG_ERR "argo: vm%u connection refused, "
> +                       "src (vm%u:%x) dst (vm%u:%x)\n",
> +                       current->domain->domain_id,
> +                       src_id.addr.domain_id, src_id.addr.port,
> +                       dst_addr->domain_id, dst_addr->port);
> +
> +                ret = -ECONNREFUSED;
> +                break;
> +            }
> +
> +            spin_lock(&ring_info->lock);
> +
> +            ret = argo_ringbuf_insert(dst_d, ring_info, &src_id,
> +                                      iovs, niov, message_type, &len);
> +            if ( ret == -EAGAIN )
> +            {
> +                argo_dprintk("argo_ringbuf_sendv failed, EAGAIN\n");
> +                /* requeue to issue a notification when space is there */
> +                if ( argo_pending_requeue(ring_info, src_addr->domain_id,
> len) )
> +                     ret = -ENOMEM;
> +            }
> +
> +            spin_unlock(&ring_info->lock);
> +
> +            if ( ret >= 0 )
> +                argo_signal_domain(dst_d);
> +
> +        } while ( 0 );
> +
> +        read_unlock(&dst_d->argo->lock);
> +
> +    } while ( 0 );
> +
> +    if ( dst_d )
> +        put_domain(dst_d);
> +
> +    read_unlock(&argo_lock);
> +
> +    return ( ret < 0 ) ? ret : len;
> +}
> +
>  long
>  do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>                     XEN_GUEST_HANDLE_PARAM(void) arg2,
> @@ -813,6 +1318,29 @@ do_argo_message_op(int cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg1,
>          rc = argo_unregister_ring(d, ring_hnd);
>          break;
>      }
> +    case ARGO_MESSAGE_OP_sendv:
> +    {
> +        argo_send_addr_t send_addr;
> +        uint32_t niov = arg3;
> +        uint32_t message_type = arg4;
> +
> +        XEN_GUEST_HANDLE_PARAM(argo_send_addr_t) send_addr_hnd =
> +            guest_handle_cast(arg1, argo_send_addr_t);
> +        XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs =
> +            guest_handle_cast(arg2, argo_iov_t);
> +
> +        if ( unlikely(!guest_handle_okay(send_addr_hnd, 1)) )
> +            break;
> +        rc = copy_from_guest_errno(&send_addr, send_addr_hnd, 1);
> +        if ( rc )
> +            break;
> +
> +        send_addr.src.domain_id = d->domain_id;
> +
> +        rc = argo_sendv(d, &send_addr.src, &send_addr.dst,
> +                        iovs, niov, message_type);
> +        break;
> +    }
>      default:
>          rc = -ENOSYS;
>          break;
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 6cf10a8..123efc5 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -32,6 +32,28 @@
>   */
>  #define ARGO_MAX_RING_SIZE  (16777216ULL)
> 
> +/*
> + * ARGO_MAXIOV : maximum number of iovs accepted in a single sendv.
> + * Rationale for the value:
> + * The Linux argo driver never passes more than two iovs.
> + * Linux defines UIO_MAXIOV as 1024.
> + * POSIX mandates at least 16 -- not that this is a POSIX API of course.
> + *
> + * Limit the total amount of data posted in a single argo operation to
> + * no more than 2^31 bytes to reduce risk of integer overflow defects.
> + * Each argo iov can hold ~ 2^24 bytes, so set ARGO_MAXIOV to 2^(31-24),
> + * minus one to enable simple efficient bounds checking via masking: 127.
> +*/
> +#define ARGO_MAXIOV          127U
> +
> +typedef struct argo_iov
> +{
> +    uint64_t iov_base;
> +    uint32_t iov_len;
> +    uint32_t pad;
> +} argo_iov_t;
> +DEFINE_XEN_GUEST_HANDLE(argo_iov_t);
> +
>  /* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
>  typedef uint64_t argo_pfn_t;
> 
> @@ -42,6 +64,12 @@ typedef struct argo_addr
>      uint16_t pad;
>  } argo_addr_t;
> 
> +typedef struct argo_send_addr
> +{
> +    argo_addr_t src;
> +    argo_addr_t dst;
> +} argo_send_addr_t;
> +
>  typedef struct argo_ring_id
>  {
>      struct argo_addr addr;
> @@ -125,4 +153,35 @@ struct argo_ring_message_header
>   */
>  #define ARGO_MESSAGE_OP_unregister_ring     2
> 
> +/*
> + * ARGO_MESSAGE_OP_sendv
> + *
> + * Send a list of buffers contained in iovs.
> + *
> + * The send address struct specifies the source and destination addresses
> + * for the message being sent, which are used to find the destination
> ring:
> + * Xen first looks for a most-specific match with a registered ring with
> + *  (id.addr == dst) and (id.partner == sending_domain) ;
> + * if that fails, it then looks for a wildcard match (aka multicast
> receiver)
> + * where (id.addr == dst) and (id.partner == DOMID_ANY).
> + *
> + * For each iov entry, send iov_len bytes from iov_base to the
> destination ring.
> + * If insufficient space exists in the destination ring, it will return -
> EAGAIN
> + * and Xen will notify the caller when sufficient space becomes
> available.
> + *
> + * The message type is a 32-bit data field available to communicate
> message
> + * context data (eg. kernel-to-kernel, rather than application layer).
> + *
> + * arg1: XEN_GUEST_HANDLE(argo_send_addr_t) source and dest addresses
> + * arg2: XEN_GUEST_HANDLE(argo_iov_t) iovs
> + * arg3: uint32_t niov
> + * arg4: uint32_t message type
> + */
> +#define ARGO_MESSAGE_OP_sendv               5
> +
> +/* The maximum size of a guest message that may be sent on an Argo ring.
> */
> +#define ARGO_MAX_MSG_SIZE ((ARGO_MAX_RING_SIZE) - \
> +        (sizeof(struct argo_ring_message_header)) - \
> +        ARGO_ROUNDUP(1))
> +
>  #endif
> --
> 2.1.4
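
The available-space calculation in the quoted argo_ringbuf_insert can be read in isolation as the sketch below. It assumes 16-byte message slots, which is what the BUILD_BUG_ON on the header size in the patch implies for ARGO_ROUNDUP; the function names `ring_space` and `msg_fits` are illustrative only and are not part of the patch.

```c
#include <stdint.h>

/*
 * Free space in a ring of ring_len bytes, following the quoted logic:
 * equal pointers mean the ring is empty, otherwise the space is the gap
 * from tx_ptr forward to rx_ptr, wrapping at ring_len.
 */
static int32_t ring_space(uint32_t rx_ptr, uint32_t tx_ptr, uint32_t ring_len)
{
    int32_t sp;

    if ( rx_ptr == tx_ptr )
        return (int32_t)ring_len;

    sp = (int32_t)rx_ptr - (int32_t)tx_ptr;
    if ( sp < 0 )
        sp += (int32_t)ring_len;

    return sp;
}

/*
 * A message fits only if the header plus the rounded-up payload is
 * strictly smaller than the free space (the patch returns -EAGAIN on >=).
 * Assumes payload_len has already been bounded by the caller.
 */
static int msg_fits(uint32_t payload_len, uint32_t hdr_len, int32_t space)
{
    uint32_t rounded = (payload_len + 15u) & ~15u;  /* 16-byte slots */

    return (int64_t)rounded + hdr_len < (int64_t)space;
}
```

The strict inequality keeps a full ring from ever reaching rx_ptr == tx_ptr, which the code treats as empty.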


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable
  2018-12-01  1:32 ` [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable Christopher Clark
  2018-12-04  9:18   ` Paul Durrant
@ 2018-12-04 11:35   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-04 11:35 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -28,6 +28,10 @@
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
>  
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled = 0;

The initializer is pointless here, and if there was one then it
should be true or false.
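
To illustrate the point: objects with static storage duration are zero-initialised in C, so the flag is already false without an initializer. A minimal sketch, with hypothetical variable names and without the Xen-specific __read_mostly annotation:

```c
#include <stdbool.h>

/* Static storage duration: implicitly initialised to false. */
static bool opt_enabled_implicit;

/* If an initializer is wanted for emphasis, use the bool constant. */
static bool opt_enabled_explicit = false;

/* Both flags start out disabled. */
static bool both_default_off(void)
{
    return !opt_enabled_implicit && !opt_enabled_explicit;
}
```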

> @@ -223,6 +227,13 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>      argo_dprintk("->do_argo_message_op(%d,%p,%p,%d,%d)\n", cmd,
>                   (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
>  
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -ENOSYS;
> +        argo_dprintk("<-do_argo_message_op()=%ld\n", rc);

While I can see debugging printk()s to be useful in certain places,
I question the utility of this and the other one. I also question the
use of -ENOSYS - I think you mean e.g. -EOPNOTSUPP.

Jan




* Re: [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE
  2018-12-01  1:32 ` [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE Christopher Clark
@ 2018-12-04 11:39   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-04 11:39 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

Here and elsewhere: Such additions, which we've found no need for
till now, should not be submitted without giving a reason for why
they become necessary or at least desirable.

> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -982,6 +982,8 @@ typedef struct {
>  #define XEN_GUEST_HANDLE_64(name) XEN_GUEST_HANDLE(name)
>  #endif
>  
> +#define XEN_GUEST_HANDLE_NULL(name) (XEN_GUEST_HANDLE(name)){(name *)0}

Public headers are intended to be usable in C89 mode. While this
won't cause compilation to fail when not used, it still is a violation
of this principle. Furthermore the construct is incompletely
parenthesized.
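
As a sketch of the parenthesization point only: the macro body can be wrapped in an outer pair of parentheses so that it always expands to a single parenthesized expression. The handle macros below (`EX_DEFINE_HANDLE`, `EX_HANDLE`, `EX_HANDLE_NULL`) are simplified stand-ins, not the real definitions from xen.h, and the result is still a C99 compound literal, so it does not address the C89 concern.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the guest handle machinery (assumption). */
#define EX_DEFINE_HANDLE(name) typedef struct { name *p; } ex_handle_##name
#define EX_HANDLE(name) ex_handle_##name

EX_DEFINE_HANDLE(uint8_t);

/* Fully parenthesized; still a C99 compound literal, so not C89-clean. */
#define EX_HANDLE_NULL(name) ((EX_HANDLE(name)){ (name *)0 })

/* Returns nonzero when the handle wraps a null pointer. */
static int handle_is_null(EX_HANDLE(uint8_t) h)
{
    return h.p == NULL;
}
```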

Jan




* Re: [PATCH 13/25] argo: implement the register op
  2018-12-04  9:08     ` Christopher Clark
@ 2018-12-05 17:20       ` Julien Grall
  2018-12-05 22:35         ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2018-12-05 17:20 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet, Roger Pau Monné

Hi Christopher,

On 04/12/2018 09:08, Christopher Clark wrote:
> On Sun, Dec 2, 2018 at 12:11 PM Julien Grall <Julien.Grall@arm.com> wrote:
>>
>>
>>
>> On 01/12/2018 01:32, Christopher Clark wrote:
>>> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
>>> index 20dabc0..5ad8e2b 100644
>>> --- a/xen/include/public/argo.h
>>> +++ b/xen/include/public/argo.h
>>> @@ -21,6 +21,20 @@
>>>
>>>    #include "xen.h"
>>>
>>> +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
>>> +
>>> +#define ARGO_DOMID_ANY           DOMID_INVALID
>>> +
>>> +/*
>>> + * The maximum size of an Argo ring is defined to be: 16MB
>>> + *  -- which is 0x1000000 or 16777216 bytes.
>>> + * A byte index into the ring is at most 24 bits.
>>> + */
>>> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
>>> +
>>> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
>>> +typedef uint64_t argo_pfn_t;
>>
>> As you always use 64-bit, can we just use an address? This would make
>> the ABI agnostic to the hypervisor page granularity.
> 
> Thanks for reviewing this series.
> 
> I'm not sure yet that switching to using addresses instead would be
> for the best, so have been working through some reasoning about your
> suggestion. This interface is for the guest to identify to the
> hypervisor the list of frames of memory to use as the ring, and the
> purpose of a frame number is to uniquely identify a frame. Frame
> numbers, as opposed to addresses, are going to remain the same across
> all processors, independent of the page tables that happen to
> currently be in use.

Sorry, I wasn't clear enough about the address. By address I meant guest physical
address (and not guest virtual address).

A guest virtual address would indeed be a pretty bad idea, as you can't promise the
address would stay mapped forever. As a matter of fact, we already see some
issues because of (K)PTI.

> 
> Where possible, translation should be performed by the guest rather
> than the hypervisor, minimizing the hypervisor logic (good for several
> reasons) - so it would be better to avoid adding the
> address-to-page-number walk and granularity handling in the hypervisor
> here. In this case, the guest has the incentive to do that work, given
> that it wants to register the ring.
> 
> (Slightly out of scope, but hopefully not for long: We have a
> near-term interest in using argo to communicate between VMs at
> different levels of nesting in L0/L1 nested hypervisors, and I suspect
> that frame number translation will end up being easier to handle
> across L0/L1 than translation of guest addresses in a VM running at
> the other level.)
> 
> Could you give a specific scenario you have in mind that is prompting a concern?

Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
software is allowed to use a different granularity at each level. This means
that the hypervisor could use 4KB pages while the guest kernel uses 64KB
pages (and vice versa). Some distros have chosen to support only one
page granularity (e.g. 64KB for RHEL, 4KB for Debian...).

At the moment the hypercall interface is based on the hypervisor page
granularity. Because Xen has always supported 4KB page granularity, this
assumption was also hardcoded in the kernel.

What prevents us from getting 64KB page support in Xen (and therefore support for
52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
hypercall interface would de facto use 64KB frames. This would break any current
guest. It is also not possible to keep 4KB pages everywhere because you can only
map 64KB in Xen. So you may map a bit too much from another guest.

This makes me think that the frame is probably not the best choice in that situation.
Instead an address/size pair would be more suitable.

The problem is much larger than this series. But I thought I would attempt to
convince the community to use guest physical addresses over guest frame numbers
whenever possible.
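
To make the ambiguity concrete, here is a minimal sketch showing that a frame number only names an address once a page size is assumed, whereas an address/size pair does not depend on it. All names (`frame_to_addr`, `struct ex_region`, `region_from_frames`) are hypothetical and are not part of any Xen interface.

```c
#include <stdint.h>

/* Page shifts for the three Arm granule sizes mentioned above. */
enum { SHIFT_4K = 12, SHIFT_16K = 14, SHIFT_64K = 16 };

/* A frame number only names an address once a page size is assumed. */
static uint64_t frame_to_addr(uint64_t frame, unsigned int page_shift)
{
    return frame << page_shift;
}

/* An address/size pair carries no such hidden assumption. */
struct ex_region {
    uint64_t addr;  /* guest physical address */
    uint64_t size;  /* length in bytes */
};

/* Describe a run of frames as a granularity-independent region. */
static struct ex_region region_from_frames(uint64_t first_frame,
                                           uint64_t nr_frames,
                                           unsigned int page_shift)
{
    struct ex_region r;

    r.addr = first_frame << page_shift;
    r.size = nr_frames << page_shift;

    return r;
}
```

Frame 0x100 is address 0x100000 with 4KB pages but 0x1000000 with 64KB pages, while sixteen 4KB frames starting at 0x100 and one 64KB frame at 0x10 describe the same region.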

Cheers,

-- 
Julien Grall


* Re: [PATCH 13/25] argo: implement the register op
  2018-12-05 17:20       ` Julien Grall
@ 2018-12-05 22:35         ` Christopher Clark
  2018-12-11 13:51           ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-05 22:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet, Roger Pau Monné

On Wed, Dec 5, 2018 at 9:20 AM Julien Grall <julien.grall@arm.com> wrote:
> On 04/12/2018 09:08, Christopher Clark wrote:
> > On Sun, Dec 2, 2018 at 12:11 PM Julien Grall <Julien.Grall@arm.com> wrote:
> >> On 01/12/2018 01:32, Christopher Clark wrote:
> >>> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> >>> ...
> >>> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> >>> +typedef uint64_t argo_pfn_t;
> >>
> >> As you always use 64-bit, can we just use an address? This would make
> >> the ABI agnostic to the hypervisor page granularity.
>
> By address I meant guest physical address (and not guest virtual address).
>
> Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
> software is allowed to use a different granularity at each level. This means
> that the hypervisor could use 4KB pages while the guest kernel uses 64KB
> pages (and vice versa). Some distros have chosen to support only one
> page granularity (e.g. 64KB for RHEL, 4KB for Debian...).
>
> At the moment the hypercall interface is based on the hypervisor page
> granularity. Because Xen has always supported 4KB page granularity, this
> assumption was also hardcoded in the kernel.
>
> What prevents us from getting 64KB page support in Xen (and therefore support for
> 52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
> hypercall interface would de facto use 64KB frames. This would break any current
> guest. It is also not possible to keep 4KB pages everywhere because you can only
> map 64KB in Xen. So you may map a bit too much from another guest.
>
> This makes me think that the frame is probably not the best choice in that situation.
> Instead an address/size pair would be more suitable.
>
> The problem is much larger than this series. But I thought I would attempt to
> convince the community to use guest physical addresses over guest frame numbers
> whenever possible.

Thanks, Julien -- that explanation is very helpful and your request makes sense.

So in concrete terms, with the change that you're advocating for to
this patch, the 64-bit value that is supplied by the guest in the
array passed as an argument to register_ring would encode the same
guest physical frame number as it currently does in the patch version
presented in this thread, but it would be bit-shifted to the position
used in a physical address.

In addition to that change, a page size indicator would be supplied
too -- for every page address supplied in the call.

Is there a method currently used within Xen (or relevant places
elsewhere) for encoding both the page address and size (ie. 4KB, 16KB
or 64KB) within the same 64-bits?
ie. Knowing that the smallest granularity of page is 4KB, and that all
pages are aligned to at least a 4KB boundary, there are low bits in
the address that are known to be zero, and those could be used to
indicate the page size when supplied to this call. It seems like such
an encoding would allow for avoiding doubling the size of the argument
array, but I'm not sure how inconvenient it would be to work with in
practice.
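
As a rough sketch of what such an encoding could look like (purely an assumption for discussion, not an existing Xen convention): the low two bits of a 4KB-aligned address carry a size code, and the address must be aligned to the size it claims. All names below are hypothetical.

```c
#include <stdint.h>

#define EX_ADDR_MASK  (~UINT64_C(0xfff))  /* 4KB-aligned address bits */
#define EX_SIZE_MASK  UINT64_C(0x3)       /* low two bits: size code */

enum ex_page_size { EX_PAGE_4K = 0, EX_PAGE_16K = 1, EX_PAGE_64K = 2 };

/* Size codes 0, 1, 2 map to shifts 12, 14, 16. */
static unsigned int ex_page_shift(enum ex_page_size sz)
{
    return 12u + 2u * (unsigned int)sz;
}

/* Returns 0 on success, -1 if addr is not aligned to the stated size. */
static int ex_encode(uint64_t addr, enum ex_page_size sz, uint64_t *out)
{
    uint64_t align_mask = (UINT64_C(1) << ex_page_shift(sz)) - 1;

    if ( addr & align_mask )
        return -1;

    *out = addr | (uint64_t)sz;

    return 0;
}

/* Split an encoded value back into its address and size code. */
static void ex_decode(uint64_t enc, uint64_t *addr, enum ex_page_size *sz)
{
    *addr = enc & EX_ADDR_MASK;
    *sz = (enum ex_page_size)(enc & EX_SIZE_MASK);
}
```

The array of 64-bit values keeps its size; the cost is that every consumer has to mask before using the address.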

If so, such an interface change looks manageable and hopefully it
would be acceptable to only support 4KB pages in the current
implementation behind that new ABI for the time being. Let me know
what you think.

thanks,

Christopher


* Re: [PATCH 13/25] argo: implement the register op
  2018-12-05 22:35         ` Christopher Clark
@ 2018-12-11 13:51           ` Julien Grall
  0 siblings, 0 replies; 111+ messages in thread
From: Julien Grall @ 2018-12-11 13:51 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet, Roger Pau Monné

Hi Christopher,

On 05/12/2018 22:35, Christopher Clark wrote:
> On Wed, Dec 5, 2018 at 9:20 AM Julien Grall <julien.grall@arm.com> wrote:
>> On 04/12/2018 09:08, Christopher Clark wrote:
>>> On Sun, Dec 2, 2018 at 12:11 PM Julien Grall <Julien.Grall@arm.com> wrote:
>>>> On 01/12/2018 01:32, Christopher Clark wrote:
>>>>> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
>>>>> ...
>>>>> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
>>>>> +typedef uint64_t argo_pfn_t;
>>>>
>>>> As you always use 64-bit, can we just use an address? This would make
>>>> the ABI agnostic to the hypervisor page granularity.
>>
>> By address I meant guest physical address (and not guest virtual address).
>>
>> Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
>> software is allowed to use a different granularity at each level. This means
>> that the hypervisor could use 4KB pages while the guest kernel uses 64KB
>> pages (and vice versa). Some distros have chosen to support only one
>> page granularity (e.g. 64KB for RHEL, 4KB for Debian...).
>>
>> At the moment the hypercall interface is based on the hypervisor page
>> granularity. Because Xen has always supported 4KB page granularity, this
>> assumption was also hardcoded in the kernel.
>>
>> What prevents us from getting 64KB page support in Xen (and therefore support for
>> 52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
>> hypercall interface would de facto use 64KB frames. This would break any current
>> guest. It is also not possible to keep 4KB pages everywhere because you can only
>> map 64KB in Xen. So you may map a bit too much from another guest.
>>
>> This makes me think that the frame is probably not the best choice in that situation.
>> Instead an address/size pair would be more suitable.
>>
>> The problem is much larger than this series. But I thought I would attempt to
>> convince the community to use guest physical addresses over guest frame numbers
>> whenever possible.
> 
> Thanks, Julien -- that explanation is very helpful and your request makes sense.
> 
> So in concrete terms, with the change that you're advocating for to
> this patch, the 64-bit value that is supplied by the guest in the
> array passed as an argument to register_ring would encode the same
> guest physical frame number as it currently does in the patch version
> presented in this thread, but it would be bit-shifted to the position
> used in a physical address.
> 
> In addition to that change, a page size indicator would be supplied
> too -- for every page address supplied in the call.
> 
> Is there a method currently used within Xen (or relevant places
> elsewhere) for encoding both the page address and size (ie. 4KB, 16KB
> or 64KB) within the same 64-bits?
> ie. Knowing that the smallest granularity of page is 4KB, and that all
> pages are aligned to at least a 4KB boundary, there are low bits in
> the address that are known to be zero, and those could be used to
> indicate the page size when supplied to this call. It seems like such
> an encoding would allow for avoiding doubling the size of the argument
> array, but I'm not sure how inconvenient it would be to work with in
> practice.
> 
> If so, such an interface change looks manageable and hopefully it
> would be acceptable to only support 4KB pages in the current
> implementation behind that new ABI for the time being. Let me know
> what you think.

If you let the user the choice of the granularity, then, I believe, you will 
prevent the hypervisor to do some optimization.

For instance, if the guest supplies only 4KB page but the hypervisor is 64KB. 
There are no way to easily map them contiguously in the hypervisor (e.g using vmap).

Is there a particular reason to allow the ring buffer to be non-contiguous in 
the guest physical address?

Depending on the answer, there are different ways to handle that:
	1) Request the guest to allocate memory in 64KB (on Arm) chunks and pass the
	   base address of each chunk
	2) Request the guest to allocate the buffer contiguously and pass the base
	   address and size
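
For reference, the single-word encoding asked about in the quoted text (page
address plus a size indicator in the low bits that 4KB alignment leaves at zero)
could look roughly like the sketch below. This is only an illustration: the
names encode_page(), decode_addr(), decode_bytes() and the size codes are
hypothetical and are not part of Xen or of the posted series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size codes; none of these names exist in Xen or the patch. */
enum page_size_code { PAGE_4K = 0, PAGE_16K = 1, PAGE_64K = 2 };

#define SIZE_CODE_MASK 0x3ULL   /* two low bits carry the size code */

/* 4KB << 0, << 2, << 4 gives 4KB, 16KB, 64KB. */
static inline uint64_t code_to_bytes(unsigned int code)
{
    return 4096ULL << (2 * code);
}

static inline uint64_t encode_page(uint64_t gpa, enum page_size_code code)
{
    /* The address must be aligned to its own page size. */
    assert((gpa & (code_to_bytes(code) - 1)) == 0);
    return gpa | code;
}

/* 4KB alignment means the low 12 bits never belong to the address. */
static inline uint64_t decode_addr(uint64_t v)
{
    return v & ~0xfffULL;
}

static inline uint64_t decode_bytes(uint64_t v)
{
    return code_to_bytes(v & SIZE_CODE_MASK);
}
```

Such an encoding keeps the argument array at one 64-bit word per page, at the
cost of every consumer having to mask before using the value as an address.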

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-04  9:03     ` Christopher Clark
  2018-12-04  9:16       ` Paul Durrant
@ 2018-12-11 14:15       ` Julien Grall
  1 sibling, 0 replies; 111+ messages in thread
From: Julien Grall @ 2018-12-11 14:15 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	nd, eric chanudet

Hi Christopher,

On 04/12/2018 09:03, Christopher Clark wrote:
> On Sun, Dec 2, 2018 at 11:55 AM Julien Grall <Julien.Grall@arm.com> wrote:
>>
>> Hi,
>>
>> On 01/12/2018 01:33, Christopher Clark wrote:
>>> * x86 PV domains are notified via event channel.
>>>
>>> PV guests are known to have the event channel software present in the guest
>>> kernel, so it is fine to depend on and use it.
>>>
>>> * x86 HVM domains and all ARM domains are notified via VIRQ.
>>>
>>> The intent is to remove the requirement for event channel software to be
>>> installed within these guests in order to use Argo. VIRQ signalling is also
>>> the method that has been in use for the longest period with this hypercall
>>> in both XenClient and OpenXT.
>>
>> I am a bit confused. vIRQs are based on event channel, so how do you
>> remove the requirement on event channel?
> 
> Are VIRQs always delivered via events in all cases? I was under the
> impression that was not necessarily so with HVM guests but I haven't
> checked and could well be incorrect.

It depends on your meaning of vIRQs. We seem to use the term for two cases in
the hypervisor.

In the context of send_guest_global_virq(), the interrupt is
para-virtualized, as it is delivered via events.

On Arm, most virtual interrupts go through the virtual interrupt
controller. They can be raised using vgic_inject_irq(), and event channels are
therefore not required. I think this is fairly similar on PVH/HVM, but I will let
the x86 folks confirm here.

> 
> A bit of context might help with how this multiple-method logic (as
> submitted) was arrived at:
> 
> 1) Both XenClient's original version of v4v, and that used in OpenXT,
> deliver notifications to guests via VIRQ.
> This logic has been performing fine for our use cases, so there
> hasn't really been a push to switch away from it.

From my understanding, VIRQ is just a convenience alias for the guest to
receive the associated event. The guest only needs to say "I want to bind VIRQ
foo". In the other case, you would need to allocate the event channel in the
hypervisor and then pass the information somehow to the guest.

> 
> 2) The last version of v4v that was submitted to xen-devel for
> iteration with the Xen community was intended to use event channels
> instead, in response to a request from Jan at the time. Given that
> expressed preference, I've added that, plumbing it in through via the
> IPI event method exposed in patch #01, and then used in patch #05, of
> the submitted series.
> 
> 3) Bromium's uxen uses different logic for delivery of events to
> non-PV guests: an edge-triggered, ISA IRQ, along these lines:
> 
>      #define ARGO_SIGNAL_ISA_IRQ 8
>      hvm_isa_irq_assert(d, ARGO_SIGNAL_ISA_IRQ, NULL);
>      hvm_isa_irq_deassert(d, ARGO_SIGNAL_ISA_IRQ);
> 
> I'm told that this avoids the need to EOI in the guest, reducing the
> VMEXIT load, and using an ISA IRQ avoids some logic in Windows that
> requires that a device be detected. I briefly looked into adding this
> to Argo, but Linux wasn't immediately happy and I haven't had time to
> look into it further given the proximity of the 4.12 release, with
> other work still to complete.
> 
> Anyway: since method 3 isn't ready to submit, and if VIRQs don't have
> an advantage over using event channels directly wrt. needing
> in-guest support to function, then I can drop this patch (#23) and
> simplify the get_config op (#25), which will leave all notifications
> being delivered as events.
> 
> Alternatively, if this is about which is the right delivery method for
> ARM, with some valid reason to retain use of VIRQ for HVM x86, then
> I'm happy to switch ARM over to deliver by the event method rather
> than VIRQ if that makes more sense.

For Arm, 3) would look like the right approach if you want to avoid the
dependency on the event channel driver.

Cheers,

-- 
Julien Grall


* Re: [PATCH 00/25] Argo: hypervisor-mediated interdomain communication
  2018-12-04  9:00   ` Christopher Clark
@ 2018-12-11 22:13     ` Chris Patterson
  0 siblings, 0 replies; 111+ messages in thread
From: Chris Patterson @ 2018-12-11 22:13 UTC (permalink / raw)
  To: Christopher Clark
  Cc: jandryuk, dpsmith, Julien Grall, Jan Beulich, Stefano Stabellini,
	Tim Deegan, xen-devel, Jean Guyader, Lars Kurth, Ross Philipson,
	Konrad Rzeszutek Wilk, paul.durrant, Juergen Gross, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, voreekf, Rich Persaud,
	dgdegra, eric chanudet, roger.pau

On Tue, Dec 4, 2018 at 4:00 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Mon, Dec 3, 2018 at 8:49 AM Chris Patterson <cjp256@gmail.com> wrote:
> >
> > > == Future items
> > >
> > > The Linux device driver used to test this software is derived from the
> > > OpenXT v4v Linux device driver, available at:
> > >     https://github.com/OpenXT/v4v
> > > The Argo implementation is not yet ready to publish (focus has been on
> > > the hypervisor code to this point). A Linux device driver suitable for
> > > inclusion in Xen will be submitted for a future Xen release and
> > > incorporation into OpenXT.
> > >
> >
> > Hey Christopher, I am glad you are tackling this.  While the Linux
> > driver is not ready to publish, is there a version you can share for
> > someone who wants to test this series?  Or is the v4v driver
> > compatible as-is?
>
> Hi Chris,
>
> Thanks for the interest -- so: ok, for you to take a look and to
> enable testing by anyone who would like to: I've just pushed a copy of
> the Argo ported Linux driver and userspace interposer, etc., with some
> OpenEmbedded build integration and instructions, to my github account
> here:
>
> https://github.com/dozylynx/meta-argo-linux
>
> This is a pretty fast port of the v4v Linux software to use the argo
> interfaces -- the existing OpenXT v4v interface is not quite the same
> -- plus metadata in there to turn it into a new OpenEmbedded layer in
> the same repo with recipes to work with meta-virtualization. I've been
> building with the rocko release, just to pick a stable reference
> point, so it's the rocko branch in meta-argo-linux that you'll want to
> look at, and there are instructions in the README.md in that branch.
>
> If you build that per the instructions, just a heads up that the Xen
> recipe in there will pull from a recent snapshot of Xen's staging
> branch, with the posted Argo series applied, from a copy on my github
> account.
>
> If you give it a spin, let me know how it goes.
>

Thank you! Will do. :D


* Re: [PATCH 13/25] argo: implement the register op
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
  2018-12-02 20:10   ` Julien Grall
  2018-12-04 10:57   ` Paul Durrant
@ 2018-12-12  9:48   ` Jan Beulich
  2018-12-20  5:29     ` Christopher Clark
  2018-12-12 16:47   ` Roger Pau Monné
  3 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-12  9:48 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)

We generally prefer to avoid "inline" outside of header files. Also
is there any strict need for the function to return a fixed width
type? Plus what's the point of the _fn suffix?

> +{
> +    uint16_t ret;
> +
> +    ret = (uint16_t)(id->addr.port >> 16);
> +    ret ^= (uint16_t)id->addr.port;

Pointless casts (with ret itself being uint16_t).
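
To illustrate the point, here is a reduced sketch of just the port-folding
step with the casts dropped. fold_port() is a hypothetical stand-in, not the
function from the patch, and the rest of the hash is left out. The assignment
and the compound assignment both convert back to uint16_t implicitly.

```c
#include <stdint.h>

/* Hypothetical stand-in for the port-folding part of the hash. */
static inline unsigned int fold_port(uint32_t port)
{
    uint16_t ret;

    ret = port >> 16;   /* value already fits in 16 bits */
    ret ^= port;        /* result is truncated to uint16_t on assignment */

    return ret;
}
```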

> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +                   uint8_t **page)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +               " %u of %u\n", ring_info->id.addr.domain_id,
> +               ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +               i, ring_info->nmfns);
> +        return -EFAULT;
> +    }
> +    ASSERT(ring_info->mfns);
> +    ASSERT(ring_info->mfn_mapping);
> +
> +    if ( !ring_info->mfn_mapping[i] )
> +    {
> +        /*
> +         * TODO:
> +         * The first page of the ring contains the ring indices, so both read and
> +         * write access to the page is required by the hypervisor, but read-access
> +         * is not needed for this mapping for the remainder of the ring.
> +         * Since this mapping will remain resident in Xen's address space for
> +         * the lifetime of the ring, and following the principle of least privilege,
> +         * it could be preferable to:
> +         *  # add a XSM check to determine what policy is wanted here
> +         *  # depending on the XSM query, optionally create this mapping as
> +         *    _write-only_ on platforms that can support it.
> +         *    (eg. Intel EPT/AMD NPT).
> +         */
> +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> +
> +        if ( !ring_info->mfn_mapping[i] )
> +        {
> +            printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +                   " %u of %u\n", ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +                   i, ring_info->nmfns);
> +            return -EFAULT;

Unsuitable error code.

> +        }
> +        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +               mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +    }
> +
> +    if ( page )
> +        *page = ring_info->mfn_mapping[i];

This suggests that the parameter is misnamed. "page" variables
should be of types other than struct page_info * only under
exceptional circumstances.

> +static int
> +argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    uint8_t *dst;

Why uint8_t when you don't mean to access bytes? For the
arithmetic below void * should do just fine.

> +    uint32_t *p;
> +    int ret;
> +
> +    ret = argo_ring_map_page(ring_info, 0, &dst);
> +    if ( ret )
> +        return ret;
> +
> +    p = (uint32_t *)(dst + offsetof(argo_ring_t, tx_ptr));

And then you also don't need any cast here.

> +    write_atomic(p, tx_ptr);
> +    mb();

While guests need to use non-SMP barriers, I don't see why an
SMP one wouldn't be sufficient here. I also don't see why this
isn't smp_wmb().

> @@ -231,6 +319,388 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>      xfree(ring_info);
>  }
>  
> +/*
> + * ring
> + */

I can see the point of using multi-line comments in a few cases
where our style would not permit this, but a single word is imo
too little to justify a style violation.

> +static int
> +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> +                    uint32_t len)
> +{
> +    int i;
> +    int ret = 0;
> +
> +    if ( (npage << PAGE_SHIFT) < len )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /*
> +         * Ring already existed. Check if it's the same ring,
> +         * i.e. same number of pages and all translated gpfns still
> +         * translating to the same mfns
> +         */

This comment makes me wonder whether the translations are
permitted to change at other times. If so I'm not sure what
value verification here has. If not, this probably would want to
be debugging-only code.

> +        if ( ring_info->npage != npage )
> +            i = ring_info->nmfns + 1; /* forces re-register below */
> +        else
> +        {
> +            for ( i = 0; i < ring_info->nmfns; i++ )
> +            {
> +                argo_pfn_t pfn;
> +                mfn_t mfn;
> +
> +                ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +                if ( ret )
> +                    break;
> +
> +                ret = argo_find_ring_mfn(d, pfn, &mfn);
> +                if ( ret )
> +                    break;
> +
> +                if ( mfn_x(mfn) != mfn_x(ring_info->mfns[i]) )
> +                    break;
> +            }
> +        }
> +        if ( i != ring_info->nmfns )
> +        {
> +            printk(XENLOG_INFO "argo: vm%u re-registering existing argo ring"
> +                   " (vm%u:%x vm%d), clearing MFN list\n",
> +                   current->domain->domain_id, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner);
> +
> +            argo_ring_remove_mfns(d, ring_info);
> +            ASSERT(!ring_info->mfns);
> +        }
> +    }
> +
> +    if ( !ring_info->mfns )
> +    {
> +        mfn_t *mfns;
> +        uint8_t **mfn_mapping;
> +
> +        mfns = xmalloc_array(mfn_t, npage);
> +        if ( !mfns )
> +            return -ENOMEM;
> +
> +        for ( i = 0; i < npage; i++ )
> +            mfns[i] = INVALID_MFN;
> +
> +        mfn_mapping = xmalloc_array(uint8_t *, npage);

Perhaps better to xzalloc_array() here than to ...

> +        if ( !mfn_mapping )
> +        {
> +            xfree(mfns);
> +            return -ENOMEM;
> +        }
> +
> +        ring_info->npage = npage;
> +        ring_info->mfns = mfns;
> +        ring_info->mfn_mapping = mfn_mapping;
> +    }
> +    ASSERT(ring_info->npage == npage);
> +
> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;
> +
> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
> +    {
> +        argo_pfn_t pfn;
> +        mfn_t mfn;
> +
> +        ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +        if ( ret )
> +            break;
> +
> +        ret = argo_find_ring_mfn(d, pfn, &mfn);
> +        if ( ret )
> +        {
> +            printk(XENLOG_ERR "argo: vm%u passed invalid gpfn %"PRI_xen_pfn
> +                   " ring (vm%u:%x vm%d) %p seq %d of %d\n",
> +                   d->domain_id, pfn, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner,
> +                   ring_info, i, ring_info->npage);
> +            break;
> +        }
> +
> +        ring_info->mfns[i] = mfn;
> +        ring_info->nmfns = i + 1;
> +
> +        argo_dprintk("%d: %"PRI_xen_pfn" -> %"PRI_mfn"\n",
> +               i, pfn, mfn_x(ring_info->mfns[i]));
> +
> +        ring_info->mfn_mapping[i] = NULL;

... zap individual slots late here?
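
A minimal sketch of the idea, using the C library's calloc() as a stand-in
for xzalloc_array() (alloc_mapping_slots() is a hypothetical name): every slot
starts out as NULL, so no per-slot assignment is needed later. This assumes
all-zero bytes represent a null pointer, which holds on the platforms of
interest here.

```c
#include <stdint.h>
#include <stdlib.h>

/* calloc() stands in for xzalloc_array(); all slots are zero-filled. */
static inline uint8_t **alloc_mapping_slots(size_t npage)
{
    return calloc(npage, sizeof(uint8_t *));
}
```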

> +static struct argo_ring_info *
> +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    uint16_t hash;
> +    struct hlist_node *node;

const?

> +static long
> +argo_register_ring(struct domain *d,
> +                   XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
> +                   XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
> +                   bool fail_exist)
> +{
> +    struct argo_ring ring;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +    bool update_tx_ptr = 0;

bool type means true/false in initializers and assignments.

> +    uint64_t dst_domain_cookie = 0;
> +
> +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> +        return -EINVAL;

Why? You don't store the handle for later use (and you shouldn't).
If there really is a need for a full page's worth of memory, it
would better be passed in as GFN.

> +    read_lock (&argo_lock);
> +
> +    do {
> +        if ( !d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        if ( copy_from_guest(&ring, ring_hnd, 1) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        if ( ring.magic != ARGO_RING_MAGIC )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( (ring.len < (sizeof(struct argo_ring_message_header)
> +                          + ARGO_ROUNDUP(1) + ARGO_ROUNDUP(1)))   ||

An expression like this wants at least a brief explaining comment
attached.

> +             (ARGO_ROUNDUP(ring.len) != ring.len) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.len > ARGO_MAX_RING_SIZE )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.id.partner == ARGO_DOMID_ANY )
> +        {
> +            ret = xsm_argo_register_any_source(d, argo_mac_bootparam_enforcing);
> +            if ( ret )
> +                break;
> +        }
> +        else
> +        {
> +            struct domain *dst_d = get_domain_by_id(ring.id.partner);
> +            if ( !dst_d )

Blank line between declaration(s) and statement(s) please.

> +            {
> +                argo_dprintk("!dst_d, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                break;
> +            }
> +
> +            ret = xsm_argo_register_single_source(d, dst_d);
> +            if ( ret )
> +            {
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            if ( !dst_d->argo )
> +            {
> +                argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            dst_domain_cookie = dst_d->argo->domain_cookie;
> +
> +            put_domain(dst_d);
> +        }
> +
> +        ring.id.addr.domain_id = d->domain_id;
> +        if ( copy_field_to_guest(ring_hnd, &ring, id) )

Whenever you copy back out fields (or entire structures) that
you've copied in before, the __copy_* variants of the functions
suffice.

> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        /*
> +         * no need for a lock yet, because only we know about this
> +         * set the tx pointer if it looks bogus (we don't reset it
> +         * because this might be a re-register after S4)
> +         */
> +
> +        if ( ring.tx_ptr >= ring.len ||
> +             ARGO_ROUNDUP(ring.tx_ptr) != ring.tx_ptr )
> +        {
> +            /*
> +             * Since the ring is a mess, attempt to flush the contents of it
> +             * here by setting the tx_ptr to the next aligned message slot past
> +             * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +             */
> +            ring.tx_ptr = ARGO_ROUNDUP(ring.rx_ptr);
> +
> +            if ( ring.tx_ptr >= ring.len )
> +                ring.tx_ptr = 0;
> +
> +            /* ring.tx_ptr will be written back to the guest ring below. */
> +            update_tx_ptr = 1;
> +        }
> +
> +        /* W(L2) protects all the elements of the domain's ring_info */
> +        write_lock(&d->argo->lock);
> +
> +        do {
> +            ring_info = argo_ring_find_info(d, &ring.id);
> +
> +            if ( !ring_info )
> +            {
> +                uint16_t hash;
> +
> +                ring_info = xmalloc(struct argo_ring_info);
> +                if ( !ring_info )
> +                {
> +                    ret = -ENOMEM;
> +                    break;
> +                }
> +
> +                spin_lock_init(&ring_info->lock);
> +
> +                ring_info->mfns = NULL;
> +                ring_info->npage = 0;
> +                ring_info->mfn_mapping = NULL;
> +                ring_info->len = 0;
> +                ring_info->nmfns = 0;
> +                ring_info->tx_ptr = 0;

xzalloc() used above would eliminate the need for all of these.

> +                ring_info->partner_cookie = dst_domain_cookie;
> +
> +                ring_info->id = ring.id;
> +                INIT_HLIST_HEAD(&ring_info->pending);
> +
> +                hash = argo_hash_fn(&ring_info->id);
> +                hlist_add_head(&ring_info->node, &d->argo->ring_hash[hash]);
> +
> +                printk(XENLOG_INFO "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                       current->domain->domain_id, ring.id.addr.domain_id,
> +                       ring.id.addr.port, ring.id.partner);

Please consider using XENLOG_G_INFO for such guest related log
messages.

> @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>  
>      switch (cmd)
>      {
> +    case ARGO_MESSAGE_OP_register_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> +            guest_handle_cast(arg1, argo_ring_t);
> +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> +            guest_handle_cast(arg2, argo_pfn_t);
> +        uint32_t npage = arg3;
> +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> +
> +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> +            break;

I don't understand the need for this and ...

> +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> +            break;

... perhaps also this, when you use copy_from_guest() upon access.

> +        /* arg4: reserve currently-undefined bits, require zero.  */
> +        if ( unlikely(arg4 & ~ARGO_REGISTER_FLAG_MASK) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
> +        break;
> +    }
>      default:

Blank line above here please.

> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -21,6 +21,20 @@
>  
>  #include "xen.h"
>  
> +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
> +
> +#define ARGO_DOMID_ANY           DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16MB
> + *  -- which is 0x1000000 or 16777216 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> +
> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> +typedef uint64_t argo_pfn_t;
> +
>  typedef struct argo_addr
>  {
>      uint32_t port;

It must have started in an earlier patch where I didn't pay
attention: Please can you make sure to prefix all public
header additions to global name space with XEN_ / xen_?
Unless of course it is thought that ARGO_ / argo_ are
entirely impossible to be used in any other environment.

Jan



* Re: [PATCH 14/25] argo: implement the unregister op
  2018-12-01  1:32 ` [PATCH 14/25] argo: implement the unregister op Christopher Clark
  2018-12-04 11:10   ` Paul Durrant
@ 2018-12-12  9:51   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-12  9:51 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -510,6 +510,59 @@ argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>  }
>  
>  static long
> +argo_unregister_ring(struct domain *d,
> +                     XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd)
> +{
> +    struct argo_ring ring;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +
> +    read_lock(&argo_lock);
> +
> +    do {
> +        if ( !d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        ret = copy_from_guest_errno(&ring, ring_hnd, 1);
> +        if ( ret )
> +            break;
> +
> +        if ( ring.magic != ARGO_RING_MAGIC )
> +        {
> +            argo_dprintk(
> +                "ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), EINVAL\n",
> +                ring.magic, ARGO_RING_MAGIC);
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        ring.id.addr.domain_id = d->domain_id;

Why the override?

> +        write_lock(&d->argo->lock);
> +
> +        ring_info = argo_ring_find_info(d, &ring.id);
> +        if ( ring_info )
> +            argo_ring_remove_info(d, ring_info);
> +
> +        write_unlock(&d->argo->lock);
> +
> +        if ( !ring_info )
> +        {
> +            argo_dprintk("ENOENT\n");
> +            ret = -ENOENT;
> +            break;
> +        }
> +
> +    } while ( 0 );
> +
> +    read_unlock(&argo_lock);
> +    return ret;
> +}

Blank line ahead of the main return statement of a function please.

Jan




* Re: [PATCH 15/25] argo: implement the sendv op
  2018-12-01  1:32 ` [PATCH 15/25] argo: implement the sendv op Christopher Clark
  2018-12-04 11:22   ` Paul Durrant
@ 2018-12-12 11:52   ` Jan Beulich
  2018-12-20  5:58     ` Christopher Clark
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-12 11:52 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> +static void
> +argo_signal_domain(struct domain *d)
> +{
> +    argo_dprintk("signalling domid:%d\n", d->domain_id);
> +
> +    if ( !d->argo ) /* This can happen if the domain is being destroyed */
> +        return;

If such a precaution is necessary, how is it guaranteed that
the pointer doesn't change to NULL between the check above
and ...

> +    evtchn_send(d, d->argo->evtchn_port);

... the use here?

> +static int
> +argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +               uint32_t *count)
> +{
> +    argo_iov_t iov;
> +    uint32_t sum_iov_lens = 0;
> +    int ret;
> +
> +    if ( niov > ARGO_MAXIOV )
> +        return -EINVAL;
> +
> +    while ( niov-- )
> +    {
> +        ret = copy_from_guest_errno(&iov, iovs, 1);
> +        if ( ret )
> +            return ret;
> +
> +        /* check each to protect sum against integer overflow */
> +        if ( iov.iov_len > ARGO_MAX_RING_SIZE )
> +            return -EINVAL;
> +
> +        sum_iov_lens += iov.iov_len;
> +
> +        /*
> +         * Again protect sum from integer overflow
> +         * and ensure total msg size will be within bounds.
> +         */
> +        if ( sum_iov_lens > ARGO_MAX_MSG_SIZE )
> +            return -EINVAL;

So you do overflow checks here. But how does this help when ...

> +        guest_handle_add_offset(iovs, 1);
> +    }
> +
> +    *count = sum_iov_lens;
> +    return 0;
> +}
> +
> +static int
> +argo_ringbuf_insert(struct domain *d,
> +                    struct argo_ring_info *ring_info,
> +                    const struct argo_ring_id *src_id,
> +                    XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> +                    uint32_t message_type, unsigned long *out_len)
> +{
> +    argo_ring_t ring;
> +    struct argo_ring_message_header mh = { 0 };
> +    int32_t sp;
> +    int32_t ret = 0;
> +    uint32_t len;
> +    uint32_t iov_len;
> +    uint32_t sum_iov_len = 0;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    if ( (ret = argo_iov_count(iovs, niov, &len)) )
> +        return ret;
> +
> +    if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) ) >=
> +          ring_info->len)
> +         || (len > ARGO_MAX_MSG_SIZE) )
> +        return -EMSGSIZE;
> +
> +    do {
> +        ret =  argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr);
> +        if ( ret )
> +            break;
> +
> +        argo_sanitize_ring(&ring, ring_info);
> +
> +        argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d"
> +                     " ring_info->tx_ptr=%d\n",
> +                     ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
> +
> +        if ( ring.rx_ptr == ring.tx_ptr )
> +            sp = ring_info->len;
> +        else
> +        {
> +            sp = ring.rx_ptr - ring.tx_ptr;
> +            if ( sp < 0 )
> +                sp += ring.len;
> +        }
> +
> +        if ( (ARGO_ROUNDUP(len) + sizeof(struct argo_ring_message_header)) >= sp )
> +        {
> +            argo_dprintk("EAGAIN\n");
> +            ret = -EAGAIN;
> +            break;
> +        }
> +
> +        mh.len = len + sizeof(struct argo_ring_message_header);
> +        mh.source.port = src_id->addr.port;
> +        mh.source.domain_id = src_id->addr.domain_id;
> +        mh.message_type = message_type;
> +
> +        /*
> +         * For this copy to the guest ring, tx_ptr is always 16-byte aligned
> +         * and the message header is 16 bytes long.
> +         */
> +        BUILD_BUG_ON(sizeof(struct argo_ring_message_header) != ARGO_ROUNDUP(1));
> +
> +        if ( (ret = argo_memcpy_to_guest_ring(ring_info,
> +                                              ring.tx_ptr + sizeof(argo_ring_t),
> +                                              &mh,
> +                                              XEN_GUEST_HANDLE_NULL(uint8_t),
> +                                              sizeof(mh))) )
> +            break;
> +
> +        ring.tx_ptr += sizeof(mh);
> +        if ( ring.tx_ptr == ring_info->len )
> +            ring.tx_ptr = 0;
> +
> +        while ( niov-- )
> +        {
> +            XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
> +            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
> +            argo_iov_t iov;
> +
> +            ret = copy_from_guest_errno(&iov, iovs, 1);

... here you copy the structure again from guest memory, at
which point it may have changed? I see you do some checks
further down, but the question then is - is the checking in
argo_iov_count() redundant and hence unnecessary? Are
you really safe here against inconsistencies between the
first and second reads? If so, a thorough explanation in a
comment is needed here.
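
One way to close the window between the two reads is to fetch the iov
array once into hypervisor-local storage and validate only that
snapshot. The following is a minimal sketch in plain C, not the patch's
code: fetch_from_guest() is a hypothetical stand-in for
copy_from_guest_errno(), struct iov mirrors argo_iov_t, and the
MAX_MSG_SIZE value is an assumption chosen for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAXIOV       127u          /* mirrors ARGO_MAXIOV */
#define MAX_MSG_SIZE (1u << 24)    /* illustrative cap, an assumption */

struct iov {
    uint64_t iov_base;
    uint32_t iov_len;
    uint32_t pad;
};

/* Hypothetical stand-in for copy_from_guest_errno(): one bulk read. */
static int fetch_from_guest(struct iov *dst, const struct iov *guest,
                            unsigned int n)
{
    memcpy(dst, guest, n * sizeof(*dst));
    return 0;
}

/*
 * Snapshot the guest's iov array exactly once, then validate the snapshot.
 * Later code must use only 'local', never the guest copy.
 * Returns the total payload length, or a negative errno value.
 */
static long snapshot_iovs(struct iov *local, const struct iov *guest,
                          unsigned int niov)
{
    uint64_t sum = 0;
    unsigned int i;

    if ( niov > MAXIOV )
        return -EINVAL;

    if ( fetch_from_guest(local, guest, niov) )
        return -EFAULT;

    for ( i = 0; i < niov; i++ )
    {
        if ( local[i].iov_len > MAX_MSG_SIZE )
            return -EINVAL;
        sum += local[i].iov_len;
        if ( sum > MAX_MSG_SIZE )
            return -EMSGSIZE;
    }
    return (long)sum;
}
```

With this shape, a guest that rewrites its iov array after the fetch
cannot influence the lengths used for the ring copy, so a separate
counting pass over guest memory is no longer needed.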

> +            if ( ret )
> +                break;
> +
> +            bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base, uint8_t);

Please use a handle in the public interface instead of such a
cast.

> +            buf_hnd = guest_handle_from_param(bufp_hnd, uint8_t);
> +            iov_len = iov.iov_len;
> +
> +            if ( !iov_len )
> +            {
> +                printk(XENLOG_ERR "argo: iov.iov_len=0 iov.iov_base=%"
> +                       PRIx64" ring (vm%u:%x vm%d)\n",
> +                       iov.iov_base, ring_info->id.addr.domain_id,
> +                       ring_info->id.addr.port, ring_info->id.partner);
> +
> +                guest_handle_add_offset(iovs, 1);
> +                continue;
> +            }
> +
> +            if ( iov_len > ARGO_MAX_MSG_SIZE )
> +            {
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            sum_iov_len += iov_len;
> +            if ( sum_iov_len > len )
> +            {
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            sp = ring.len - ring.tx_ptr;
> +
> +            if ( iov_len > sp )
> +            {
> +                ret = argo_memcpy_to_guest_ring(ring_info,
> +                        ring.tx_ptr + sizeof(argo_ring_t),
> +                        NULL, buf_hnd, sp);
> +                if ( ret )
> +                    break;
> +
> +                ring.tx_ptr = 0;
> +                iov_len -= sp;
> +                guest_handle_add_offset(buf_hnd, sp);
> +            }
> +
> +            ret = argo_memcpy_to_guest_ring(ring_info,
> +                        ring.tx_ptr + sizeof(argo_ring_t),
> +                        NULL, buf_hnd, iov_len);

Extending the remark on double guest memory read above, is
it certain you won't overrun the ring here?
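
A sketch of a wrap-aware copy that cannot overrun, assuming the free
space was computed once before the copy loop started. The function name
ring_copy() and its parameters are hypothetical and only illustrate the
bound being asked about; they are not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'len' bytes into a circular buffer at *tx_ptr, wrapping at ring_len.
 * 'space' is the free space computed before the copy started; refusing any
 * chunk larger than what remains prevents an overrun even if a length
 * turns out larger than the one validated earlier.
 */
static int ring_copy(uint8_t *ring, uint32_t ring_len, uint32_t *tx_ptr,
                     uint32_t *space, const uint8_t *src, uint32_t len)
{
    uint32_t tail;

    if ( *tx_ptr >= ring_len || len > *space )
        return -EINVAL;

    tail = ring_len - *tx_ptr;   /* bytes before the wrap point */
    if ( len > tail )
    {
        memcpy(ring + *tx_ptr, src, tail);
        memcpy(ring, src + tail, len - tail);
        *tx_ptr = len - tail;
    }
    else
    {
        memcpy(ring + *tx_ptr, src, len);
        *tx_ptr += len;
        if ( *tx_ptr == ring_len )
            *tx_ptr = 0;
    }
    *space -= len;
    return 0;
}
```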

> +            if ( ret )
> +                break;
> +
> +            ring.tx_ptr += iov_len;
> +
> +            if ( ring.tx_ptr == ring_info->len )
> +                ring.tx_ptr = 0;
> +
> +            guest_handle_add_offset(iovs, 1);
> +        }
> +
> +        if ( ret )
> +            break;
> +
> +        ring.tx_ptr = ARGO_ROUNDUP(ring.tx_ptr);
> +
> +        if ( ring.tx_ptr >= ring_info->len )
> +            ring.tx_ptr -= ring_info->len;
> +
> +        mb();
> +        ring_info->tx_ptr = ring.tx_ptr;

What does the above barrier guard against? It's all hypervisor
local memory which gets altered afaict.
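
For contrast, ordering does matter when an index is published to memory
that another party reads: the payload must be visible before the new
index. A sketch using C11 atomics rather than Xen's mb() and
write_atomic(); struct shared_ring and publish() are hypothetical names,
and the caller is assumed to have checked that the message fits without
wrapping.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

struct shared_ring {
    _Atomic uint32_t tx_ptr;
    uint8_t data[64];
};

/* Caller guarantees tx_ptr + len does not exceed sizeof(data). */
static void publish(struct shared_ring *r, const void *msg, uint32_t len)
{
    uint32_t tx = atomic_load_explicit(&r->tx_ptr, memory_order_relaxed);

    memcpy(r->data + tx, msg, len);

    /*
     * Release store: a reader that acquire-loads tx_ptr and sees the new
     * value is guaranteed to also see the payload written above.
     */
    atomic_store_explicit(&r->tx_ptr, tx + len, memory_order_release);
}
```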

> +static int
> +argo_pending_requeue(struct argo_ring_info *ring_info, domid_t src_id, int len)
> +{
> +    struct hlist_node *node;
> +    struct argo_pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    hlist_for_each_entry(ent, node, &ring_info->pending, node)
> +    {
> +        if ( ent->id == src_id )
> +        {
> +            if ( ent->len < len )
> +                ent->len = len;

What does this achieve? I.e. why is this not either a plain
assignment or a check that the length is the same?

> +static struct argo_ring_info *
> +argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
> +                             domid_t partner_id, uint64_t partner_cookie)
> +{
> +    argo_ring_id_t id;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    id.addr.port = port;
> +    id.addr.domain_id = d->domain_id;
> +    id.partner = partner_id;
> +
> +    ring_info = argo_ring_find_info(d, &id);
> +    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
> +        return ring_info;

Such a cookie makes mismatches unlikely, but it doesn't exclude
them. If there are other checks, is the cookie useful at all?

> @@ -813,6 +1318,29 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>          rc = argo_unregister_ring(d, ring_hnd);
>          break;
>      }
> +    case ARGO_MESSAGE_OP_sendv:
> +    {
> +        argo_send_addr_t send_addr;
> +        uint32_t niov = arg3;
> +        uint32_t message_type = arg4;

Using these as an example (perhaps I've again overlooked earlier
instances), what about the upper halves on 64-bit? Given the
rather generic interface of the actual hypercall, I don't think it
is a good idea to ignore the bits. The situation is different for
the "cmd" parameter, which is uniformly 32-bit for all sub-ops.
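
A sketch of what rejecting the upper bits could look like, assuming the
hypercall arguments arrive as unsigned long. The helper name
arg_to_u32() is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Narrow a register-sized hypercall argument to 32 bits, rejecting
 * values whose upper bits are set instead of silently truncating.
 */
static int arg_to_u32(unsigned long arg, uint32_t *out)
{
    if ( arg > UINT32_MAX )   /* never true where unsigned long is 32-bit */
        return -EINVAL;

    *out = (uint32_t)arg;
    return 0;
}
```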

Talking of "cmd" and its type: In case it wasn't said by anyone
else yet, please use unsigned types wherever negative values
are impossible.

> +        XEN_GUEST_HANDLE_PARAM(argo_send_addr_t) send_addr_hnd =
> +            guest_handle_cast(arg1, argo_send_addr_t);
> +        XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs =
> +            guest_handle_cast(arg2, argo_iov_t);
> +
> +        if ( unlikely(!guest_handle_okay(send_addr_hnd, 1)) )
> +            break;
> +        rc = copy_from_guest_errno(&send_addr, send_addr_hnd, 1);
> +        if ( rc )
> +            break;
> +
> +        send_addr.src.domain_id = d->domain_id;

What use is the field if you override it like this?

> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -32,6 +32,28 @@
>   */
>  #define ARGO_MAX_RING_SIZE  (16777216ULL)
>  
> +/*
> + * ARGO_MAXIOV : maximum number of iovs accepted in a single sendv.
> + * Rationale for the value:
> + * The Linux argo driver never passes more than two iovs.
> + * Linux defines UIO_MAXIOV as 1024.
> + * POSIX mandates at least 16 -- not that this is a POSIX API of course.
> + *
> + * Limit the total amount of data posted in a single argo operation to
> + * no more than 2^31 bytes to reduce risk of integer overflow defects.
> + * Each argo iov can hold ~ 2^24 bytes, so set ARGO_MAXIOV to 2^(31-24),
> + * minus one to enable simple efficient bounds checking via masking: 127.
> +*/
> +#define ARGO_MAXIOV          127U
> +
> +typedef struct argo_iov
> +{
> +    uint64_t iov_base;
> +    uint32_t iov_len;
> +    uint32_t pad;

I don't think I've found any checking of this field to be zero, to
allow for future re-use.
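
A minimal sketch of such a check, with struct iov mirroring argo_iov_t
and check_iov_reserved() as a hypothetical helper name:

```c
#include <errno.h>
#include <stdint.h>

struct iov {
    uint64_t iov_base;
    uint32_t iov_len;
    uint32_t pad;      /* reserved: must be zero */
};

/* Reject callers that set reserved bits, so the field can gain meaning later. */
static int check_iov_reserved(const struct iov *iov)
{
    return iov->pad ? -EINVAL : 0;
}
```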

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-04  9:16       ` Paul Durrant
@ 2018-12-12 14:49         ` James
  0 siblings, 0 replies; 111+ messages in thread
From: James @ 2018-12-12 14:49 UTC (permalink / raw)
  To: Paul Durrant, 'Christopher Clark', Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, James McKenzie, Rich Persaud, Jan Beulich,
	Ian Jackson, xen-devel, nd, eric chanudet

I think there are two issues:

1) VIRQ vs some other sort of event channel

   For PV guests we originally chose a VIRQ in order to have a well known
   number against which the kernel driver could bind, so that it wasn't
   dependent on any of the other interdomain communication systems (such
   as xenstore) to find the correct channel.

   VIRQs and events in general make sense for PV domains since the
   upcall mechanism fits well into the way argo expects scheduling to work.

   I don't see any pressing reason not to use a VIRQ for PV or PVH domains;
   perhaps I've missed something.

2) VIRQ vs direct injection of a vector in the HVM case.

   For HVM guests, you can make the argument for injection via a (potentially
   hardware) emulated LAPIC. In this case the best performance is obtained by
   not having to clear the interrupt (requiring another VMEXIT). The only two
   sorts of interrupts that have that property are 1) MSIs, and 2) edge interrupts
   via the IOAPIC:

   Not all operating systems had/have good MSI support, PCI doesn't [really] support
   edge interrupts on LNK[ABCD], and even if you go the PCI route almost no OSes do
   the right thing anyway. However it is possible to specify a device via ACPI with a
   fixed GSI with an edge interrupt. Windows (from at least XP) and Linux both handle
   that well and it's the perfect fit for sending the idempotent "you have work to
   do" type interrupt that argo needs.

   One of the design goals was that multiple independent actors in a VM should
   be able to run without knowledge of each other, and using a standard GSI allows
   something like EDK II, GRUB and Linux to all use it without having to hand
   forward state.


J.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-01  1:32 ` [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy Christopher Clark
  2018-12-04  9:35   ` Paul Durrant
@ 2018-12-12 16:01   ` Roger Pau Monné
  2018-12-20  5:16     ` Christopher Clark
  1 sibling, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2018-12-12 16:01 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote:
> Applied to both x86 and ARM headers.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
>  xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
>  xen/include/xen/guest_access.h     |  3 +++
>  3 files changed, 57 insertions(+)
> 
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 224d2a0..7b6f89c 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  #define __raw_copy_from_guest raw_copy_from_guest
>  #define __raw_clear_guest raw_clear_guest
>  
> +#define raw_copy_from_guest_errno(dst, src, len)             \
> +    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
> +#define raw_copy_to_guest_errno(dst, src, len)               \
> +    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)

Since the only error that you return is EFAULT, I don't really see the
point in adding all those helpers. You achieve exactly the same by
returning a boolean and doing the translation to EFAULT in the caller
if required.

It might have been nice to have the copy to/from set of functions
return an error value, but adding a new set of helpers that have the
same functionality and differ only in the return value looks
redundant.
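
A sketch of the alternative described here, translating the existing
non-zero failure result to -EFAULT in the caller. copy_from_guest_stub()
is a hypothetical stand-in for copy_from_guest(), assumed only to return
non-zero on failure; read_header() is an illustrative caller.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_guest(): non-zero means failure. */
static unsigned long copy_from_guest_stub(void *dst, const void *src,
                                          size_t len)
{
    if ( !src )
        return len;          /* nothing copied */

    memcpy(dst, src, len);
    return 0;
}

static int read_header(unsigned int *hdr, const void *guest_ptr)
{
    /* Translate the failure indication to -EFAULT at the call site. */
    if ( copy_from_guest_stub(hdr, guest_ptr, sizeof(*hdr)) )
        return -EFAULT;

    return 0;
}
```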

Thanks, Roger.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
                     ` (2 preceding siblings ...)
  2018-12-12  9:48   ` Jan Beulich
@ 2018-12-12 16:47   ` Roger Pau Monné
  2018-12-20  5:41     ` Christopher Clark
  3 siblings, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2018-12-12 16:47 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> Used by a domain to register a region of memory for receiving messages from
> either a specified other domain, or, if specifying a wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits, the
> hypervisor will use this mapping to copy data from a sending domain into this
> registered ring, making it accessible to the domain that registered the ring to
> receive data.
> 
> In this code, the p2m type of the memory supplied by the guest for the ring
> must be p2m_ram_rw, which is a conservative choice made to defer the need to
> reason about the other p2m types with this commit.
> 
> argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit on
> all architectures, to assist with avoiding the need to add a compat ABI.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
>  xen/common/argo.c                  | 498 +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h          |  64 +++++
>  4 files changed, 566 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 2a95e09..f4e82cf 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -25,6 +25,7 @@
>  #include <xen/guest_access.h>
>  #include <xen/time.h>
>  
> +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
>  
> @@ -98,6 +99,25 @@ struct argo_domain
>  };
>  
>  /*
> + * Helper functions
> + */
> +
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)

No need for the argo_ prefix for static functions, this is already an
argo specific file.

> +{
> +    uint16_t ret;
> +
> +    ret = (uint16_t)(id->addr.port >> 16);
> +    ret ^= (uint16_t)id->addr.port;
> +    ret ^= id->addr.domain_id;
> +    ret ^= id->partner;
> +
> +    ret &= (ARGO_HTABLE_SIZE - 1);

I'm having trouble figuring out what this is supposed to do. I think a
comment stating the expected hash formula would help make sure the code
is correct.

Also doesn't this need to be documented in the public header?
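
Read literally, the quoted function folds the two halves of the port
together, XORs in both domain ids, and masks to the table size. A
commented sketch of that reading follows; HTABLE_SIZE = 32 is an assumed
value (the real ARGO_HTABLE_SIZE is not shown in this patch hunk), and
struct ring_id is a flattened stand-in for argo_ring_id.

```c
#include <stdint.h>

#define HTABLE_SIZE 32u   /* assumed power of two, stands in for ARGO_HTABLE_SIZE */

struct ring_id {
    uint32_t port;
    uint16_t domain_id;
    uint16_t partner;
};

/*
 * bucket = (port[31:16] ^ port[15:0] ^ domain_id ^ partner) mod HTABLE_SIZE
 * The final mask equals "mod" only because HTABLE_SIZE is a power of two.
 */
static uint16_t hash_fn(const struct ring_id *id)
{
    uint16_t ret;

    ret = (uint16_t)(id->port >> 16);   /* fold the upper half of the port */
    ret ^= (uint16_t)id->port;          /* into the lower half */
    ret ^= id->domain_id;
    ret ^= id->partner;

    return ret & (HTABLE_SIZE - 1);
}
```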

> +    return ret;
> +}
> +
> +/*
>   * locks
>   */
>  
> @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
>  
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +                   uint8_t **page)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map page"

You likely want to use gprintk here and below, or XENLOG_G_ERR, so
that the guest cannot DoS the console.

> +               " %u of %u\n", ring_info->id.addr.domain_id,
> +               ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +               i, ring_info->nmfns);
> +        return -EFAULT;
> +    }
> +    ASSERT(ring_info->mfns);
> +    ASSERT(ring_info->mfn_mapping);

We are trying to move away from such assertions, and instead use
constructions that would prevent issues in non-debug builds. I would
write the above asserts as:

if ( !ring_info->mfns || !ring_info->mfn_mapping )
{
    ASSERT_UNREACHABLE();
    return -E<something>;
}

That way non-debug builds won't trigger page faults if there's indeed
a way to get here with the wrong state, and debug builds will still
hit an assert.

> +
> +    if ( !ring_info->mfn_mapping[i] )
> +    {
> +        /*
> +         * TODO:
> +         * The first page of the ring contains the ring indices, so both read and
> +         * write access to the page is required by the hypervisor, but read-access
> +         * is not needed for this mapping for the remainder of the ring.
> +         * Since this mapping will remain resident in Xen's address space for
> +         * the lifetime of the ring, and following the principle of least privilege,
> +         * it could be preferable to:
> +         *  # add a XSM check to determine what policy is wanted here
> +         *  # depending on the XSM query, optionally create this mapping as
> +         *    _write-only_ on platforms that can support it.
> +         *    (eg. Intel EPT/AMD NPT).
> +         */
> +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> +
> +        if ( !ring_info->mfn_mapping[i] )
> +        {
> +            printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +                   " %u of %u\n", ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +                   i, ring_info->nmfns);
> +            return -EFAULT;
> +        }
> +        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +               mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +    }
> +
> +    if ( page )
> +        *page = ring_info->mfn_mapping[i];
> +    return 0;
> +}
> +
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    uint8_t *dst;
> +    uint32_t *p;
> +    int ret;
> +
> +    ret = argo_ring_map_page(ring_info, 0, &dst);
> +    if ( ret )
> +        return ret;
> +
> +    p = (uint32_t *)(dst + offsetof(argo_ring_t, tx_ptr));
> +    write_atomic(p, tx_ptr);
> +    mb();
> +    return 0;
> +}
> +
>  /*
>   * pending
>   */
> @@ -231,6 +319,388 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>      xfree(ring_info);
>  }
>  
> +/*
> + * ring
> + */
> +
> +static int
> +argo_find_ring_mfn(struct domain *d, argo_pfn_t pfn, mfn_t *mfn)

I think you mean gfn instead of pfn, here and below. Also I'm unsure
why you need a new type for argo; isn't it fine to just use uint64_t?

> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, pfn, &p2mt);

Is this supposed to work for PV guests?

> +#else
> +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, pfn);

If you do this put_gfn here, by the time you check that the gfn -> mfn
matches your expectations the guest might have somehow changed the gfn
-> mfn mapping already (for example by ballooning down memory?)

> +#endif
> +
> +    return ret;
> +}
> +
> +static int
> +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> +                    uint32_t len)
> +{
> +    int i;
> +    int ret = 0;
> +
> +    if ( (npage << PAGE_SHIFT) < len )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /*
> +         * Ring already existed. Check if it's the same ring,
> +         * i.e. same number of pages and all translated gpfns still
> +         * translating to the same mfns
> +         */
> +        if ( ring_info->npage != npage )
> +            i = ring_info->nmfns + 1; /* forces re-register below */
> +        else
> +        {
> +            for ( i = 0; i < ring_info->nmfns; i++ )
> +            {
> +                argo_pfn_t pfn;
> +                mfn_t mfn;
> +
> +                ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +                if ( ret )
> +                    break;
> +
> +                ret = argo_find_ring_mfn(d, pfn, &mfn);
> +                if ( ret )
> +                    break;
> +
> +                if ( mfn_x(mfn) != mfn_x(ring_info->mfns[i]) )
> +                    break;
> +            }
> +        }
> +        if ( i != ring_info->nmfns )
> +        {
> +            printk(XENLOG_INFO "argo: vm%u re-registering existing argo ring"
> +                   " (vm%u:%x vm%d), clearing MFN list\n",
> +                   current->domain->domain_id, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner);
> +
> +            argo_ring_remove_mfns(d, ring_info);
> +            ASSERT(!ring_info->mfns);
> +        }
> +    }
> +
> +    if ( !ring_info->mfns )
> +    {
> +        mfn_t *mfns;
> +        uint8_t **mfn_mapping;
> +
> +        mfns = xmalloc_array(mfn_t, npage);
> +        if ( !mfns )
> +            return -ENOMEM;
> +
> +        for ( i = 0; i < npage; i++ )
> +            mfns[i] = INVALID_MFN;
> +
> +        mfn_mapping = xmalloc_array(uint8_t *, npage);
> +        if ( !mfn_mapping )
> +        {
> +            xfree(mfns);
> +            return -ENOMEM;
> +        }
> +
> +        ring_info->npage = npage;
> +        ring_info->mfns = mfns;
> +        ring_info->mfn_mapping = mfn_mapping;
> +    }
> +    ASSERT(ring_info->npage == npage);
> +
> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;
> +
> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
> +    {
> +        argo_pfn_t pfn;
> +        mfn_t mfn;
> +
> +        ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +        if ( ret )
> +            break;
> +
> +        ret = argo_find_ring_mfn(d, pfn, &mfn);
> +        if ( ret )
> +        {
> +            printk(XENLOG_ERR "argo: vm%u passed invalid gpfn %"PRI_xen_pfn
> +                   " ring (vm%u:%x vm%d) %p seq %d of %d\n",
> +                   d->domain_id, pfn, ring_info->id.addr.domain_id,
> +                   ring_info->id.addr.port, ring_info->id.partner,
> +                   ring_info, i, ring_info->npage);
> +            break;
> +        }
> +
> +        ring_info->mfns[i] = mfn;
> +        ring_info->nmfns = i + 1;
> +
> +        argo_dprintk("%d: %"PRI_xen_pfn" -> %"PRI_mfn"\n",
> +               i, pfn, mfn_x(ring_info->mfns[i]));
> +
> +        ring_info->mfn_mapping[i] = NULL;
> +    }
> +
> +    if ( ret )
> +        argo_ring_remove_mfns(d, ring_info);
> +    else
> +    {
> +        ASSERT(ring_info->nmfns == ring_info->npage);
> +
> +        printk(XENLOG_ERR "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p"
> +               " npage %d nmfns %d\n", current->domain->domain_id,
> +               ring_info->id.addr.domain_id, ring_info->id.addr.port,
> +               ring_info->id.partner, ring_info, ring_info->mfn_mapping,
> +               ring_info->npage, ring_info->nmfns);
> +    }
> +    return ret;
> +}
> +
> +static struct argo_ring_info *
> +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    uint16_t hash;
> +    struct hlist_node *node;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    hash = argo_hash_fn(id);
> +
> +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%d]=%p id=%p\n",
> +                 d->argo, hash, d->argo->ring_hash[hash].first, id);
> +    argo_dprintk("id.addr.port=%d id.addr.domain=vm%u"
> +                 " id.addr.partner=vm%d\n",
> +                 id->addr.port, id->addr.domain_id, id->partner);
> +
> +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[hash], node)
> +    {
> +        argo_ring_id_t *cmpid = &ring_info->id;
> +
> +        if ( cmpid->addr.port == id->addr.port &&
> +             cmpid->addr.domain_id == id->addr.domain_id &&
> +             cmpid->partner == id->partner )
> +        {
> +            argo_dprintk("ring_info=%p\n", ring_info);
> +            return ring_info;
> +        }
> +    }
> +    argo_dprintk("no ring_info found\n");
> +
> +    return NULL;
> +}
> +
> +static long
> +argo_register_ring(struct domain *d,
> +                   XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd,
> +                   XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, uint32_t npage,
> +                   bool fail_exist)
> +{
> +    struct argo_ring ring;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +    bool update_tx_ptr = 0;

bool uses true/false.

> +    uint64_t dst_domain_cookie = 0;
> +
> +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> +        return -EINVAL;
> +
> +    read_lock (&argo_lock);
                ^ extra space.

> +
> +    do {
> +        if ( !d->argo )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        if ( copy_from_guest(&ring, ring_hnd, 1) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        if ( ring.magic != ARGO_RING_MAGIC )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( (ring.len < (sizeof(struct argo_ring_message_header)
> +                          + ARGO_ROUNDUP(1) + ARGO_ROUNDUP(1)))   ||
> +             (ARGO_ROUNDUP(ring.len) != ring.len) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.len > ARGO_MAX_RING_SIZE )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( ring.id.partner == ARGO_DOMID_ANY )
> +        {
> +            ret = xsm_argo_register_any_source(d, argo_mac_bootparam_enforcing);
> +            if ( ret )
> +                break;
> +        }
> +        else
> +        {
> +            struct domain *dst_d = get_domain_by_id(ring.id.partner);

Missing newline.

> +            if ( !dst_d )
> +            {
> +                argo_dprintk("!dst_d, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                break;
> +            }
> +
> +            ret = xsm_argo_register_single_source(d, dst_d);
> +            if ( ret )
> +            {
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            if ( !dst_d->argo )
> +            {
> +                argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +                ret = -ECONNREFUSED;
> +                put_domain(dst_d);
> +                break;
> +            }
> +
> +            dst_domain_cookie = dst_d->argo->domain_cookie;
> +
> +            put_domain(dst_d);
> +        }
> +
> +        ring.id.addr.domain_id = d->domain_id;
> +        if ( copy_field_to_guest(ring_hnd, &ring, id) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        /*
> +         * no need for a lock yet, because only we know about this
> +         * set the tx pointer if it looks bogus (we don't reset it
> +         * because this might be a re-register after S4)
> +         */
> +
> +        if ( ring.tx_ptr >= ring.len ||
> +             ARGO_ROUNDUP(ring.tx_ptr) != ring.tx_ptr )
> +        {
> +            /*
> +             * Since the ring is a mess, attempt to flush the contents of it
> +             * here by setting the tx_ptr to the next aligned message slot past
> +             * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +             */
> +            ring.tx_ptr = ARGO_ROUNDUP(ring.rx_ptr);
> +
> +            if ( ring.tx_ptr >= ring.len )
> +                ring.tx_ptr = 0;
> +
> +            /* ring.tx_ptr will be written back to the guest ring below. */
> +            update_tx_ptr = 1;
> +        }
> +
> +        /* W(L2) protects all the elements of the domain's ring_info */
> +        write_lock(&d->argo->lock);

I don't understand this W(L2) nomenclature; is this explained somewhere?

Also there's no such comment when you take the global argo_lock above.

> +
> +        do {
> +            ring_info = argo_ring_find_info(d, &ring.id);
> +
> +            if ( !ring_info )
> +            {
> +                uint16_t hash;
> +
> +                ring_info = xmalloc(struct argo_ring_info);
> +                if ( !ring_info )
> +                {
> +                    ret = -ENOMEM;
> +                    break;
> +                }
> +
> +                spin_lock_init(&ring_info->lock);
> +
> +                ring_info->mfns = NULL;
> +                ring_info->npage = 0;
> +                ring_info->mfn_mapping = NULL;
> +                ring_info->len = 0;
> +                ring_info->nmfns = 0;
> +                ring_info->tx_ptr = 0;
> +                ring_info->partner_cookie = dst_domain_cookie;
> +
> +                ring_info->id = ring.id;
> +                INIT_HLIST_HEAD(&ring_info->pending);
> +
> +                hash = argo_hash_fn(&ring_info->id);
> +                hlist_add_head(&ring_info->node, &d->argo->ring_hash[hash]);
> +
> +                printk(XENLOG_INFO "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                       current->domain->domain_id, ring.id.addr.domain_id,
> +                       ring.id.addr.port, ring.id.partner);
> +            }
> +            else
> +            {
> +                /*
> +                 * If the caller specified that the ring must not already exist,
> +                 * fail at attempt to add a completed ring which already exists.
> +                 */
> +                if ( fail_exist && ring_info->len )
> +                {
> +                    ret = -EEXIST;
> +                    break;
> +                }
> +
> +                printk(XENLOG_INFO
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                     current->domain->domain_id, ring.id.addr.domain_id,
> +                     ring.id.addr.port, ring.id.partner);
> +            }
> +
> +            /* Since we hold W(L2), there is no need to take L3 here */
> +            ring_info->tx_ptr = ring.tx_ptr;
> +
> +            ret = argo_find_ring_mfns(d, ring_info, npage, pfn_hnd, ring.len);
> +            if ( !ret )
> +                ret = update_tx_ptr ? argo_update_tx_ptr(ring_info, ring.tx_ptr)
> +                                    : argo_ring_map_page(ring_info, 0, NULL);
> +            if ( !ret )
> +                ring_info->len = ring.len;
> +
> +        } while ( 0 );

Why this useless loop? Just adds to indentation.
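
One common way to drop the do { } while ( 0 ) wrapper is goto-based
unwinding to a single exit label. Whether that is the form intended by
this comment is an assumption. In the sketch below, lock(), unlock() and
register_sketch() are hypothetical stand-ins for the rwlock calls and
the register path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rwlock calls, counting lock depth. */
static int lock_depth;
static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

static int register_sketch(bool have_state, bool magic_ok)
{
    int ret = 0;

    lock();

    if ( !have_state )
    {
        ret = -ENODEV;
        goto out;
    }

    if ( !magic_ok )
    {
        ret = -EINVAL;
        goto out;
    }

    /* ... main body runs at a single indentation level ... */

 out:
    unlock();
    return ret;
}
```

Every error path still releases the lock exactly once, and the body
loses one level of indentation per removed loop.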

> +
> +        write_unlock(&d->argo->lock);
> +
> +    } while ( 0 );

Same here.

> +
> +    read_unlock(&argo_lock);
> +
> +    return ret;
> +}
> +
>  long
>  do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>                     XEN_GUEST_HANDLE_PARAM(void) arg2,
> @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>  
>      switch (cmd)
>      {
> +    case ARGO_MESSAGE_OP_register_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> +            guest_handle_cast(arg1, argo_ring_t);
> +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> +            guest_handle_cast(arg2, argo_pfn_t);
> +        uint32_t npage = arg3;
> +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> +
> +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> +            break;
> +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> +            break;
> +        /* arg4: reserve currently-undefined bits, require zero.  */
> +        if ( unlikely(arg4 & ~ARGO_REGISTER_FLAG_MASK) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = argo_register_ring(d, ring_hnd, pfn_hnd, npage, fail_exist);
> +        break;
> +    }
>      default:
>          rc = -ENOSYS;
>          break;
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 1137c54..98006f8 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -34,6 +34,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>  
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> index 9391cd3..e9d25d6 100644
> --- a/xen/include/asm-x86/guest_access.h
> +++ b/xen/include/asm-x86/guest_access.h
> @@ -50,6 +50,8 @@
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>  
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 20dabc0..5ad8e2b 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -21,6 +21,20 @@
>  
>  #include "xen.h"
>  
> +#define ARGO_RING_MAGIC      0xbd67e163e7777f2fULL
> +
> +#define ARGO_DOMID_ANY           DOMID_INVALID

I think you should either leave 1 space between the define name and
the value, or if you want to add multiple spaces please make all the
define values aligned on the same column.

> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16MB
> + *  -- which is 0x1000000 or 16777216 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> +
> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> +typedef uint64_t argo_pfn_t;
> +
>  typedef struct argo_addr
>  {
>      uint32_t port;
> @@ -52,4 +66,54 @@ typedef struct argo_ring
>  #endif
>  } argo_ring_t;
>  
> +/*
> + * Messages on the ring are padded to 128 bits
> + * Len here refers to the exact length of the data not including the
> + * 128 bit header. The message uses
> + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> + * Using typeof(a) makes clear that this does not truncate any high-order bits.
> + */
> +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)

Why not just use ROUNDUP?

And in any case this shouldn't be on the public header IMO, since it's
not part of the interface AFAICT.
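
For comparison, a minimal sketch of a generic power-of-two round-up
helper of the kind referred to above. SKETCH_ROUNDUP and
sketch_roundup16 are hypothetical names defined locally for
illustration; they are not copied from the Xen tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local macro; 'a' must be a power of two. */
#define SKETCH_ROUNDUP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Round a payload length up to a 16-byte granularity. The alignment is
 * cast to the same type as the value so that the inverted mask cannot
 * clear high-order bits of a wider operand.
 */
static inline uint32_t sketch_roundup16(uint32_t len)
{
    return SKETCH_ROUNDUP(len, (uint32_t)16);
}
```

With a helper of this shape, the typeof() cast in ARGO_ROUNDUP is only
needed when the alignment constant could be narrower than the value.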

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 05/25] argo: Add initial argo_init and argo_destroy
  2018-12-01  1:32 ` [PATCH 05/25] argo: Add initial argo_init and argo_destroy Christopher Clark
  2018-12-04  9:12   ` Paul Durrant
@ 2018-12-13 13:16   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 13:16 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> --- /dev/null
> +++ b/xen/include/public/argo.h
> @@ -0,0 +1,55 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __XEN_PUBLIC_ARGO_H__
> +#define __XEN_PUBLIC_ARGO_H__
> +
> +#include "xen.h"
> +
> +typedef struct argo_addr
> +{
> +    uint32_t port;
> +    domid_t domain_id;
> +    uint16_t pad;
> +} argo_addr_t;
> +
> +typedef struct argo_ring_id
> +{
> +    struct argo_addr addr;
> +    domid_t partner;
> +    uint16_t pad;
> +} argo_ring_id_t;
> +
> +typedef struct argo_ring
> +{
> +    uint64_t magic;
> +    argo_ring_id_t id;
> +    uint32_t len;
> +    /* Guests should use atomic operations to access rx_ptr */
> +    uint32_t rx_ptr;
> +    /* Guests should use atomic operations to access tx_ptr */
> +    uint32_t tx_ptr;
> +    uint8_t reserved[32];
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t ring[];
> +#elif defined(__GNUC__)
> +    uint8_t ring[0];
> +#endif
> +} argo_ring_t;

Btw, for all structure types you define, and with your desire to avoid
compat mode translation, you should add ?-prefixed entries to
xen/include/xlat.lst and invoke the produced CHECK_* macros from
somewhere. If, for reference, you'd look at existing instances, you'll
then also find another reason why all these would better have xen_
prefixes.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 16/25] argo: implement the notify op
  2018-12-01  1:32 ` [PATCH 16/25] argo: implement the notify op Christopher Clark
@ 2018-12-13 14:06   ` Jan Beulich
  2018-12-20  6:12     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:06 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> +static uint32_t
> +argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    argo_ring_t ring;
> +    int32_t ret;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    ring.len = ring_info->len;
> +    if ( !ring.len )
> +        return 0;
> +
> +    ring.tx_ptr = ring_info->tx_ptr;
> +
> +    if ( argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr) )
> +        return 0;
> +
> +    argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
> +                 ring.tx_ptr, ring.rx_ptr);
> +
> +    if ( ring.rx_ptr == ring.tx_ptr )
> +        return ring.len - sizeof(struct argo_ring_message_header);
> +
> +    ret = ring.rx_ptr - ring.tx_ptr;
> +    if ( ret < 0 )
> +        ret += ring.len;

Seeing these two if()-s - how is an empty ring distinguished from
a completely full one? I'm getting the impression that
ring.rx_ptr == ring.tx_ptr in both cases.

> +    ret -= sizeof(struct argo_ring_message_header);
> +    ret -= ARGO_ROUNDUP(1);

Wouldn't you instead better round ret to a suitable multiple of
whatever granularity you try to arrange for here? Otherwise
what is this extra subtraction supposed to do?
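
To make the question concrete, here is a minimal sketch of one common
convention, not the patch's code: rx == tx always means "empty", and the
space calculation holds back one header plus one slot so that a sender
can never advance tx all the way round onto rx. The names and the
16-byte sizes are assumptions, as is the precondition that len, rx and
tx are multiples of 16 with rx, tx < len.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HDR_SIZE 16u  /* assumed: 128-bit message header */
#define SKETCH_SLOT     16u  /* assumed: message granularity */

/*
 * Payload bytes a sender may still write into a ring of 'len' bytes.
 * Assumes rx < len, tx < len, and all three are multiples of 16.
 */
static uint32_t sketch_payload_space(uint32_t len, uint32_t rx, uint32_t tx)
{
    /* Bytes from tx forward to rx; a full lap when the two coincide. */
    uint32_t gap = (rx > tx) ? rx - tx : len - (tx - rx);
    /* Held back: the next header, plus one slot so tx never reaches rx. */
    uint32_t reserve = SKETCH_HDR_SIZE + SKETCH_SLOT;

    return (gap > reserve) ? gap - reserve : 0;
}
```

Under these assumptions the result is already a multiple of the slot
size, and a "full" ring shows up as zero space with rx != tx rather than
as rx == tx.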

> @@ -627,6 +679,43 @@ argo_pending_remove_all(struct argo_ring_info *ring_info)
>      }
>  }
>  
> +static void
> +argo_pending_notify(struct hlist_head *to_notify)
> +{
> +    struct hlist_node *node, *next;
> +    struct argo_pending_ent *pending_ent;
> +
> +    ASSERT(rw_is_locked(&argo_lock));
> +
> +    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
> +    {
> +        hlist_del(&pending_ent->node);
> +        argo_signal_domid(pending_ent->id);
> +        xfree(pending_ent);
> +    }
> +}
> +
> +static void
> +argo_pending_find(const struct domain *d, struct argo_ring_info *ring_info,
> +                  uint32_t payload_space, struct hlist_head *to_notify)
> +{
> +    struct hlist_node *node, *next;
> +    struct argo_pending_ent *ent;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    spin_lock(&ring_info->lock);
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> +    {
> +        if ( payload_space >= ent->len )
> +        {
> +            hlist_del(&ent->node);
> +            hlist_add_head(&ent->node, to_notify);
> +        }
> +    }

So if there's space available to fit e.g. just the first pending entry,
you'd continue the loop and also signal all others, provided their
lengths aren't too big? What good does producing such a burst of
notifications do, when only one of the interested parties is
actually going to be able to put something on the ring?
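
One possible way to avoid such a burst, as a sketch only: deduct each
selected entry's length from the remaining space, so that only waiters
whose requests fit cumulatively are signalled. The array-based
structure and all names below are hypothetical stand-ins for the
hlist-based code in the patch, and per-message header overhead is
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_waiter {
    uint32_t len;   /* payload bytes the waiter wants to send */
    int notify;     /* set to 1 when the waiter should be signalled */
};

/*
 * Mark the waiters whose requests fit into 'space' when taken together,
 * in list order. Returns the number of waiters marked.
 */
static size_t sketch_pick_waiters(struct sketch_waiter *w, size_t n,
                                  uint32_t space)
{
    size_t picked = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        w[i].notify = 0;
        if ( w[i].len <= space )
        {
            space -= w[i].len;   /* this waiter's bytes are now spoken for */
            w[i].notify = 1;
            picked++;
        }
    }

    return picked;
}
```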

> @@ -705,6 +812,107 @@ argo_ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>      xfree(ring_info);
>  }
>  
> +/*ring data*/

Now this comment is malformed in any event.

> +static int
> +argo_fill_ring_data(struct domain *src_d,

const?

> +static long
> +argo_notify(struct domain *d,
> +            XEN_GUEST_HANDLE_PARAM(argo_ring_data_t) ring_data_hnd)
> +{
> +    argo_ring_data_t ring_data;
> +    int ret = 0;
> +
> +    read_lock(&argo_lock);
> +
> +    if ( !d->argo )
> +    {
> +        read_unlock(&argo_lock);
> +        argo_dprintk("!d->argo, ENODEV\n");
> +        return -ENODEV;
> +    }
> +
> +    argo_notify_check_pending(d);
> +
> +    do {
> +        if ( !guest_handle_is_null(ring_data_hnd) )
> +        {
> +            /* Quick sanity check on ring_data_hnd */
> +            ret = copy_field_from_guest_errno(&ring_data, ring_data_hnd, magic);
> +            if ( ret )
> +                break;
> +
> +            if ( ring_data.magic != ARGO_RING_DATA_MAGIC )
> +            {
> +                argo_dprintk(
> +                    "ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), EINVAL\n",
> +                    ring_data.magic, ARGO_RING_MAGIC);
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            ret = copy_from_guest_errno(&ring_data, ring_data_hnd, 1);
> +            if ( ret )
> +                break;
> +
> +            {
> +                /*
> +                 * This is a guest pointer passed as a field in a struct
> +                 * so XEN_GUEST_HANDLE is used.
> +                 */
> +                XEN_GUEST_HANDLE(argo_ring_data_ent_t) ring_data_ent_hnd;
> +                ring_data_ent_hnd = guest_handle_for_field(ring_data_hnd,
> +                                                           argo_ring_data_ent_t,
> +                                                           data[0]);
> +                ret = argo_fill_ring_data_array(d, ring_data.nent,
> +                                                ring_data_ent_hnd);
> +            }

Stray braces and bogus indentation: The declaration can move up
and then you don't need the braces and the extra level of indentation.

> @@ -103,6 +104,40 @@ typedef struct argo_ring
>   */
>  #define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>  
> +/*
> + * Notify flags
> + */
> +/* Ring is empty */
> +#define ARGO_RING_DATA_F_EMPTY       (1U << 0)
> +/* Ring exists */
> +#define ARGO_RING_DATA_F_EXISTS      (1U << 1)
> +/* Pending interrupt exists. Do not rely on this field - for profiling only */
> +#define ARGO_RING_DATA_F_PENDING     (1U << 2)
> +/* Sufficient space to queue space_required bytes exists */
> +#define ARGO_RING_DATA_F_SUFFICIENT  (1U << 3)
> +
> +typedef struct argo_ring_data_ent
> +{
> +    argo_addr_t ring;
> +    uint16_t flags;
> +    uint16_t pad;
> +    uint32_t space_required;
> +    uint32_t max_message_size;
> +} argo_ring_data_ent_t;
> +
> +typedef struct argo_ring_data
> +{
> +    uint64_t magic;

What is this good for?

> @@ -179,6 +214,33 @@ struct argo_ring_message_header
>   */
>  #define ARGO_MESSAGE_OP_sendv               5
>  
> +/*
> + * ARGO_MESSAGE_OP_notify
> + *
> + * Asks Xen for information about other rings in the system.
> + *
> + * ent->ring is the argo_addr_t of the ring you want information on.
> + * Uses the same ring matching rules as ARGO_MESSAGE_OP_sendv.
> + *
> + * ent->space_required : if this field is not null then Xen will check
> + * that there is space in the destination ring for this many bytes of  payload.
> + * If sufficient space is available, it will set ARGO_RING_DATA_F_SUFFICIENT
> + * and CANCEL any pending notification for that ent->ring; otherwise it
> + * will schedule a notification event and the flag will not be set.
> + *
> + * These flags are set by Xen when notify replies:
> + * ARGO_RING_DATA_F_EMPTY       ring is empty
> + * ARGO_RING_DATA_F_PENDING     notify event is pending - * don't rely on this *
> + * ARGO_RING_DATA_F_SUFFICIENT  sufficient space for space_required is there
> + * ARGO_RING_DATA_F_EXISTS      ring exists
> + *
> + * arg1: XEN_GUEST_HANDLE(argo_ring_data_t) ring_data (may be NULL)
> + * arg2: NULL
> + * arg3: 0 (ZERO)
> + * arg4: 0 (ZERO)

Another observation I probably should have made earlier: You
don't check that the NULL/ZERO specified argument are indeed
so. Just like for padding fields, please do.
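
A minimal sketch of such a check, assuming plain pointer and integer
stand-ins for the hypercall arguments. In Xen the second argument is a
guest handle, so the real test would go through guest_handle_is_null();
the function name here is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reject a notify call whose documented-as-unused arguments are not
 * NULL/zero, so that they remain available for future extension.
 */
static long sketch_notify_check_args(const void *arg2, uint32_t arg3,
                                     uint32_t arg4)
{
    if ( arg2 != NULL || arg3 != 0 || arg4 != 0 )
        return -EINVAL;

    return 0;
}
```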

Jan



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 18/25] argo: limit the max number of rings that a domain may register.
  2018-12-01  1:32 ` [PATCH 18/25] argo: limit the max number of rings that a domain may register Christopher Clark
@ 2018-12-13 14:08   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:08 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> Very basic implementation: a fixed limit of 128.

Such restrictions to limit resource use would better be implemented
right away for code that can be used (in a limited way) already with
just the initial parts of the series applied.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func
  2018-12-01  1:33 ` [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func Christopher Clark
@ 2018-12-13 14:10   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:10 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> This is out of an abundance of caution, since this is a very basic hash
> function, chosen more for its bucket distribution properties to cluster related
> rings rather than for cryptographic strength or any uniformness of output,
> and it operates upon values supplied by the guest just before being used as an
> array index.

Same here: Better to put this in place right away for new code
than to incrementally add it later.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen
  2018-12-01  1:33 ` [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen Christopher Clark
@ 2018-12-13 14:12   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:12 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> To be used by Argo for delivery of notifications to some guests.

Better not to make this a separate patch: By folding it into where it's
needed it is easier for everyone to judge whether the exposure is
indeed necessary, and it also eliminates the risk of the series getting
committed up to here, and the function then being pointlessly non-
static for an extended period of time.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-01  1:33 ` [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ Christopher Clark
  2018-12-02 19:55   ` Julien Grall
@ 2018-12-13 14:16   ` Jan Beulich
  2018-12-20  6:20     ` Christopher Clark
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:16 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> * x86 PV domains are notified via event channel.
> 
> PV guests are known to have the event channel software present in the guest
> kernel, so it is fine to depend on and use it.
> 
> * x86 HVM domains and all ARM domains are notified via VIRQ.
> 
> The intent is to remove the requirement for event channel software to be
> installed within these guests in order to use Argo. VIRQ signalling is also
> the method that has been in use for the longest period with this hypercall
> in both XenClient and OpenXT.

I'm afraid I don't follow: send_guest_global_virq() uses, well,
evtchn_port_set_pending(), just like evtchn_send() does.
Therefore how does sending a vIRQ help with a guest without
event channel awareness?

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume
  2018-12-01  1:33 ` [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume Christopher Clark
@ 2018-12-13 14:26   ` Jan Beulich
  2018-12-20  6:25     ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:26 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> so that the guest may re-register the rings on resume with current mappings.

Is this something guests really need help with, rather than managing
it on their own? What does "current mappings" here mean, i.e. why
do rings need re-registration in the first place?

> +void
> +argo_resume(struct domain *d)
> +{
> +    bool send_wakeup;
> +
> +    if ( !d )
> +        return;
> +
> +    if ( !get_domain(d) )
> +        return;
> +
> +    read_lock(&argo_lock);
> +
> +    read_lock(&d->argo->lock);
> +    send_wakeup = ( d->argo->ring_count > 0 );
> +    read_unlock(&d->argo->lock);
> +
> +    if ( send_wakeup )
> +        argo_signal_domain(d);
> +
> +    read_unlock(&argo_lock);
> +
> +    put_domain(d);
> +}

domain_resume() also gets called from domain_soft_reset(). Do
you really want such handling in that case as well, when after a
soft-reset the domain is supposed to be "blank"?

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 25/25] argo: implement the get_config op to query notification config
  2018-12-01  1:33 ` [PATCH 25/25] argo: implement the get_config op to query notification config Christopher Clark
@ 2018-12-13 14:32   ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-13 14:32 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -1656,6 +1656,46 @@ argo_sendv(struct domain *src_d, const argo_addr_t *src_addr,
>      return ( ret < 0 ) ? ret : len;
>  }
>  
> +static void
> +argo_get_config(struct domain *d, argo_get_config_t *get_config)
> +{
> +    unsigned int method = argo_signal_method(d);
> +
> +    get_config->signal_method = method;
> +
> +    switch ( method )
> +    {
> +        case ARGO_SIGNAL_METHOD_EVTCHN:
> +        {
> +            read_lock(&argo_lock);
> +            read_lock(&d->argo->lock);
> +
> +            get_config->signal.evtchn = d->argo->evtchn_port;
> +
> +            read_unlock(&d->argo->lock);
> +            read_unlock(&argo_lock);
> +
> +            argo_dprintk("signal for dom:%d evtchn %u\n", d->domain_id,
> +                         get_config->signal.evtchn);
> +
> +            break;
> +        }
> +        case ARGO_SIGNAL_METHOD_VIRQ:
> +        {
> +            get_config->signal.virq = VIRQ_ARGO;
> +
> +            argo_dprintk("signal for dom:%d virq %u\n", d->domain_id,
> +                         get_config->signal.virq);
> +            break;
> +        }
> +        default:
> +        {
> +            BUG();
> +            break;
> +        }

There are quite a few stray braces here.

> +typedef struct argo_get_config
> +{
> +    uint32_t signal_method;
> +    union
> +    {
> +        evtchn_port_t evtchn;
> +        uint32_t virq;
> +    } signal;
> +    uint32_t reserved;

Judging from the description, did you perhaps mean to put
uint32_t reserved[2] inside the union?

Then again "get_config" sounds much more generic than just
obtaining the notification method.
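
For illustration, one reading of the first suggestion, sketched with
hypothetical type names and assuming evtchn_port_t is a 32-bit unsigned
integer: the reserved space moves inside the union, so a future signal
method can use up to 64 bits without changing the structure size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_evtchn_port_t;  /* assumption: 32-bit port number */

/* Layout as posted: reserved field outside the union. */
typedef struct sketch_get_config_orig
{
    uint32_t signal_method;
    union
    {
        sketch_evtchn_port_t evtchn;
        uint32_t virq;
    } signal;
    uint32_t reserved;
} sketch_get_config_orig_t;

/* Alternative: reserved space inside the union. */
typedef struct sketch_get_config
{
    uint32_t signal_method;
    union
    {
        sketch_evtchn_port_t evtchn;
        uint32_t virq;
        uint32_t reserved[2];
    } signal;
} sketch_get_config_t;

/* Both layouts occupy the same number of bytes. */
_Static_assert(sizeof(sketch_get_config_t) ==
               sizeof(sketch_get_config_orig_t),
               "moving the reserved words must not change the size");
```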

> @@ -244,6 +257,21 @@ struct argo_ring_message_header
>   */
>  #define ARGO_MESSAGE_OP_notify              4
>  
> +/*
> + * ARGO_MESSAGE_OP_get_config
> + *
> + * Queries Xen for argo configuration values.
> + *
> + * Used by a guest to obtain the signal method in use for Argo notifications
> + * and the event channel port or isa irq in use.

ISA IRQ? It's a vIRQ that you have as alternative to bare
event-channel signaling.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate
  2018-12-04  9:44   ` Paul Durrant
@ 2018-12-20  5:13     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:13 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, xen-devel, Eric Chanudet,
	Roger Pau Monne

On Tue, Dec 4, 2018 at 1:44 AM Paul Durrant <Paul.Durrant@citrix.com> wrote:
>
> > -----Original Message-----
> > From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> > Sent: 01 December 2018 01:33
> > To: xen-devel@lists.xenproject.org
> > Subject: [PATCH 03/25] argo: introduce the argo_message_op hypercall
> > boilerplate
> >
> > Presence is gated upon CONFIG_ARGO.
> >
> > Registers the hypercall previously reserved for this.
> > Takes 5 arguments, does nothing and returns -ENOSYS.
> >
> > Will be avoiding a compat ABI by using fixed-size types in hypercall ops.
>
> You appear to be using handles, so will you not need compat code to deal with those?

No. The structures that the handles refer to are exactly the same on
both 32 and 64 bit, so the memory access operations that work with the
handle DTRT.

Communication has been tested and is working fine, e.g. between a
client in a 32-bit guest and a server in a 64-bit VM.
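
As a sketch of why the layout does not vary, here are stand-ins for two
of the public structures, using only fixed-width members and explicit
padding. The sketch_ names are hypothetical, and domid_t is assumed to
be a 16-bit unsigned type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t sketch_domid_t;  /* assumption: 16-bit domain id */

typedef struct sketch_argo_addr
{
    uint32_t port;
    sketch_domid_t domain_id;
    uint16_t pad;
} sketch_argo_addr_t;

typedef struct sketch_argo_ring_id
{
    sketch_argo_addr_t addr;
    sketch_domid_t partner;
    uint16_t pad;
} sketch_argo_ring_id_t;

/*
 * No member is a pointer or a 'long', and padding is explicit, so these
 * hold whether the code is built for a 32-bit or a 64-bit target.
 */
_Static_assert(sizeof(sketch_argo_addr_t) == 8, "addr layout changed");
_Static_assert(sizeof(sketch_argo_ring_id_t) == 12, "ring_id layout changed");
```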

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-12 16:01   ` Roger Pau Monné
@ 2018-12-20  5:16     ` Christopher Clark
  2018-12-20  8:45       ` Jan Beulich
  2018-12-20 12:57       ` Roger Pau Monné
  0 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:16 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Wed, Dec 12, 2018 at 8:03 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote:
> > Applied to both x86 and ARM headers.
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > ---
> >  xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
> >  xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
> >  xen/include/xen/guest_access.h     |  3 +++
> >  3 files changed, 57 insertions(+)
> >
> > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> > index 224d2a0..7b6f89c 100644
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> >  #define __raw_copy_from_guest raw_copy_from_guest
> >  #define __raw_clear_guest raw_clear_guest
> >
> > +#define raw_copy_from_guest_errno(dst, src, len)             \
> > +    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
> > +#define raw_copy_to_guest_errno(dst, src, len)               \
> > +    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)
>
> Since the only error that you return is EFAULT, I don't really see the
> point in adding all those helpers. You achieve exactly the same by
> returning a boolean and doing the translation to EFAULT in the caller
> if required.
>
> It might have been nice to have the copy to/from set of functions
> return an error value, but adding a new set of helpers that have the
> same functionality but just differ in the return value look
> redundant.

It is true that there is redundancy with these -- but I think there are decent
arguments in favour of taking these in:

* the errno-providing interface is just a better fit for almost every call site,
  which means less source code in total that is also easier to read.

* it is promoting good interface design for error handling:
  return of error code.

* since these are in use within the uxen source code, it eases comparison and
  work across both codebases - relevant for Argo, due to v4v.

I've rewritten the implementation of these for the second version of the patch
series -- now much simpler -- and hopefully that will mitigate some of your
concern about them.
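
To illustrate the call-site difference being discussed, a minimal
sketch with a stand-in copy primitive. sketch_raw_copy only mimics the
"returns the number of bytes not copied" convention and is not the Xen
function; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in primitive: returns the number of bytes NOT copied. */
static size_t sketch_raw_copy(void *dst, const void *src, size_t len)
{
    if ( !dst || !src )
        return len;          /* simulate a faulting access */

    memcpy(dst, src, len);
    return 0;
}

/* errno-returning wrapper: 0 on success, -EFAULT on any shortfall. */
#define sketch_copy_errno(dst, src, len) \
    (sketch_raw_copy((dst), (src), (len)) ? -EFAULT : 0)

/* Call site: the result can be returned or tested directly. */
static int sketch_read_magic(unsigned long long *out, const void *guest_src)
{
    return sketch_copy_errno(out, guest_src, sizeof(*out));
}
```

Without the wrapper, each caller would test the byte count and assign
-EFAULT itself, which is the translation Roger suggests doing in the
caller.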

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam
  2018-12-04  9:52   ` Paul Durrant
@ 2018-12-20  5:19     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:19 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper,
	Jason Andryuk, Tim (Xen.org),
	George Dunlap, Rich Persaud, James McKenzie, Julien Grall,
	Jan Beulich, Ian Jackson, xen-devel, Daniel De Graaf,
	Eric Chanudet

On Tue, Dec 4, 2018 at 1:52 AM Paul Durrant <Paul.Durrant@citrix.com> wrote:
>
> > -----Original Message-----
> > From: Christopher Clark [mailto:christopher.w.clark@gmail.com]
> > Sent: 01 December 2018 01:33
> > To: xen-devel@lists.xenproject.org
> > Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> > <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> > Beulich <jbeulich@suse.com>; Julien Grall <julien.grall@arm.com>; Konrad
> > Rzeszutek Wilk <konrad.wilk@oracle.com>; Paul Durrant
> > <Paul.Durrant@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> > Tim (Xen.org) <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>; Daniel De
> > Graaf <dgdegra@tycho.nsa.gov>; Rich Persaud <persaur@gmail.com>; Ross
> > Philipson <ross.philipson@gmail.com>; Eric Chanudet
> > <eric.chanudet@gmail.com>; James McKenzie <voreekf@madingley.org>; Jason
> > Andryuk <jandryuk@gmail.com>; Daniel Smith <dpsmith@apertussolutions.com>
> > Subject: [PATCH 11/25] xsm, argo: XSM control for argo register operation,
> > argo_mac bootparam
> >
> > XSM hooks implement distinct permissions for these two distinct cases of
> > Argo ring registration:
> >
> > * Single source:  registering a ring for communication to receive messages
> >                   from a specified single other domain.
> >   Default policy: allow.
> >
> > * Any source:     registering a ring for communication to receive messages
> >                   from any, or all, other domains (ie. wildcard).
> >   Default policy: deny, with runtime policy configuration via new
> > bootparam.
> >
> > The reason why the default for wildcard rings is 'deny' is that there is
> > currently no means other than XSM to protect the ring from DoS by a noisy
> > domain spamming the ring, reducing the ability of other domains to send to
> > it.
> > Using XSM at least allows per-domain control over access to the send
> > permission, to limit communication to domains that can be trusted.
> >
> > Since denying access to any-sender rings unless a flask XSM policy is
> > active
> > will prevent many users from using a key Argo feature, also introduce a
> > bootparam
> > that can override this constraint:
> >  "argo_mac" variable has allowed values: 'permissive' and 'enforcing'.
> > Even though this is a boolean variable, use these descriptive strings in
> > order
> > to make it obvious to an administrator that this has potential security
> > impact.
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > ---
> >  xen/common/argo.c                     | 15 +++++++++++++++
> >  xen/include/xsm/dummy.h               | 15 +++++++++++++++
> >  xen/include/xsm/xsm.h                 | 17 +++++++++++++++++
> >  xen/xsm/dummy.c                       |  4 ++++
> >  xen/xsm/flask/hooks.c                 | 19 +++++++++++++++++++
> >  xen/xsm/flask/policy/access_vectors   | 11 +++++++++++
> >  xen/xsm/flask/policy/security_classes |  1 +
> >  7 files changed, 82 insertions(+)
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 82fab36..2a95e09 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -32,6 +32,21 @@ DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> >  static bool __read_mostly opt_argo_enabled = 0;
> >  boolean_param("argo", opt_argo_enabled);
> >
> > +/* Xen command line option for conservative or relaxed access control */
> > +bool __read_mostly argo_mac_bootparam_enforcing = true;
> > +
> > +static int __init parse_argo_mac_param(const char *s)
> > +{
> > +    if ( !strncmp(s, "enforcing", 10) )
> > +        argo_mac_bootparam_enforcing = true;
> > +    else if ( !strncmp(s, "permissive", 11) )
> > +        argo_mac_bootparam_enforcing = false;
> > +    else
>
> Do you really want to parse e.g. 'enforcingfoobar' as 'enforcing'?

No, I don't - and it doesn't do that, because the length supplied to strncmp
is large enough to include the string terminator in the comparison -- but I
get the point: strncmp is confusing here and brings no clear benefit, so I've
dropped it in favour of strcmp in the next revision.
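
For illustration, a minimal stand-alone sketch (not the Xen code; the function
names are hypothetical) of why a length that counts the terminator rejects
'enforcingfoobar', and of the plain strcmp form that says the same thing more
directly:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-alone parsers, not the Xen implementation.
 * Both return 1 for "enforcing", 0 for "permissive", -1 otherwise.
 */
static int sketch_parse_mac_strncmp(const char *s)
{
    assert(s != NULL);

    /* sizeof() of a string literal counts the NUL: 9 + 1 and 10 + 1. */
    if ( !strncmp(s, "enforcing", sizeof("enforcing")) )
        return 1;
    if ( !strncmp(s, "permissive", sizeof("permissive")) )
        return 0;

    return -1;
}

static int sketch_parse_mac_strcmp(const char *s)
{
    assert(s != NULL);

    if ( !strcmp(s, "enforcing") )
        return 1;
    if ( !strcmp(s, "permissive") )
        return 0;

    return -1;
}
```

With either variant, "enforcingfoobar" differs from "enforcing" at index 9
('f' against the terminator), so it falls through to the error path.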

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-12  9:48   ` Jan Beulich
@ 2018-12-20  5:29     ` Christopher Clark
  2018-12-20  8:29       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> > +static int
> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> > +                    uint32_t len)
> > +{
> > +    int i;
> > +    int ret = 0;
> > +
> > +    if ( (npage << PAGE_SHIFT) < len )
> > +        return -EINVAL;
> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /*
> > +         * Ring already existed. Check if it's the same ring,
> > +         * i.e. same number of pages and all translated gpfns still
> > +         * translating to the same mfns
> > +         */
>
> This comment makes me wonder whether the translations are
> permitted to change at other times. If so I'm not sure what
> value verification here has. If not, this probably would want to
> be debugging-only code.

My understanding is that the gfn->mfn translation is not necessarily stable
across entry and exit from host power state S4, suspend to disk.

I've added extra explanation to the next version of the patch series in
the commit message for the suspend and resume functions which trigger
guest use of this logic.

> > +static struct argo_ring_info *
> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +    uint16_t hash;
> > +    struct hlist_node *node;
>
> const?

I couldn't determine exactly what you were pointing towards with this one.
I've applied 'const' in many more places in the next version; please
let me know if I've missed where you intended.

> > +    uint64_t dst_domain_cookie = 0;
> > +
> > +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> > +        return -EINVAL;
>
> Why? You don't store the handle for later use (and you shouldn't).
> If there really is a need for a full page's worth of memory, it
> would better be passed in as GFN.

I've added this comment for this behaviour in v2:

+    /*
+     * Verify the alignment of the ring data structure supplied with the
+     * understanding that the ring handle supplied points to the same memory as
+     * the first entry in the array of pages provided via pg_descr_hnd, where
+     * the head of the ring will reside.
+     * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
+     * fixed offset from head of the first page in the mfn array.
+     */
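
As a rough stand-alone illustration of what such an alignment test amounts to
(this is not the Xen guest handle macro; the 4 KiB page size and the names are
assumptions made for the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* True when addr sits exactly at the start of a page. */
static bool sketch_is_page_aligned(uintptr_t addr)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

    return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

A ring header at a page boundary means a field at a fixed offset from the
start of the ring is also at that fixed offset within the first page.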

> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> >      switch (cmd)
> >      {
> > +    case ARGO_MESSAGE_OP_register_ring:
> > +    {
> > +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> > +            guest_handle_cast(arg1, argo_ring_t);
> > +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> > +            guest_handle_cast(arg2, argo_pfn_t);
> > +        uint32_t npage = arg3;
> > +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> > +
> > +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> > +            break;
>
> I don't understand the need for this and ...
>
> > +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
> > +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> > +            break;
>
> ... perhaps also this, when you use copy_from_guest() upon access.

This is the one piece of feedback on version 1 of this series that I haven't
taken the time to address yet. The code is evidently safe, and a possible
performance decrease is the only concern, so I'd like to study it further
before removing any of the checks rather than delay posting version two of
this series.

All the other points: ack, and should be ok in v2.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-12 16:47   ` Roger Pau Monné
@ 2018-12-20  5:41     ` Christopher Clark
  2018-12-20  8:51       ` Jan Beulich
  2018-12-20 12:52       ` Roger Pau Monné
  0 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > +static inline uint16_t
> > +argo_hash_fn(const struct argo_ring_id *id)
>
> No need for the argo_ prefix for static functions, this is already an
> argo specific file.

Although the compiler could live without the prefix, I find it helpful for
very easily determining that functions being used are not defined elsewhere
within Xen; so I've left the prefix as is for version two of this series.

> > +{
> > +    uint16_t ret;
> > +
> > +    ret = (uint16_t)(id->addr.port >> 16);
> > +    ret ^= (uint16_t)id->addr.port;
> > +    ret ^= id->addr.domain_id;
> > +    ret ^= id->partner;
> > +
> > +    ret &= (ARGO_HTABLE_SIZE - 1);
>
> I'm having trouble figuring out what this is supposed to do, I think a
> comment and the expected hash formula will help make sure the code is
> correct.

Fair point. I've added a comment with explanation in the next version.

+ * This hash function is used to distribute rings within the per-domain
+ * hash table (d->argo->ring_hash). The hash table will provide a
+ * 'ring_info' struct if a match is found with a 'xen_argo_ring_id' key:
+ * ie. the key is a (domain id, port, partner domain id) tuple.
+ * There aren't many hash table buckets, and this doesn't need to be
+ * cryptographically robust. Since port number varies the most in
+ * expected use, and the Linux driver allocates at both the high and
+ * low ends, incorporate high and low bits to help with distribution.

> Also doesn't this need to be documented in the public header?

No, it's only for internal use within the hypervisor and designed to
meet the hypervisor's hashing requirement with a small table.

> > +    ASSERT(ring_info->mfns);
> > +    ASSERT(ring_info->mfn_mapping);
>
> We are trying to move away from such assertions, and instead use
> constructions that would prevent issues in non-debug builds. I would
> write the above asserts as:
>
> if ( !ring_info->mfns || !ring_info->mfn_mapping )
> {
>     ASSERT_UNREACHABLE();
>     return -E<something>;
> }
>
> That way non-debug builds won't trigger page faults if there's indeed
> a way to get here with the wrong state, and debug builds will still
> hit an assert.

ack, will do so in v2.
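
A stand-alone sketch of the suggested construction, for illustration. The
SKETCH_DEBUG switch, the struct, and the choice of -EINVAL are assumptions made
here; they stand in for Xen's debug build setting, struct argo_ring_info, and
whatever error value v2 picks:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in: only fires when the hypothetical SKETCH_DEBUG is defined. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT_UNREACHABLE() assert(!"unreachable")
#else
#define SKETCH_ASSERT_UNREACHABLE() ((void)0)
#endif

/* Hypothetical reduced form of the ring bookkeeping. */
struct sketch_ring_info {
    const unsigned long *mfns;
    void **mfn_mapping;
};

static int sketch_first_mfn(const struct sketch_ring_info *ri,
                            unsigned long *out)
{
    if ( !ri->mfns || !ri->mfn_mapping )
    {
        /* Debug builds stop here; release builds fail the call instead. */
        SKETCH_ASSERT_UNREACHABLE();
        return -EINVAL;
    }

    *out = ri->mfns[0];
    return 0;
}
```

In a build without SKETCH_DEBUG, a ring_info with NULL arrays yields -EINVAL
rather than a NULL dereference.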

> > +    *mfn = get_gfn_unshare(d, pfn, &p2mt);
>
> Is this supposed to work for PV guests?

Yes -- and they seem to work OK. Am I missing something?

> > +#else
> > +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > +#endif
> > +
> > +    if ( !mfn_valid(*mfn) )
> > +        ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > +        ret = -EAGAIN;
> > +#endif
> > +    else if ( (p2mt != p2m_ram_rw) ||
> > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > +        ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > +    put_gfn(d, pfn);
>
> If you do this put_gfn here, by the time you check that the gfn -> mfn
> matches your expectations the guest might have somehow changed the gfn
> -> mfn mapping already (for example by ballooning down memory?)

If the guest does that, I think it only harms itself. If for some reason
a memory access is denied, then the op would just fail. I don't think
there's a more serious consequence to be worried about.

Above, if we're going to use the mfn, then we've just done a successful:
    get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)

which should hold it in a state that we're ok with until we're done
with it -- see put_page_and_type in argo_ring_remove_mfns.

> > +        /* W(L2) protects all the elements of the domain's ring_info */
> > +        write_lock(&d->argo->lock);
>
> I don't understand this W(L2) nomenclature, is this explain somewhere?

Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
comment near the top of the file. It's a read-write lock, so 'W' means:
take the write lock on it.

> Also there's no such comment when you take the global argo_lock above.

L2 covers more interesting work than L1, which is why there are more
comments pertaining to it than L1.

> > +/*
> > + * Messages on the ring are padded to 128 bits
> > + * Len here refers to the exact length of the data not including the
> > + * 128 bit header. The message uses
> > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > + * Using typeof(a) make clear that this does not truncate any high-order bits.
> > + */
> > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>
> Why not just use ROUNDUP?
>
> And in any case this shouldn't be on the public header IMO, since it's
> not part of the interface AFAICT.

Well, in version two it's now XEN_ARGO_ROUNDUP :-)
It does need to be in the public header, since it's used within the Linux
device driver, and items in that public Xen header need the 'xen' prefix
(so they now have it).  Within the Linux code, it's used to choose a sensible
ring size, and also when manipulating the rx_ptr on the guest side.
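
To make the arithmetic concrete, a stand-alone sketch of the rounding and of
the ring space one message consumes, following the formula in the quoted
comment. The sketch_ names are hypothetical, and a plain uint32_t function is
used instead of the typeof-based macro to keep it portable:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SLOT   16u   /* messages are padded to 128 bits */
#define SKETCH_HEADER 16u   /* the message header is 128 bits */

/* Round a up to the next multiple of the 16-byte slot size. */
static uint32_t sketch_roundup(uint32_t a)
{
    return (a + (SKETCH_SLOT - 1)) & ~(uint32_t)(SKETCH_SLOT - 1);
}

/* Ring bytes used by a message carrying payload_len bytes of data. */
static uint32_t sketch_ring_bytes(uint32_t payload_len)
{
    /* Keep the additions below from wrapping around. */
    assert(payload_len <= UINT32_MAX - SKETCH_HEADER - SKETCH_SLOT);

    return sketch_roundup(payload_len) + SKETCH_HEADER;
}
```

So a 5-byte payload occupies 16 + 16 = 32 bytes of ring, and a guest advancing
its rx_ptr past that message would step by the same rounded amount.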

Thanks for the feedback - almost all should be covered in v2, I think.

Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 15/25] argo: implement the sendv op
  2018-12-12 11:52   ` Jan Beulich
@ 2018-12-20  5:58     ` Christopher Clark
  2018-12-20  8:33       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  5:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Wed, Dec 12, 2018 at 3:53 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> > +static void
> > +argo_signal_domain(struct domain *d)
> > +{
> > +    argo_dprintk("signalling domid:%d\n", d->domain_id);
> > +
> > +    if ( !d->argo ) /* This can happen if the domain is being destroyed */
> > +        return;
>
> If such a precaution is necessary, how is it guaranteed that
> the pointer doesn't change to NULL between the check above
> and ...
>
> > +    evtchn_send(d, d->argo->evtchn_port);
>
> ... the use here?

ack, this code is gone in v2.
d->argo is safe to access while holding L1, the global argo lock, for either
read or write, so it won't switch to NULL unexpectedly.

> > +static int
> > +argo_iov_count(XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> > +               uint32_t *count)
> > +{
> > +    argo_iov_t iov;
> > +    uint32_t sum_iov_lens = 0;
> > +    int ret;
> > +
> > +    if ( niov > ARGO_MAXIOV )
> > +        return -EINVAL;
> > +
> > +    while ( niov-- )
> > +    {
> > +        ret = copy_from_guest_errno(&iov, iovs, 1);
> > +        if ( ret )
> > +            return ret;
> > +
> > +        /* check each to protect sum against integer overflow */
> > +        if ( iov.iov_len > ARGO_MAX_RING_SIZE )
> > +            return -EINVAL;
> > +
> > +        sum_iov_lens += iov.iov_len;
> > +
> > +        /*
> > +         * Again protect sum from integer overflow
> > +         * and ensure total msg size will be within bounds.
> > +         */
> > +        if ( sum_iov_lens > ARGO_MAX_MSG_SIZE )
> > +            return -EINVAL;
>
> So you do overflow checks here. But how does this help when ...
>
> > +        guest_handle_add_offset(iovs, 1);
> > +    }
> > +
> > +    *count = sum_iov_lens;
> > +    return 0;
> > +}
> > +
> > +static int
> > +argo_ringbuf_insert(struct domain *d,
> > +                    struct argo_ring_info *ring_info,
> > +                    const struct argo_ring_id *src_id,
> > +                    XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs, uint8_t niov,
> > +                    uint32_t message_type, unsigned long *out_len)
> > +{
> > +    argo_ring_t ring;
> > +    struct argo_ring_message_header mh = { 0 };
> > +    int32_t sp;
> > +    int32_t ret = 0;
> > +    uint32_t len;
> > +    uint32_t iov_len;
> > +    uint32_t sum_iov_len = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    if ( (ret = argo_iov_count(iovs, niov, &len)) )
> > +        return ret;
> > +
> > +    if ( ((ARGO_ROUNDUP(len) + sizeof (struct argo_ring_message_header) ) >=
> > +          ring_info->len)
> > +         || (len > ARGO_MAX_MSG_SIZE) )
> > +        return -EMSGSIZE;
> > +
> > +    do {
> > +        ret =  argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr);
> > +        if ( ret )
> > +            break;
> > +
> > +        argo_sanitize_ring(&ring, ring_info);
> > +
> > +        argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring.len=%d"
> > +                     " ring_info->tx_ptr=%d\n",
> > +                     ring.tx_ptr, ring.rx_ptr, ring.len, ring_info->tx_ptr);
> > +
> > +        if ( ring.rx_ptr == ring.tx_ptr )
> > +            sp = ring_info->len;
> > +        else
> > +        {
> > +            sp = ring.rx_ptr - ring.tx_ptr;
> > +            if ( sp < 0 )
> > +                sp += ring.len;
> > +        }
> > +
> > +        if ( (ARGO_ROUNDUP(len) + sizeof(struct argo_ring_message_header)) >= sp )
> > +        {
> > +            argo_dprintk("EAGAIN\n");
> > +            ret = -EAGAIN;
> > +            break;
> > +        }
> > +
> > +        mh.len = len + sizeof(struct argo_ring_message_header);
> > +        mh.source.port = src_id->addr.port;
> > +        mh.source.domain_id = src_id->addr.domain_id;
> > +        mh.message_type = message_type;
> > +
> > +        /*
> > +         * For this copy to the guest ring, tx_ptr is always 16-byte aligned
> > +         * and the message header is 16 bytes long.
> > +         */
> > +        BUILD_BUG_ON(sizeof(struct argo_ring_message_header) != ARGO_ROUNDUP(1));
> > +
> > +        if ( (ret = argo_memcpy_to_guest_ring(ring_info,
> > +                                              ring.tx_ptr + sizeof(argo_ring_t),
> > +                                              &mh,
> > +                                              XEN_GUEST_HANDLE_NULL(uint8_t),
> > +                                              sizeof(mh))) )
> > +            break;
> > +
> > +        ring.tx_ptr += sizeof(mh);
> > +        if ( ring.tx_ptr == ring_info->len )
> > +            ring.tx_ptr = 0;
> > +
> > +        while ( niov-- )
> > +        {
> > +            XEN_GUEST_HANDLE_PARAM(uint8_t) bufp_hnd;
> > +            XEN_GUEST_HANDLE(uint8_t) buf_hnd;
> > +            argo_iov_t iov;
> > +
> > +            ret = copy_from_guest_errno(&iov, iovs, 1);
>
> ... here you copy the structure again from guest memory, at
> which point it may have changed? I see you do some checks
> further down, but the question then is - is the checking in
> argo_iov_count() redundant and hence unnecessary? Are
> you really safe here against inconsistencies between the
> first and second reads? If so, a thorough explanation in a
> comment is needed here.

Fair point and comments have been added to v2.

>
> > +            if ( ret )
> > +                break;
> > +
> > +            bufp_hnd = guest_handle_from_ptr((uintptr_t)iov.iov_base, uint8_t);
>
> Please use a handle in the public interface instead of such a
> cast.

ack.

> > +            sp = ring.len - ring.tx_ptr;
> > +
> > +            if ( iov_len > sp )
> > +            {
> > +                ret = argo_memcpy_to_guest_ring(ring_info,
> > +                        ring.tx_ptr + sizeof(argo_ring_t),
> > +                        NULL, buf_hnd, sp);
> > +                if ( ret )
> > +                    break;
> > +
> > +                ring.tx_ptr = 0;
> > +                iov_len -= sp;
> > +                guest_handle_add_offset(buf_hnd, sp);
> > +            }
> > +
> > +            ret = argo_memcpy_to_guest_ring(ring_info,
> > +                        ring.tx_ptr + sizeof(argo_ring_t),
> > +                        NULL, buf_hnd, iov_len);
>
> Extending the remark on double guest memory read above, is
> it certain you won't overrun the ring here?

Yes, certain it's ok. Comments added to explain.

>
> > +            if ( ret )
> > +                break;
> > +
> > +            ring.tx_ptr += iov_len;
> > +
> > +            if ( ring.tx_ptr == ring_info->len )
> > +                ring.tx_ptr = 0;
> > +
> > +            guest_handle_add_offset(iovs, 1);
> > +        }
> > +
> > +        if ( ret )
> > +            break;
> > +
> > +        ring.tx_ptr = ARGO_ROUNDUP(ring.tx_ptr);
> > +
> > +        if ( ring.tx_ptr >= ring_info->len )
> > +            ring.tx_ptr -= ring_info->len;
> > +
> > +        mb();
> > +        ring_info->tx_ptr = ring.tx_ptr;
>
> What does the above barrier guard against? It's all hypervisor
> local memory which gets altered afaict.

ack, dropped.

>
> > +static int
> > +argo_pending_requeue(struct argo_ring_info *ring_info, domid_t src_id, int len)
> > +{
> > +    struct hlist_node *node;
> > +    struct argo_pending_ent *ent;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    hlist_for_each_entry(ent, node, &ring_info->pending, node)
> > +    {
> > +        if ( ent->id == src_id )
> > +        {
> > +            if ( ent->len < len )
> > +                ent->len = len;
>
> What does this achieve? I.e. why is this not either a plain
> assignment or a check that the length is the same?

New comment added:

/*
 * Reuse an existing queue entry for a notification rather than add
 * another. If the existing entry is waiting for a smaller size than
 * the current message then adjust the record to wait for the
 * current (larger) size to be available before triggering a
 * notification.
 * This assists the waiting sender by ensuring that whenever a
 * notification is triggered, there is sufficient space available
 * for (at least) any one of the messages awaiting transmission.
 */
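
A stand-alone sketch of that requeue rule, using a plain array in place of the
hlist for brevity (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending-notification record. */
struct sketch_pending {
    uint16_t id;    /* domain id of the waiting sender */
    uint32_t len;   /* space it is waiting for, in bytes */
};

/*
 * Returns 1 if an existing entry for 'id' was reused, or 0 if the caller
 * needs to queue a new entry.
 */
static int sketch_requeue(struct sketch_pending *ents, size_t n,
                          uint16_t id, uint32_t len)
{
    size_t i;

    assert(ents != NULL || n == 0);

    for ( i = 0; i < n; i++ )
    {
        if ( ents[i].id != id )
            continue;

        /* Only ever grow the wait size: keep the largest request seen. */
        if ( ents[i].len < len )
            ents[i].len = len;

        return 1;
    }

    return 0;
}
```

A plain assignment would let a later, smaller message shrink the recorded
size, and the sender could then be woken before its larger message fits.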

> > +static struct argo_ring_info *
> > +argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
> > +                             domid_t partner_id, uint64_t partner_cookie)
> > +{
> > +    argo_ring_id_t id;
> > +    struct argo_ring_info *ring_info;
> > +
> > +    ASSERT(rw_is_locked(&d->argo->lock));
> > +
> > +    id.addr.port = port;
> > +    id.addr.domain_id = d->domain_id;
> > +    id.partner = partner_id;
> > +
> > +    ring_info = argo_ring_find_info(d, &id);
> > +    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
> > +        return ring_info;
>
> Such a cookie makes mismatches unlikely, but it doesn't exclude
> them. If there are other checks, is the cookie useful at all?

Yes, I think so and it's proved useful elsewhere in the second
version of the series: it helps avoid sending signals to incorrect
domains that may not be argo-enabled.

> > @@ -813,6 +1318,29 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >          rc = argo_unregister_ring(d, ring_hnd);
> >          break;
> >      }
> > +    case ARGO_MESSAGE_OP_sendv:
> > +    {
> > +        argo_send_addr_t send_addr;
> > +        uint32_t niov = arg3;
> > +        uint32_t message_type = arg4;
>
> At the example of these (perhaps I've again overlooked earlier
> instances), what about the upper halves on 64-bit? Given the
> rather generic interface of the actual hypercall, I don't think it
> is a good idea to ignore the bits. The situation is different for
> the "cmd" parameter, which is uniformly 32-bit for all sub-ops.

ack.

> Talking of "cmd" and its type: In case it wasn't said by anyone
> else yet, please use unsigned types wherever negative values
> are impossible.
>
> > +        XEN_GUEST_HANDLE_PARAM(argo_send_addr_t) send_addr_hnd =
> > +            guest_handle_cast(arg1, argo_send_addr_t);
> > +        XEN_GUEST_HANDLE_PARAM(argo_iov_t) iovs =
> > +            guest_handle_cast(arg2, argo_iov_t);
> > +
> > +        if ( unlikely(!guest_handle_okay(send_addr_hnd, 1)) )
> > +            break;
> > +        rc = copy_from_guest_errno(&send_addr, send_addr_hnd, 1);
> > +        if ( rc )
> > +            break;
> > +
> > +        send_addr.src.domain_id = d->domain_id;
>
> What use is the field if you override it like this?

ack, have switched to a correct match check in v2.

> I don't think I've found any checking of this field to be zero, to
> allow for future re-use.

ack.

Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 16/25] argo: implement the notify op
  2018-12-13 14:06   ` Jan Beulich
@ 2018-12-20  6:12     ` Christopher Clark
  2018-12-20  8:39       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  6:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Thu, Dec 13, 2018 at 6:06 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> > +static uint32_t
> > +argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    argo_ring_t ring;
> > +    int32_t ret;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    ring.len = ring_info->len;
> > +    if ( !ring.len )
> > +        return 0;
> > +
> > +    ring.tx_ptr = ring_info->tx_ptr;
> > +
> > +    if ( argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr) )
> > +        return 0;
> > +
> > +    argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
> > +                 ring.tx_ptr, ring.rx_ptr);
> > +
> > +    if ( ring.rx_ptr == ring.tx_ptr )
> > +        return ring.len - sizeof(struct argo_ring_message_header);
> > +
> > +    ret = ring.rx_ptr - ring.tx_ptr;
> > +    if ( ret < 0 )
> > +        ret += ring.len;
>
> Seeing these two if()-s - how is an empty ring distinguished from
> a completely full one? I'm getting the impression that
> ring.rx_ptr == ring.tx_ptr in both cases.

The subtraction from ring.len above is missing an additional subtraction of
ARGO_ROUNDUP(1), which doesn't help reasoning about this. (Fixed in v2.)

If rx_ptr == tx_ptr, then the ring is empty. The ring insertion
functions won't allow filling the ring, and I've added more comments
in the v2 code to explain.

> > +    ret -= sizeof(struct argo_ring_message_header);
> > +    ret -= ARGO_ROUNDUP(1);
>
> Wouldn't you instead better round ret to a suitable multiple of
> whatever granularity you try to arrange for here? Otherwise
> what is this extra subtraction supposed to do?

re: subtraction, have added new comment:
/*
 * The maximum size payload for a message that will be accepted is:
 * (the available space between the ring indexes)
 *    minus (space for a message header)
 *    minus (space for one message slot)
 * since argo_ringbuf_insert requires that one message slot be left
 * unfilled, to avoid filling the ring to capacity and confusing a full
 * ring with an empty one.
 */
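
The calculation in that comment can be sketched stand-alone as follows. The
sketch_ names are hypothetical, and the 16-byte header and slot sizes follow
the 128-bit figures quoted earlier in the thread:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SLOT   16u   /* one message slot (the rounding unit) */
#define SKETCH_HEADER 16u   /* message header */

/* Largest payload that would currently be accepted on the ring. */
static uint32_t sketch_payload_space(uint32_t ring_len, uint32_t rx_ptr,
                                     uint32_t tx_ptr)
{
    int64_t space;

    assert(rx_ptr < ring_len && tx_ptr < ring_len);

    if ( rx_ptr == tx_ptr )
        space = ring_len;       /* equal indexes always mean empty */
    else
    {
        space = (int64_t)rx_ptr - tx_ptr;
        if ( space < 0 )
            space += ring_len;
    }

    space -= SKETCH_HEADER;     /* room for the message header */
    space -= SKETCH_SLOT;       /* one slot is always left unfilled */

    return space > 0 ? (uint32_t)space : 0;
}
```

For a 4096-byte ring this gives 4064 bytes when empty, and 0 when only 32
bytes separate tx_ptr from rx_ptr, so the indexes never meet on a full ring.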

re: rounding: Possibly. Not sure. In practice, both sides are
updating the indexes in quantized steps matching the
ARGO_ROUNDUP unit. Not sure it needs to change.

>
> > @@ -627,6 +679,43 @@ argo_pending_remove_all(struct argo_ring_info *ring_info)
> >      }
> >  }
> >
> > +static void
> > +argo_pending_notify(struct hlist_head *to_notify)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct argo_pending_ent *pending_ent;
> > +
> > +    ASSERT(rw_is_locked(&argo_lock));
> > +
> > +    hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
> > +    {
> > +        hlist_del(&pending_ent->node);
> > +        argo_signal_domid(pending_ent->id);
> > +        xfree(pending_ent);
> > +    }
> > +}
> > +
> > +static void
> > +argo_pending_find(const struct domain *d, struct argo_ring_info *ring_info,
> > +                  uint32_t payload_space, struct hlist_head *to_notify)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct argo_pending_ent *ent;
> > +
> > +    ASSERT(rw_is_locked(&d->argo->lock));
> > +
> > +    spin_lock(&ring_info->lock);
> > +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> > +    {
> > +        if ( payload_space >= ent->len )
> > +        {
> > +            hlist_del(&ent->node);
> > +            hlist_add_head(&ent->node, to_notify);
> > +        }
> > +    }
>
> So if there's space available to fit e.g. just the first pending entry,
> you'd continue the loop and also signal all others, provided their
> lengths aren't too big? What good does producing such a burst of
> notifications do, when only one of the interested parties is
> actually going to be able to put something on the ring?

Added new comment:
/*
 * TODO: Current policy here is to signal _all_ of the waiting domains
 *       interested in sending a message of size less than payload_space.
 *
 * This is likely to be suboptimal, since once one of them has added
 * their message to the ring, there may well be insufficient room
 * available for any of the others to transmit, meaning that they were
 * woken in vain, which created extra work just to requeue their wait.
 *
 * Retain this simple policy for now since it at least avoids starving a
 * domain of available space notifications because of a policy that only
 * notified other domains instead. Improvement may be possible;
 * investigation required.
 */
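
A stand-alone sketch of the "signal everyone who would fit" policy, with an
array of requested sizes in place of the pending list (names hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record in 'picked' the index of every waiter whose requested size fits
 * in payload_space, and return how many were picked. Each waiter is tested
 * against the same payload_space, so several may be picked even though
 * only one of them may end up fitting.
 */
static size_t sketch_select_waiters(const uint32_t *wanted, size_t n,
                                    uint32_t payload_space, size_t *picked)
{
    size_t i, count = 0;

    assert((wanted != NULL && picked != NULL) || n == 0);

    for ( i = 0; i < n; i++ )
        if ( payload_space >= wanted[i] )
            picked[count++] = i;

    return count;
}
```

With 300 bytes free and waiters wanting 100, 5000 and 200 bytes, the first and
third are both selected, although 100 + 200 plus headers may not fit together.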

> > +typedef struct argo_ring_data
> > +{
> > +    uint64_t magic;
>
> What is this good for?

New comment added:
/*
 * Contents of the 'magic' field are inspected to verify that they contain
 * an expected value before the hypervisor will perform writes into this
 * structure in guest-supplied memory.
 */

>
> > @@ -179,6 +214,33 @@ struct argo_ring_message_header
> >   */
> >  #define ARGO_MESSAGE_OP_sendv               5
> >
> > +/*
> > + * ARGO_MESSAGE_OP_notify
> > + *
> > + * Asks Xen for information about other rings in the system.
> > + *
> > + * ent->ring is the argo_addr_t of the ring you want information on.
> > + * Uses the same ring matching rules as ARGO_MESSAGE_OP_sendv.
> > + *
> > + * ent->space_required : if this field is not null then Xen will check
> > + * that there is space in the destination ring for this many bytes of  payload.
> > + * If sufficient space is available, it will set ARGO_RING_DATA_F_SUFFICIENT
> > + * and CANCEL any pending notification for that ent->ring; otherwise it
> > + * will schedule a notification event and the flag will not be set.
> > + *
> > + * These flags are set by Xen when notify replies:
> > + * ARGO_RING_DATA_F_EMPTY       ring is empty
> > + * ARGO_RING_DATA_F_PENDING     notify event is pending - * don't rely on this *
> > + * ARGO_RING_DATA_F_SUFFICIENT  sufficient space for space_required is there
> > + * ARGO_RING_DATA_F_EXISTS      ring exists
> > + *
> > + * arg1: XEN_GUEST_HANDLE(argo_ring_data_t) ring_data (may be NULL)
> > + * arg2: NULL
> > + * arg3: 0 (ZERO)
> > + * arg4: 0 (ZERO)
>
> Another observation I probably should have made earlier: You
> don't check that the NULL/ZERO specified argument are indeed
> so. Just like for padding fields, please do.

ack, thanks.

Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
  2018-12-13 14:16   ` Jan Beulich
@ 2018-12-20  6:20     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  6:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Thu, Dec 13, 2018 at 6:16 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> > * x86 PV domains are notified via event channel.
> >
> > PV guests are known to have the event channel software present in the guest
> > kernel, so it is fine to depend on and use it.
> >
> > * x86 HVM domains and all ARM domains are notified via VIRQ.
> >
> > The intent is to remove the requirement for event channel software to be
> > installed within these guests in order to use Argo. VIRQ signalling is also
> > the method that has been in use for the longest period with this hypercall
> > in both XenClient and OpenXT.
>
> I'm afraid I don't follow: send_guest_global_virq() uses, well,
> evtchn_port_set_pending(), just like evtchn_send() does.
> Therefore how does sending a vIRQ help with a guest without
> event channel awareness?

On this topic, signal delivery to guests, I'm simplifying the next version of
the patch series: I'm just going to use VIRQs.

It doesn't remove the dependency on event channel software in the guest,
and it doesn't optimize efficiency of notifications with HVM guests
but I'd like to come back and address that in a subsequent patch once
this series has been accepted. It'll follow from the explanation
that James's message provides.

Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume
  2018-12-13 14:26   ` Jan Beulich
@ 2018-12-20  6:25     ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-20  6:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

On Thu, Dec 13, 2018 at 6:26 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 01.12.18 at 02:33, <christopher.w.clark@gmail.com> wrote:
> > so that the guest may re-register the rings on resume with current mappings.
>
> Is this something guests really need help with, rather than managing
> it on their own? What does "current mappings" here mean, i.e. why
> do rings need re-registration in the first place?

My understanding is that the gfn->mfn mapping is not necessarily
stable across entry and exit to host S4, suspend to disk, so the
rings need to be torn down before suspend to stop further writes
into those pages after resume.
When the guest gets the notification after resume, it can
re-register the rings with its list of gfns, which can then
be re-translated into the (possibly) new mfns needed for the ring.

> > +void
> > +argo_resume(struct domain *d)
> > +{
> > +    bool send_wakeup;
> > +
> > +    if ( !d )
> > +        return;
> > +
> > +    if ( !get_domain(d) )
> > +        return;
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    read_lock(&d->argo->lock);
> > +    send_wakeup = ( d->argo->ring_count > 0 );
> > +    read_unlock(&d->argo->lock);
> > +
> > +    if ( send_wakeup )
> > +        argo_signal_domain(d);
> > +
> > +    read_unlock(&argo_lock);
> > +
> > +    put_domain(d);
> > +}
>
> domain_resume() also gets called from domain_soft_reset(). Do
> you really want such handling in that case as well, when after a
> soft-reset the domain is supposed to be "blank"?

Thanks for the pointer to soft reset: I've added implementation
for this to the next version of the patch series, and it'll be fine
with resume then.

Christopher

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-20  5:29     ` Christopher Clark
@ 2018-12-20  8:29       ` Jan Beulich
  2018-12-21  1:25         ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-20  8:29 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
> On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> > +static int
>> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
>> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> > +                    uint32_t len)
>> > +{
>> > +    int i;
>> > +    int ret = 0;
>> > +
>> > +    if ( (npage << PAGE_SHIFT) < len )
>> > +        return -EINVAL;
>> > +
>> > +    if ( ring_info->mfns )
>> > +    {
>> > +        /*
>> > +         * Ring already existed. Check if it's the same ring,
>> > +         * i.e. same number of pages and all translated gpfns still
>> > +         * translating to the same mfns
>> > +         */
>>
>> This comment makes me wonder whether the translations are
>> permitted to change at other times. If so I'm not sure what
>> value verification here has. If not, this probably would want to
>> be debugging-only code.
> 
> My understanding is that the gfn->mfn translation is not necessarily stable
> across entry and exit from host power state S4, suspend to disk.

How would that be? It's not stable across guest migration (or
its non-live save/restore equivalent), but how would things
change across S3? And there's no support for S4 (and I can't
see it appearing any time soon).

>> > +static struct argo_ring_info *
>> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>> > +{
>> > +    uint16_t hash;
>> > +    struct hlist_node *node;
>>
>> const?
> 
> I couldn't determine exactly what you were pointing towards with this one.
> I've applied 'const' in many more places in the next version; please
> let me know if I've missed where you intended.

This is a pretty general rule: const should be applied to pointer
target types whenever no modification is intended, to make
this read-only aspect very obvious (and force people to think
twice if they alter such a property).
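
A small sketch of that rule in plain C, using hypothetical types (`struct ring_id`, `struct ring_node`) and a hypothetical lookup function rather than the hlist code from the patch: every pointer that is only read through, including the local iterator, gets a const target type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ring id and the list node. */
struct ring_id {
    uint16_t port;
    uint16_t domain_id;
};

struct ring_node {
    struct ring_id id;
    struct ring_node *next;
};

/*
 * Lookup only: nothing reachable through 'head', 'id' or 'node' is
 * modified, so all three pointer target types are const.
 */
static const struct ring_node *
find_ring(const struct ring_node *head, const struct ring_id *id)
{
    const struct ring_node *node;

    for ( node = head; node; node = node->next )
        if ( node->id.port == id->port &&
             node->id.domain_id == id->domain_id )
            return node;

    return NULL;
}
```

With this in place, an accidental write such as `node->id.port = 0` inside the loop fails to compile instead of going unnoticed.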

>> > +    uint64_t dst_domain_cookie = 0;
>> > +
>> > +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
>> > +        return -EINVAL;
>>
>> Why? You don't store the handle for later use (and you shouldn't).
>> If there really is a need for a full page's worth of memory, it
>> would better be passed in as GFN.
> 
> I've added this comment for this behaviour in v2:
> 
> +    /*
> +     * Verify the alignment of the ring data structure supplied with the
> +     * understanding that the ring handle supplied points to the same memory as
> +     * the first entry in the array of pages provided via pg_descr_hnd, where
> +     * the head of the ring will reside.
> +     * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
> +     * fixed offset from head of the first page in the mfn array.
> +     */

Well, this then suggests that you don't want to verify alignment,
but instead you want to verify addresses match.

>> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>> >
>> >      switch (cmd)
>> >      {
>> > +    case ARGO_MESSAGE_OP_register_ring:
>> > +    {
>> > +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
>> > +            guest_handle_cast(arg1, argo_ring_t);
>> > +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
>> > +            guest_handle_cast(arg2, argo_pfn_t);
>> > +        uint32_t npage = arg3;
>> > +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
>> > +
>> > +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>> > +            break;
>>
>> I don't understand the need for this and ...
>>
>> > +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
>> > +        {
>> > +            rc = -EINVAL;
>> > +            break;
>> > +        }
>> > +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>> > +            break;
>>
>> ... perhaps also this, when you use copy_from_guest() upon access.
> 
> This is the one piece of feedback on version 1 of this series that I haven't
> taken the time to address yet. The code is evidently safe, with only a possible
> performance decrease a concern, so I'd like to study it further before removing
> any of the checks rather than delay posting version two of this series.

Hmm, re-posting without all comments addressed is not ideal.
It means extra work for the reviewers (unless you've clearly
marked respective code fragments with some sort of TBD
comment).

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 15/25] argo: implement the sendv op
  2018-12-20  5:58     ` Christopher Clark
@ 2018-12-20  8:33       ` Jan Beulich
  2019-01-04  8:13         ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-20  8:33 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 20.12.18 at 06:58, <christopher.w.clark@gmail.com> wrote:
> On Wed, Dec 12, 2018 at 3:53 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
>> > +static struct argo_ring_info *
>> > +argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
>> > +                             domid_t partner_id, uint64_t partner_cookie)
>> > +{
>> > +    argo_ring_id_t id;
>> > +    struct argo_ring_info *ring_info;
>> > +
>> > +    ASSERT(rw_is_locked(&d->argo->lock));
>> > +
>> > +    id.addr.port = port;
>> > +    id.addr.domain_id = d->domain_id;
>> > +    id.partner = partner_id;
>> > +
>> > +    ring_info = argo_ring_find_info(d, &id);
>> > +    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
>> > +        return ring_info;
>>
>> Such a cookie makes mismatches unlikely, but it doesn't exclude
>> them. If there are other checks, is the cookie useful at all?
> 
> Yes, I think so and it's proved useful elsewhere in the second
> version of the series: it helps avoid sending signals to incorrect
> domains that may not be argo-enabled.

"It helps avoid" still isn't "it allows to avoid", i.e. it still sounds like
an approach reducing likelihood instead of one excluding mistakes
altogether.

Jan



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 16/25] argo: implement the notify op
  2018-12-20  6:12     ` Christopher Clark
@ 2018-12-20  8:39       ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-20  8:39 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 20.12.18 at 07:12, <christopher.w.clark@gmail.com> wrote:
> On Thu, Dec 13, 2018 at 6:06 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
>> > +static uint32_t
>> > +argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
>> > +{
>> > +    argo_ring_t ring;
>> > +    int32_t ret;
>> > +
>> > +    ASSERT(spin_is_locked(&ring_info->lock));
>> > +
>> > +    ring.len = ring_info->len;
>> > +    if ( !ring.len )
>> > +        return 0;
>> > +
>> > +    ring.tx_ptr = ring_info->tx_ptr;
>> > +
>> > +    if ( argo_ringbuf_get_rx_ptr(ring_info, &ring.rx_ptr) )
>> > +        return 0;
>> > +
>> > +    argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
>> > +                 ring.tx_ptr, ring.rx_ptr);
>> > +
>> > +    if ( ring.rx_ptr == ring.tx_ptr )
>> > +        return ring.len - sizeof(struct argo_ring_message_header);
>> > +
>> > +    ret = ring.rx_ptr - ring.tx_ptr;
>> > +    if ( ret < 0 )
>> > +        ret += ring.len;
>>
>> Seeing these two if()-s - how is an empty ring distinguished from
>> a completely full one? I'm getting the impression that
>> ring.rx_ptr == ring.tx_ptr in both cases.
> 
> The subtraction from ring.len above is missing an additional subtraction of
> ARGO_ROUNDUP(1), which doesn't help reasoning about this. (Fixed in v2.)
> 
> If rx_ptr == tx_ptr, then the ring is empty. The ring insertion
> functions won't allow filling the ring, and I've added more comments
> in the v2 code to explain.
> 
>> > +    ret -= sizeof(struct argo_ring_message_header);
>> > +    ret -= ARGO_ROUNDUP(1);
>>
>> Wouldn't you instead better round ret to a suitable multiple of
>> whatever granularity you try to arrange for here? Otherwise
>> what is this extra subtraction supposed to do?
> 
> re: subtraction, have added new comment:
> /*
>  * The maximum size payload for a message that will be accepted is:
>  * (the available space between the ring indexes)
>  *    minus (space for a message header)
>  *    minus (space for one message slot)
>  * since argo_ringbuf_insert requires that one message slot be left
>  * unfilled, to avoid filling the ring to capacity and confusing a full
>  * ring with an empty one.
>  */
> 
> re: rounding: Possibly. Not sure. In practice, both sides are
> updating the indexes in quantized steps matching the
> ARGO_ROUNDUP unit. Not sure it needs to change.

Here you appear to talk about both sides being well behaved. Did
you also consider misbehaving partners?
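
To make the arithmetic in the quoted comment concrete, here is a minimal sketch in plain C. It is not the code from the series: the names `ring_payload_space`, `MSG_HDR_SIZE` and `MSG_SLOT` are hypothetical, and the 16-byte header and slot sizes are assumptions based on the 128-bit figures quoted elsewhere in this thread. The range and alignment checks show one possible way of refusing indexes written by a misbehaving partner.

```c
#include <stdint.h>

/* Assumed sizes for illustration: 16-byte header, 16-byte message slot. */
#define MSG_HDR_SIZE 16u
#define MSG_SLOT     16u

/*
 * Return the largest payload that may be inserted, or 0 if nothing fits
 * or if the indexes look invalid (out of range or not slot-aligned).
 */
static uint32_t ring_payload_space(uint32_t len, uint32_t tx_ptr,
                                   uint32_t rx_ptr)
{
    int64_t space;

    if ( len == 0 || tx_ptr >= len || rx_ptr >= len )
        return 0;

    if ( (tx_ptr % MSG_SLOT) || (rx_ptr % MSG_SLOT) )
        return 0;

    /* rx_ptr == tx_ptr means empty: the whole ring is available. */
    space = (rx_ptr == tx_ptr) ? len : (int64_t)rx_ptr - tx_ptr;
    if ( space < 0 )
        space += len;

    /* Reserve room for the header and for one slot left unfilled. */
    space -= MSG_HDR_SIZE + MSG_SLOT;

    return space > 0 ? (uint32_t)space : 0;
}
```

Because one slot is always left unfilled, a writer following this rule can never advance tx_ptr onto rx_ptr, so equal indexes only ever mean an empty ring.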

>> > +typedef struct argo_ring_data
>> > +{
>> > +    uint64_t magic;
>>
>> What is this good for?
> 
> New comment added:
> /*
>  * Contents of the 'magic' field are inspected to verify that they contain
>  * an expected value before the hypervisor will perform writes into this
>  * structure in guest-supplied memory.
>  */

But this does not help understand what this verification is good
for (or what it guards against). This again looks to be a reduction
of likelihood of misbehavior, instead of its exclusion.

As things accumulate: Personally I'd consider it better to wait
with posting a new version until discussions have settled. At
this point I'm already uncertain whether it'll be worthwhile for
me to thoroughly look at v2, when I'm likely to re-encounter
things I've already commented on in v1.

Jan



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-20  5:16     ` Christopher Clark
@ 2018-12-20  8:45       ` Jan Beulich
  2018-12-20 12:57       ` Roger Pau Monné
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-20  8:45 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 20.12.18 at 06:16, <christopher.w.clark@gmail.com> wrote:
> On Wed, Dec 12, 2018 at 8:03 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote:
>> > Applied to both x86 and ARM headers.
>> >
>> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
>> > ---
>> >  xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
>> >  xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
>> >  xen/include/xen/guest_access.h     |  3 +++
>> >  3 files changed, 57 insertions(+)
>> >
> >> > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
>> > index 224d2a0..7b6f89c 100644
>> > --- a/xen/include/asm-arm/guest_access.h
>> > +++ b/xen/include/asm-arm/guest_access.h
>> > @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t 
> ipa, void *buf,
>> >  #define __raw_copy_from_guest raw_copy_from_guest
>> >  #define __raw_clear_guest raw_clear_guest
>> >
>> > +#define raw_copy_from_guest_errno(dst, src, len)             \
>> > +    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
>> > +#define raw_copy_to_guest_errno(dst, src, len)               \
>> > +    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)
>>
>> Since the only error that you return is EFAULT, I don't really see the
>> point in adding all those helpers. You achieve exactly the same by
>> returning a boolean and doing the translation to EFAULT in the caller
>> if required.
>>
>> It might have been nice to have the copy to/from set of functions
>> return an error value, but adding a new set of helpers that have the
> >> same functionality but just differ in the return value looks
>> redundant.
> 
> It is true that there is redundancy with these -- but I think there are decent
> arguments in favour of taking these in:
> 
> * the errno-providing interface is just a better fit for almost every call site
> - which means less source code in total, that is easier to read.
> 
> * it is promoting good interface design for error handling:
>   return of error code.
> 
> * since these are in use within the uxen source code, it eases comparison and
>   work across both codebases - relevant for Argo, due to v4v.
> 
> I've rewritten the implementation of these for the second version of the patch
> series -- now much simpler -- and hopefully that will mitigate some of your
> concern about them.

Without having looked at their v2 forms, I continue to fully agree
with Roger here: There's no reason to introduce these flavors just
for argo to use. We've been doing fine with what we have, and if
you want to change things, then you'd want to do so everywhere.
That would then also eliminate the need for separate flavors: The
ones that are there would then simply behave along the lines of
the principles you outline above.
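
As a sketch of the alternative under discussion, the existing style of helper can stay as it is and the call site can do the translation to -EFAULT itself. The stub below is hypothetical and only stands in for a copy helper that returns the number of bytes not copied; `read_u32_arg` is likewise an invented caller.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stub standing in for a guest copy helper: returns the number of
 * bytes NOT copied, so zero means success. A NULL source simulates
 * an inaccessible guest address.
 */
static unsigned long stub_copy_from_guest(void *dst, const void *src,
                                          size_t len)
{
    if ( !src )
        return len;

    memcpy(dst, src, len);
    return 0;
}

/* Hypothetical caller: the errno translation happens here. */
static int read_u32_arg(unsigned int *out, const void *guest_src)
{
    if ( stub_copy_from_guest(out, guest_src, sizeof(*out)) )
        return -EFAULT;

    return 0;
}
```

The cost at each call site is one `if` and one `return`, which is the trade-off being weighed against adding a second family of helpers.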

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-20  5:41     ` Christopher Clark
@ 2018-12-20  8:51       ` Jan Beulich
  2018-12-20 12:52       ` Roger Pau Monné
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2018-12-20  8:51 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 20.12.18 at 06:41, <christopher.w.clark@gmail.com> wrote:
> On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> > +static inline uint16_t
>> > +argo_hash_fn(const struct argo_ring_id *id)
>>
>> No need for the argo_ prefix for static functions, this is already an
>> argo specific file.
> 
> Although the compiler could live without the prefix, I'm finding it helpful to
> very easily determine that functions being used are not defined elsewhere
> within Xen; so I've left the prefix as is for version two of this series.

But you realize that this needlessly increases the string table
size as well as the volume of output going over the (normally
low bandwidth) serial line in case of an issue?

>> > +/*
>> > + * Messages on the ring are padded to 128 bits
>> > + * Len here refers to the exact length of the data not including the
>> > + * 128 bit header. The message uses
>> > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> >> > + * Using typeof(a) makes clear that this does not truncate any high-order bits.
>> > + */
>> > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>>
>> Why not just use ROUNDUP?
>>
>> And in any case this shouldn't be on the public header IMO, since it's
>> not part of the interface AFAICT.
> 
> Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> because it does need to be in the public header because it's used within the
> Linux device driver, and items in that public Xen header need the 'xen' prefix
> (so they now do).  Within the Linux code, it's used to choose a sensible ring
> size, and also used when manipulating the rx_ptr on the guest side.

It's at least questionable to put constructs into the public header
that the header itself doesn't use, and the naming of which may
not be what consumers would prefer. I can see the "single
central place" point though.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-20  5:41     ` Christopher Clark
  2018-12-20  8:51       ` Jan Beulich
@ 2018-12-20 12:52       ` Roger Pau Monné
  2018-12-21 23:05         ` Christopher Clark
  1 sibling, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2018-12-20 12:52 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > +static inline uint16_t
> > > +argo_hash_fn(const struct argo_ring_id *id)
> >
> > No need for the argo_ prefix for static functions, this is already an
> > argo specific file.
> 
> Although the compiler could live without the prefix, I'm finding it helpful to
> very easily determine that functions being used are not defined elsewhere
> within Xen; so I've left the prefix as is for version two of this series.

Why do you care whether they are defined elsewhere in Xen? The scope
of static functions is limited to the translation unit anyway.

> > > +    *mfn = get_gfn_unshare(d, pfn, &p2mt);
> >
> > Is this supposed to work for PV guests?
> 
> Yes -- and they seem to work OK. Am I missing something?

No, my fault, this should indeed work for both paging and non paging
assisted guests, sorry for the noise.

> > > +#else
> > > +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > > +#endif
> > > +
> > > +    if ( !mfn_valid(*mfn) )
> > > +        ret = -EINVAL;
> > > +#ifdef CONFIG_X86
> > > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > > +        ret = -EAGAIN;
> > > +#endif
> > > +    else if ( (p2mt != p2m_ram_rw) ||
> > > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > > +        ret = -EINVAL;
> > > +
> > > +#ifdef CONFIG_X86
> > > +    put_gfn(d, pfn);
> >
> > If you do this put_gfn here, by the time you check that the gfn -> mfn
> > matches your expectations the guest might have somehow changed the gfn
> > -> mfn mapping already (for example by ballooning down memory?)
> 
> If the guest does that, I think it only harms itself. If for some reason
> a memory access is denied, then the op would just fail. I don't think
> there's a more serious consequence to be worried about.

Then I wonder why you need such a check at all if the code can
handle such cases, all the more so since the check itself is racy.

> Above, if we're going to use the mfn, then we've just done a successful:
>     get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)
> 
> which should hold it in a state that we're ok with until we're done
> with it -- see put_page_and_type in argo_ring_remove_mfns.
> 
> > > +        /* W(L2) protects all the elements of the domain's ring_info */
> > > +        write_lock(&d->argo->lock);
> >
> > > I don't understand this W(L2) nomenclature, is this explained somewhere?
> 
> Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
> comment near the top of the file. It's a read-write lock, so 'W' means:
> take the write lock on it.
> 
> > Also there's no such comment when you take the global argo_lock above.
> 
> L2 covers more interesting work than L1, which is why there are more
> comments pertaining to it than L1.

I would add such comments about which locks protect what items to the
declaration of the locks, rather than the usage place. I don't see a
lot of value in the comments there unless they maybe describe an
exception or a corner case, but that might just be my taste.

> > > +/*
> > > + * Messages on the ring are padded to 128 bits
> > > + * Len here refers to the exact length of the data not including the
> > > + * 128 bit header. The message uses
> > > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > > > + * Using typeof(a) makes clear that this does not truncate any high-order bits.
> > > + */
> > > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
> >
> > Why not just use ROUNDUP?
> >
> > And in any case this shouldn't be on the public header IMO, since it's
> > not part of the interface AFAICT.
> 
> Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> because it does need to be in the public header because it's used within the
> Linux device driver, and items in that public Xen header need the 'xen' prefix
> (so they now do).  Within the Linux code, it's used to choose a sensible ring
> size, and also used when manipulating the rx_ptr on the guest side.

I'm quite sure Linux (or any other OS) will have a roundup helper, or
if there's indeed an OS without a roundup helper it should be added to
the generic OS code. There's nothing Xen or ARGO specific in this
roundup helper, hence I see no need to add it to the public header.

I think you should instead:

#define XEN_ARGO_MESSAGE_SIZE 0xf

Or some such and use that value with the OS roundup helper.
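
A rough sketch of that split, with every name hypothetical: the public header would export only a granularity constant (written here as a size of 16 bytes rather than the 0xf mask above), and the consumer would combine it with its own roundup helper. `OS_ROUNDUP` is a stand-in for whatever the OS provides; as written it is only valid for 32-bit operands and power-of-two alignments, and it does no overflow checking.

```c
#include <stdint.h>

/* Hypothetical public constant: message slot granularity in bytes. */
#define XEN_ARGO_MSG_SLOT_SIZE 16u

/* Assumed header size for this sketch (128 bits). */
#define MSG_HDR_BYTES 16u

/* Stand-in for an OS roundup helper; 'a' must be a power of two. */
#define OS_ROUNDUP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Bytes consumed on the ring by a message carrying payload_len bytes. */
static uint32_t msg_ring_bytes(uint32_t payload_len)
{
    return OS_ROUNDUP(payload_len, XEN_ARGO_MSG_SLOT_SIZE) + MSG_HDR_BYTES;
}
```

For example, a 17-byte payload is padded to 32 bytes and occupies 48 bytes on the ring including the header.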

Thanks, Roger.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy
  2018-12-20  5:16     ` Christopher Clark
  2018-12-20  8:45       ` Jan Beulich
@ 2018-12-20 12:57       ` Roger Pau Monné
  1 sibling, 0 replies; 111+ messages in thread
From: Roger Pau Monné @ 2018-12-20 12:57 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, James McKenzie, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Wed, Dec 19, 2018 at 09:16:38PM -0800, Christopher Clark wrote:
> On Wed, Dec 12, 2018 at 8:03 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 05:32:46PM -0800, Christopher Clark wrote:
> > > Applied to both x86 and ARM headers.
> > >
> > > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > > ---
> > >  xen/include/asm-arm/guest_access.h | 25 +++++++++++++++++++++++++
> > >  xen/include/asm-x86/guest_access.h | 29 +++++++++++++++++++++++++++++
> > >  xen/include/xen/guest_access.h     |  3 +++
> > >  3 files changed, 57 insertions(+)
> > >
> > > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> > > index 224d2a0..7b6f89c 100644
> > > --- a/xen/include/asm-arm/guest_access.h
> > > +++ b/xen/include/asm-arm/guest_access.h
> > > @@ -24,6 +24,11 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> > >  #define __raw_copy_from_guest raw_copy_from_guest
> > >  #define __raw_clear_guest raw_clear_guest
> > >
> > > +#define raw_copy_from_guest_errno(dst, src, len)             \
> > > +    (raw_copy_from_guest((dst), (src), (len)) ? -EFAULT : 0)
> > > +#define raw_copy_to_guest_errno(dst, src, len)               \
> > > +    (raw_copy_to_guest((dst), (src), (len)) ? -EFAULT : 0)
> >
> > Since the only error that you return is EFAULT, I don't really see the
> > point in adding all those helpers. You achieve exactly the same by
> > returning a boolean and doing the translation to EFAULT in the caller
> > if required.
> >
> > It might have been nice to have the copy to/from set of functions
> > return an error value, but adding a new set of helpers that have the
> > > same functionality but just differ in the return value looks
> > redundant.
> 
> It is true that there is redundancy with these -- but I think there are decent
> arguments in favour of taking these in:
> 
> * the errno-providing interface is just a better fit for almost every call site
> - which means less source code in total, that is easier to read.
> 
> * it is promoting good interface design for error handling:
>   return of error code.

Then I'm afraid that you will have to change the current copy to/from
helpers to return an error code and fix all the callers. I don't think
it's acceptable to have this duplication of functionality in the code
base.

IMO having such redundancy creates confusion, especially for new
developers, so if returning an error code is much better and provides
cleaner code it should be argued for the whole Xen code base, and a
global switch should be made.

> * since these are in use within the uxen source code, it eases comparison and
>   work across both codebases - relevant for Argo, due to v4v.
> 
> I've rewritten the implementation of these for the second version of the patch
> series -- now much simpler -- and hopefully that will mitigate some of your
> concern about them.

My issue is not so much with the implementation, but rather the
redundancy.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-20  8:29       ` Jan Beulich
@ 2018-12-21  1:25         ` Christopher Clark
  2018-12-21  7:28           ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-21  1:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> > +static int
> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> >> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> > +                    uint32_t len)
> >> > +{
> >> > +    int i;
> >> > +    int ret = 0;
> >> > +
> >> > +    if ( (npage << PAGE_SHIFT) < len )
> >> > +        return -EINVAL;
> >> > +
> >> > +    if ( ring_info->mfns )
> >> > +    {
> >> > +        /*
> >> > +         * Ring already existed. Check if it's the same ring,
> >> > +         * i.e. same number of pages and all translated gpfns still
> >> > +         * translating to the same mfns
> >> > +         */
> >>
> >> This comment makes me wonder whether the translations are
> >> permitted to change at other times. If so I'm not sure what
> >> value verification here has. If not, this probably would want to
> >> be debugging-only code.
> >
> > My understanding is that the gfn->mfn translation is not necessarily stable
> > across entry and exit from host power state S4, suspend to disk.
>
> How would that be? It's not stable across guest migration (or
> its non-live save/restore equivalent),

Right, that's clear.

> but how would things change across S3?

I don't think that they do change in that case.

From studying the code involved above, a related item: the guest runs the same
suspend and resume kernel code before entering into/exiting from either guest
S3 or S4, so the guest kernel resume code needs to re-register the rings, to
cover the case where it is coming up in an environment where they were dropped
- so that's what it does.

This relates to the code section above: if guest entry to S3 is aborted at the
final step (eg. error or platform refuses, eg. maybe a physical device
interaction with passthrough) then the hypervisor has not torn down the rings,
the guest remains running within the same domain, and the guest resume logic
runs, which runs through re-registration for all its rings. The check in the
logic above allows the existing ring mappings within the hypervisor to be
preserved.

I'm not certain that is an enormous win though; it looks like it would be ok
to drop that logic and reestablish the mappings as the ring is used, as per
other cases.

> And there's no support for S4 (and I can't see it appearing any time soon).

OK. oh well.

>
> >> > +static struct argo_ring_info *
> >> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> >> > +{
> >> > +    uint16_t hash;
> >> > +    struct hlist_node *node;
> >>
> >> const?
> >
> > I couldn't determine exactly what you were pointing towards with this one.
> > I've applied 'const' in a lot more places in the next version; please
> > let me know if I've missed where you intended.
>
> This is a pretty general rule: const should be applied to pointer
> target types whenever no modification is intended, to make
> this read-only aspect very obvious (and force people to think
> twice if they alter such a property).
>
> >> > +    uint64_t dst_domain_cookie = 0;
> >> > +
> >> > +    if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> >> > +        return -EINVAL;
> >>
> >> Why? You don't store the handle for later use (and you shouldn't).
> >> If there really is a need for a full page's worth of memory, it
> >> would better be passed in as GFN.
> >
> > I've added this comment for this behaviour in v2:
> >
> > +    /*
> > +     * Verify the alignment of the ring data structure supplied with the
> > +     * understanding that the ring handle supplied points to the same memory as
> > +     * the first entry in the array of pages provided via pg_descr_hnd, where
> > +     * the head of the ring will reside.
> > +     * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
> > +     * fixed offset from head of the first page in the mfn array.
> > +     */
>
> Well, this then suggests that you don't want to verify alignment,
> but instead you want to verify addresses match.

ack. I'll take a look at doing that.
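
As a rough illustration of the difference between the two checks, here is a
minimal sketch. It assumes that both the ring head and the first entry of the
page array are already available as guest physical addresses, which is not
how the handle arrives in the hypercall, so a real check would need a
translation step first. All names (example_ring_head_matches,
EXAMPLE_PAGE_SHIFT) are hypothetical and not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for the sketch: 4KiB pages. */
#define EXAMPLE_PAGE_SHIFT 12
#define EXAMPLE_PAGE_SIZE  (UINT64_C(1) << EXAMPLE_PAGE_SHIFT)

/*
 * Hypothetical helper: returns true only when ring_addr is the first byte
 * of the frame named by first_page_addr. An alignment test alone would
 * accept any page-aligned address, including one in an unrelated page.
 */
static bool example_ring_head_matches(uint64_t ring_addr,
                                      uint64_t first_page_addr)
{
    /* The ring head must sit at offset zero within its page. */
    if ( ring_addr & (EXAMPLE_PAGE_SIZE - 1) )
        return false;

    /* Both addresses must name the same frame. */
    return (ring_addr >> EXAMPLE_PAGE_SHIFT) ==
           (first_page_addr >> EXAMPLE_PAGE_SHIFT);
}
```

With this shape, a page-aligned handle that points at a different page than
the first array entry is rejected, which the alignment-only check would not
catch.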

>
> >> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >> >
> >> >      switch (cmd)
> >> >      {
> >> > +    case ARGO_MESSAGE_OP_register_ring:
> >> > +    {
> >> > +        XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> >> > +            guest_handle_cast(arg1, argo_ring_t);
> >> > +        XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> >> > +            guest_handle_cast(arg2, argo_pfn_t);
> >> > +        uint32_t npage = arg3;
> >> > +        bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> >> > +
> >> > +        if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> >> > +            break;
> >>
> >> I don't understand the need for this and ...
> >>
> >> > +        if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> >> > +        {
> >> > +            rc = -EINVAL;
> >> > +            break;
> >> > +        }
> >> > +        if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> >> > +            break;
> >>
> >> ... perhaps also this, when you use copy_from_guest() upon access.
> >
> > This is the one piece of feedback on version 1 of this series that I haven't
> > taken the time to address yet. The code is evidently safe, with only a possible
> > performance decrease a concern, so I'd like to study it further before removing
> > any of the checks rather than delay posting version two of this series.
>
> Hmm, re-posting without all comments addressed is not ideal.
> It means extra work for the reviewers (unless you've clearly
> marked respective code fragments with some sort of TBD
> comment).

Understood.

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-21  1:25         ` Christopher Clark
@ 2018-12-21  7:28           ` Jan Beulich
  2018-12-21  8:16             ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-21  7:28 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 21.12.18 at 02:25, <christopher.w.clark@gmail.com> wrote:
> On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
>> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>
>> >> > +static int
>> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
>> >> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> >> > +                    uint32_t len)
>> >> > +{
>> >> > +    int i;
>> >> > +    int ret = 0;
>> >> > +
>> >> > +    if ( (npage << PAGE_SHIFT) < len )
>> >> > +        return -EINVAL;
>> >> > +
>> >> > +    if ( ring_info->mfns )
>> >> > +    {
>> >> > +        /*
>> >> > +         * Ring already existed. Check if it's the same ring,
>> >> > +         * i.e. same number of pages and all translated gpfns still
>> >> > +         * translating to the same mfns
>> >> > +         */
>> >>
>> >> This comment makes me wonder whether the translations are
>> >> permitted to change at other times. If so I'm not sure what
>> >> value verification here has. If not, this probably would want to
>> >> be debugging-only code.
>> >
>> > My understanding is that the gfn->mfn translation is not necessarily stable
>> > across entry and exit from host power state S4, suspend to disk.

Now I'm afraid there's some confusion here: Originally you've
said "host".

>> How would that be? It's not stable across guest migration (or
>> its non-live save/restore equivalent),
> 
> Right, that's clear.
> 
>> but how would things change across S3?
> 
> I don't think that they do change in that case.
> 
> From studying the code involved above, a related item: the guest runs the same
> suspend and resume kernel code before entering into/exiting from either guest
> S3 or S4, so the guest kernel resume code needs to re-register the rings, to
> cover the case where it is coming up in an environment where they were dropped
> - so that's what it does.
> 
> This relates to the code section above: if guest entry to S3 is aborted at the
> final step (eg. error or platform refuses, eg. maybe a physical device
> interaction with passthrough) then the hypervisor has not torn down the rings,
> the guest remains running within the same domain, and the guest resume logic
> runs, which runs through re-registration for all its rings. The check in the
> logic above allows the existing ring mappings within the hypervisor to be
> preserved.

Yet now you suddenly talk about guest S3.

>> And there's no support for S4 (and I can't see it appearing any time soon).
> 
> OK. oh well.

Considering the original "host" context, my response here was
relating to host S4. Guest S4 ought to be functional (as being
mostly a guest kernel function anyway).

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-21  7:28           ` Jan Beulich
@ 2018-12-21  8:16             ` Christopher Clark
  2018-12-21  8:53               ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-21  8:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 21.12.18 at 02:25, <christopher.w.clark@gmail.com> wrote:
> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>
> >> >> > +static int
> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> >> >> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> >> > +                    uint32_t len)
> >> >> > +{
> >> >> > +    int i;
> >> >> > +    int ret = 0;
> >> >> > +
> >> >> > +    if ( (npage << PAGE_SHIFT) < len )
> >> >> > +        return -EINVAL;
> >> >> > +
> >> >> > +    if ( ring_info->mfns )
> >> >> > +    {
> >> >> > +        /*
> >> >> > +         * Ring already existed. Check if it's the same ring,
> >> >> > +         * i.e. same number of pages and all translated gpfns still
> >> >> > +         * translating to the same mfns
> >> >> > +         */
> >> >>
> >> >> This comment makes me wonder whether the translations are
> >> >> permitted to change at other times. If so I'm not sure what
> >> >> value verification here has. If not, this probably would want to
> >> >> be debugging-only code.
> >> >
> >> > My understanding is that the gfn->mfn translation is not necessarily stable
> >> > across entry and exit from host power state S4, suspend to disk.
>
> Now I'm afraid there's some confusion here: Originally you've
> said "host".
>
> >> How would that be? It's not stable across guest migration (or
> >> its non-live save/restore equivalent),
> >
> > Right, that's clear.
> >
> >> but how would things change across S3?
> >
> > I don't think that they do change in that case.
> >
> > From studying the code involved above, a related item: the guest runs the same
> > suspend and resume kernel code before entering into/exiting from either guest
> > S3 or S4, so the guest kernel resume code needs to re-register the rings, to
> > cover the case where it is coming up in an environment where they were dropped
> > - so that's what it does.
> >
> > This relates to the code section above: if guest entry to S3 is aborted at the
> > final step (eg. error or platform refuses, eg. maybe a physical device
> > interaction with passthrough) then the hypervisor has not torn down the rings,
> > the guest remains running within the same domain, and the guest resume logic
> > runs, which runs through re-registration for all its rings. The check in the
> > logic above allows the existing ring mappings within the hypervisor to be
> > preserved.
>
> Yet now you suddenly talk about guest S3.

Well, the context is that you did just ask about S3, without
specifying host or guest. Host S3 doesn't involve much at all, so I
went and studied the code in both the Linux driver and the hypervisor
to determine what it does in the case of guest S3, and then replied
with the above since it is relevant to the code in question. I hope I
was clear about referring to guest S3 above in my last reply.

That logic aims to make ring registration idempotent, to avoid the
teardown of established mappings of the ring pages in the case where
doing so isn't needed.
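
To make the idempotency check concrete, a simplified sketch of the comparison
follows. It is not the code from the patch: the structure and function names
are hypothetical, and it assumes the newly supplied gfns have already been
translated into an array of mfns before the comparison is made.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-ring state held by the hypervisor. */
struct example_ring_info {
    uint32_t nmfns;        /* number of pages currently mapped for the ring */
    const uint64_t *mfns;  /* their machine frame numbers; NULL if none yet */
};

/*
 * Returns true when a repeated registration describes the ring that is
 * already registered: same page count, and every page still translates
 * to the same mfn. In that case the existing mappings can be kept.
 */
static bool example_same_ring(const struct example_ring_info *ri,
                              const uint64_t *new_mfns, uint32_t npage)
{
    uint32_t i;

    if ( !ri->mfns || ri->nmfns != npage )
        return false;

    for ( i = 0; i < npage; i++ )
        if ( ri->mfns[i] != new_mfns[i] )
            return false;

    return true;
}
```

A false result would mean tearing down and re-establishing the mappings,
which is the path the check is trying to avoid when nothing has changed.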

> >> And there's no support for S4 (and I can't see it appearing any time soon).
> >
> > OK. oh well.
>
> Considering the original "host" context, my response here was
> relating to host S4. Guest S4 ought to be functional (as being
> mostly a guest kernel function anyway).

ack.

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-21  8:16             ` Christopher Clark
@ 2018-12-21  8:53               ` Jan Beulich
  2018-12-21 23:28                 ` Christopher Clark
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2018-12-21  8:53 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 21.12.18 at 09:16, <christopher.w.clark@gmail.com> wrote:
> On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 21.12.18 at 02:25, <christopher.w.clark@gmail.com> wrote:
>> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>
>> >> >>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
>> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >> >>
>> >> >> > +static int
>> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
>> >> >> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> >> >> > +                    uint32_t len)
>> >> >> > +{
>> >> >> > +    int i;
>> >> >> > +    int ret = 0;
>> >> >> > +
>> >> >> > +    if ( (npage << PAGE_SHIFT) < len )
>> >> >> > +        return -EINVAL;
>> >> >> > +
>> >> >> > +    if ( ring_info->mfns )
>> >> >> > +    {
>> >> >> > +        /*
>> >> >> > +         * Ring already existed. Check if it's the same ring,
>> >> >> > +         * i.e. same number of pages and all translated gpfns still
>> >> >> > +         * translating to the same mfns
>> >> >> > +         */
>> >> >>
>> >> >> This comment makes me wonder whether the translations are
>> >> >> permitted to change at other times. If so I'm not sure what
>> >> >> value verification here has. If not, this probably would want to
>> >> >> be debugging-only code.
>> >> >
>> >> > My understanding is that the gfn->mfn translation is not necessarily stable
>> >> > across entry and exit from host power state S4, suspend to disk.

Note this ^^^ (and see below).

>> Now I'm afraid there's some confusion here: Originally you've
>> said "host".
>>
>> >> How would that be? It's not stable across guest migration (or
>> >> its non-live save/restore equivalent),
>> >
>> > Right, that's clear.
>> >
>> >> but how would things change across S3?
>> >
>> > I don't think that they do change in that case.
>> >
>> > From studying the code involved above, a related item: the guest runs the same
>> > suspend and resume kernel code before entering into/exiting from either guest
>> > S3 or S4, so the guest kernel resume code needs to re-register the rings, to
>> > cover the case where it is coming up in an environment where they were dropped
>> > - so that's what it does.
>> >
>> > This relates to the code section above: if guest entry to S3 is aborted at the
>> > final step (eg. error or platform refuses, eg. maybe a physical device
>> > interaction with passthrough) then the hypervisor has not torn down the rings,
>> > the guest remains running within the same domain, and the guest resume logic
>> > runs, which runs through re-registration for all its rings. The check in the
>> > logic above allows the existing ring mappings within the hypervisor to be
>> > preserved.
>>
>> Yet now you suddenly talk about guest S3.
> 
> Well, the context is that you did just ask about S3, without
> specifying host or guest.

I'm sorry to be picky, but no, I don't think I did. You did explicitly
say "host", making me further think only about that case.

> Host S3 doesn't involve much at all, so I
> went and studied the code in both the Linux driver and the hypervisor
> to determine what it does in the case of guest S3, and then replied
> with the above since it is relevant to the code in question. I hope I
> was clear about referring to guest S3 above in my last reply.
> 
> That logic aims to make ring registration idempotent, to avoid the
> teardown of established mappings of the ring pages in the case where
> doing so isn't needed.

You treat complexity in the kernel for complexity in the hypervisor.
I'm not sure this is appropriate, as I can't judge how much more
difficult it would be for the guest to look after itself. But let's look
at both cases again:
- For guest S3, afaik, the domain doesn't change, and hence
  memory assignment remains the same. No re-registration
  necessary then afaict.
- For guest S4, aiui, the domain gets destroyed and a new one
  built upon resume. Re-registration would be needed, but due
  to the domain re-construction no leftovers ought to exist in
  Xen.
Hence to me it would seem more natural to have the guest deal
with the situation, and have no extra logic for this in Xen. You
want the guest to re-register anyway, yet simply avoiding to
do so in the S3 case ought to be a single (or very few)
conditional(s), i.e. not a whole lot of complexity.

Jan



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-20 12:52       ` Roger Pau Monné
@ 2018-12-21 23:05         ` Christopher Clark
  2019-01-04  8:57           ` Roger Pau Monné
  0 siblings, 1 reply; 111+ messages in thread
From: Christopher Clark @ 2018-12-21 23:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > > +static inline uint16_t
> > > > +argo_hash_fn(const struct argo_ring_id *id)
> > >
> > > No need for the argo_ prefix for static functions, this is already an
> > > argo specific file.
> >
> > Although the compiler could live without the prefix, I'm finding it helpful to
> > very easily determine that functions being used are not defined elsewhere
> > within Xen; so I've left the prefix as is for version two of this series.
>
> Why do you care whether they are defined elsewhere in Xen? The scope
> of static functions is limited to the translation unit anyway.

ok, I'll remove the prefixes - you're right that I shouldn't care
whether they are defined elsewhere in Xen, and Jan's points about the
string table expansion and serial line bandwidth are true - I had not
considered those. Would adding a note to describe this reasoning to
the CODING_STYLE document be welcome?

> > > > +#else
> > > > +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > > > +#endif
> > > > +
> > > > +    if ( !mfn_valid(*mfn) )
> > > > +        ret = -EINVAL;
> > > > +#ifdef CONFIG_X86
> > > > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > > > +        ret = -EAGAIN;
> > > > +#endif
> > > > +    else if ( (p2mt != p2m_ram_rw) ||
> > > > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > > > +        ret = -EINVAL;
> > > > +
> > > > +#ifdef CONFIG_X86
> > > > +    put_gfn(d, pfn);
> > >
> > > If you do this put_gfn here, by the time you check that the gfn -> mfn
> > > matches your expectations the guest might have somehow changed the gfn
> > > -> mfn mapping already (for example by ballooning down memory?)
> >
> > If the guest does that, I think it only harms itself. If for some reason
> > a memory access is denied, then the op would just fail. I don't think
> > there's a more serious consequence to be worried about.
>
> Then I wonder why you need such check in any case if the code can
> handle such cases, all the more since the check itself is racy.

OK, so at the root of the question here is: does it matter what the p2m
type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration

2) when the hypervisor writes into guest memory:
    - where the tx_ptr index is initialized in the register op
    - where ringbuf data is written in sendv
    - where ring description data is written in notify

or is having PGT_writable_page type and ownership by the domain
sufficient?

For 1), I think there's some use in saying no to a guest that has
supplied a region that appears misconfigured.

For 2), input would be appreciated. It currently works under the
assumption that a p2m type check is unnecessary, which is why the
put_gfn is where it is.

For further background context, here's my understanding of this section:

When the guest invokes the hypercall operation to register a ring, it
identifies the memory that it owns and wants the hypervisor to use by
supplying an array of gfns (or in v2, addresses which are shifted to
extract their gfns).

The hypervisor translates from gfns to mfns, using the translation that
exists at that time, and then refers to that memory internally by mfn
from there on out. This find_ring_mfn function is where the gfn->mfn
translation happens.  (The variable name does need renaming from pfn,
as you noted - thanks.)

To do the translation from gfn to mfn, (on x86) it's using
get_gfn_unshare. That's doing three things:
* returns the mfn, if there is one.
* returns the p2m type of that memory.
* acquires a reference to that gfn, which needs to be dropped at some
  point.

The p2m type check on the gfn in find_ring_mfn at that time is
possibly conservative, rejecting more types than perhaps it needs to,
but the type that it accepts (p2m_ram_rw) is sane. It is a validation of
the p2m type at that instant, intended to detect if the guest has
supplied memory to the ring register op that does not make sense for it
to use as a ring, as indicated by the current p2m type, and if so, fail
early, or indicate that a retry later is needed.

Then the get_page_and_type call is where the memory identified by the
mfn that was just obtained, gets locked to PGT_writable_page type, and
ownership fixed to its current owner domain, by adding to its reference
count.

Then the gfn reference count is dropped with the put_gfn call. This
means that the guest can elect to change the p2m type afterwards, if it
wants (any such change needs to be consistent with its domain ownership and
PGT_writable_page type though; I'm not sure whether that constrains the
possible types).

That memory can have guest-supplied data written into it, either by the
domain owning the page itself, or in response to argo sendv operations
by other domains that are authorized to transmit into the ring.

Your note that the "check itself is racy", i.e. that a change of p2m type
could occur immediately afterwards, is true.

So: Do you think that a check on the current p2m type of the pages in
the ring is needed at the points where the hypervisor issues writes into
that ring memory?
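
For reference, the ordering described above can be modelled with a small
self-contained sketch. This is a toy model only: struct example_page and its
counters are hypothetical stand-ins for the real p2m and page reference
machinery, and the comments name the Xen call that each step is meant to
represent rather than calling it.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy p2m types; only the distinctions used in the discussion are modelled. */
enum example_p2m_type {
    EXAMPLE_P2M_RAM_RW,
    EXAMPLE_P2M_RAM_LOGDIRTY,
    EXAMPLE_P2M_OTHER,
};

/* Hypothetical model of one guest page and the references held on it. */
struct example_page {
    enum example_p2m_type p2mt; /* current p2m type of the gfn */
    unsigned int gfn_refs;      /* models the reference from get_gfn_unshare */
    unsigned int page_refs;     /* models references from get_page_and_type */
    bool valid;                 /* models mfn_valid() */
};

static int example_find_ring_mfn(struct example_page *pg)
{
    int ret = 0;

    pg->gfn_refs++;                 /* models get_gfn_unshare() */

    if ( !pg->valid )
        ret = -EINVAL;
    else if ( pg->p2mt == EXAMPLE_P2M_RAM_LOGDIRTY )
        ret = -EAGAIN;              /* caller should retry later */
    else if ( pg->p2mt != EXAMPLE_P2M_RAM_RW )
        ret = -EINVAL;              /* does not look like usable ring memory */
    else
        pg->page_refs++;            /* models get_page_and_type() */

    pg->gfn_refs--;                 /* models put_gfn() */

    /*
     * From here on nothing pins the p2m type: only page_refs remains,
     * so the type checked above may change before the next write.
     */
    return ret;
}
```

The model shows why the type check is only a snapshot: after the function
returns successfully, the page reference persists but the p2m type field is
free to change.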


> > Above, if we're going to use the mfn, then we've just done a successful:
> >     get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)
> >
> > which should hold it in a state that we're ok with until we're done
> > with it -- see put_page_and_type in argo_ring_remove_mfns.
> >
> > > > +        /* W(L2) protects all the elements of the domain's ring_info */
> > > > +        write_lock(&d->argo->lock);
> > >
> > > I don't understand this W(L2) nomenclature, is this explained somewhere?
> >
> > Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
> > comment near the top of the file. It's a read-write lock, so 'W' means:
> > take the write lock on it.
> >
> > > Also there's no such comment when you take the global argo_lock above.
> >
> > L2 covers more interesting work than L1, which is why there are more
> > comments pertaining to it than L1.
>
> I would add such comments about which locks protect what items to the
> declaration of the locks, rather than the usage place.

ack, and those comments are there, introduced earlier in the series.
There's a dedicated section with comments on locking near the top
(titled: "locking is organized as follows") and comments within the
argo_ring_info data structure definition for protection of each field.

> I don't see a
> lot of value in the comments there unless they maybe describe an
> exception or a corner case, but that might just be my taste.

ack. It's notable with that specific site that it's (intentionally,
necessarily) a write_lock that is being taken, rather than a
read_lock; maybe that's enough to fall under your exception
/ corner case condition. I can drop it if it's really unwanted.

> > > > +/*
> > > > + * Messages on the ring are padded to 128 bits
> > > > + * Len here refers to the exact length of the data not including the
> > > > + * 128 bit header. The message uses
> > > > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > > > + * Using typeof(a) make clear that this does not truncate any high-order bits.
> > > > + */
> > > > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
> > >
> > > Why not just use ROUNDUP?
> > >
> > > And in any case this shouldn't be on the public header IMO, since it's
> > > not part of the interface AFAICT.
> >
> > Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> > because it does need to be in the public header because it's used within the
> > Linux device driver, and items in that public Xen header need the 'xen' prefix
> > (so they now do).  Within the Linux code, it's used to choose a sensible ring
> > size, and also used when manipulating the rx_ptr on the guest side.
>
> I'm quite sure Linux (or any other OS) will have a roundup helper, or
> if there's indeed an OS without a roundup helper it should be added to
> the generic OS code. There's nothing Xen or ARGO specific in this
> roundup helper, hence I see no need to add it to the public header.
>
> I think you should instead:
>
> #define XEN_ARGO_MESSAGE_SIZE 0xf
>
> Or some such and use that value with the OS roundup helper.

OK, I'll go and look. Thanks for the suggestion.
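
A sketch of what that could look like on the guest side, assuming the public
header exports only a size constant and the OS supplies the rounding helper.
The names here (EXAMPLE_ARGO_MSG_SLOT_SIZE, example_roundup,
example_argo_msg_space) are placeholders, not proposed interface names, and
the 16-byte header size is taken from the "128 bit header" wording in the
comment quoted above.

```c
#include <stdint.h>

/* Placeholder constants: 128-bit message slots and a 128-bit header. */
#define EXAMPLE_ARGO_MSG_SLOT_SIZE 16u
#define EXAMPLE_ARGO_MSG_HDR_SIZE  16u

/*
 * Stand-in for a generic OS roundup helper; align must be a power of two.
 * No overflow handling: callers are assumed to bound x beforehand.
 */
static inline uint32_t example_roundup(uint32_t x, uint32_t align)
{
    return (x + (align - 1)) & ~(align - 1);
}

/* Ring space consumed by a message carrying len bytes of payload. */
static inline uint32_t example_argo_msg_space(uint32_t len)
{
    return example_roundup(len, EXAMPLE_ARGO_MSG_SLOT_SIZE) +
           EXAMPLE_ARGO_MSG_HDR_SIZE;
}
```

Keeping only the constant in the public header would leave the arithmetic to
whichever helper the guest OS already provides.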

Christopher


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2018-12-21  8:53               ` Jan Beulich
@ 2018-12-21 23:28                 ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2018-12-21 23:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Fri, Dec 21, 2018 at 12:53 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 21.12.18 at 09:16, <christopher.w.clark@gmail.com> wrote:
> > On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 21.12.18 at 02:25, <christopher.w.clark@gmail.com> wrote:
> >> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>
> >> >> >>> On 20.12.18 at 06:29, <christopher.w.clark@gmail.com> wrote:
> >> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >> >>
> >> >> >> > +static int
> >> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> >> >> >> > +                    uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> >> >> > +                    uint32_t len)
> >> >> >> > +{
> >> >> >> > +    int i;
> >> >> >> > +    int ret = 0;
> >> >> >> > +
> >> >> >> > +    if ( (npage << PAGE_SHIFT) < len )
> >> >> >> > +        return -EINVAL;
> >> >> >> > +
> >> >> >> > +    if ( ring_info->mfns )
> >> >> >> > +    {
> >> >> >> > +        /*
> >> >> >> > +         * Ring already existed. Check if it's the same ring,
> >> >> >> > +         * i.e. same number of pages and all translated gpfns still
> >> >> >> > +         * translating to the same mfns
> >> >> >> > +         */
> >> >> >>
> >> >> >> This comment makes me wonder whether the translations are
> >> >> >> permitted to change at other times. If so I'm not sure what
> >> >> >> value verification here has. If not, this probably would want to
> >> >> >> be debugging-only code.
> >> >> >
> >> >> > My understanding is that the gfn->mfn translation is not necessarily stable
> >> >> > across entry and exit from host power state S4, suspend to disk.
>
> Note this ^^^ (and see below).
>
> >> Now I'm afraid there's some confusion here: Originally you've
> >> said "host".
> >>
> >> >> How would that be? It's not stable across guest migration (or
> >> >> its non-live save/restore equivalent),
> >> >
> >> > Right, that's clear.
> >> >
> >> >> but how would things change across S3?
> >> >
> >> > I don't think that they do change in that case.
> >> >
> >> > From studying the code involved above, a related item: the guest runs the same
> >> > suspend and resume kernel code before entering into/exiting from either guest
> >> > S3 or S4, so the guest kernel resume code needs to re-register the rings, to
> >> > cover the case where it is coming up in an environment where they were dropped
> >> > - so that's what it does.
> >> >
> >> > This relates to the code section above: if guest entry to S3 is aborted at the
> >> > final step (eg. error or platform refuses, eg. maybe a physical device
> >> > interaction with passthrough) then the hypervisor has not torn down the rings,
> >> > the guest remains running within the same domain, and the guest resume logic
> >> > runs, which runs through re-registration for all its rings. The check in the
> >> > logic above allows the existing ring mappings within the hypervisor to be
> >> > preserved.
> >>
> >> Yet now you suddenly talk about guest S3.
> >
> > Well, the context is that you did just ask about S3, without
> > specifying host or guest.
>
> I'm sorry to be picky, but no, I don't think I did. You did explicitly
> say "host", making me further think only about that case.

OK, apologies for the confusing direction of the reply. It was not intended
to be so.

> > That logic aims to make ring registration idempotent, to avoid the
> > teardown of established mappings of the ring pages in the case where
> > doing so isn't needed.
>
> You treat complexity in the kernel for complexity in the hypervisor.

(s/treat/trade/ ?) OK, that is a fair concern, yes.

> I'm not sure this is appropriate, as I can't judge how much more
> difficult it would be for the guest to look after itself. But let's look
> at both cases again:
> - For guest S3, afaik, the domain doesn't change, and hence
>   memory assignment remains the same. No re-registration
>   necessary then afaict.
> - For guest S4, aiui, the domain gets destroyed and a new one
>   built upon resume. Re-registration would be needed, but due
>   to the domain re-construction no leftovers ought to exist in
>   Xen.

I agree.

> Hence to me it would seem more natural to have the guest deal
> with the situation, and have no extra logic for this in Xen. You
> want the guest to re-register anyway, yet simply avoiding to
> do so in the S3 case ought to be a single (or very few)
> conditional(s), i.e. not a whole lot of complexity.

OK. That looks doable. thanks.
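
For illustration, the guest-side conditional could look roughly like
the sketch below. It is plain C with hypothetical toy_* names (none of
them exist in the guest driver or in Xen), and it assumes the resume
path can tell whether it is returning from S3 or S4:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleep states as seen by the guest resume path. */
enum toy_sleep_state { TOY_SLEEP_S3, TOY_SLEEP_S4 };

/* Hypothetical per-ring bookkeeping kept by the guest driver. */
struct toy_guest_ring {
    unsigned int register_calls; /* counts (pretend) register hypercalls */
};

/* Stand-in for issuing the register op for one ring. */
static void toy_register_ring(struct toy_guest_ring *r)
{
    assert(r != NULL);
    r->register_calls++;
}

/*
 * Resume handler: after S3 the domain is unchanged, so the rings are
 * still registered and nothing is done. After S4 the domain has been
 * rebuilt, so every ring is registered again.
 */
static void toy_guest_resume(struct toy_guest_ring *rings, size_t n,
                             enum toy_sleep_state from)
{
    size_t i;

    if ( from == TOY_SLEEP_S3 )
        return;

    for ( i = 0; i < n; i++ )
        toy_register_ring(&rings[i]);
}
```

With this shape the hypervisor needs no idempotency check on the
register path; the single conditional lives in the guest.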

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 15/25] argo: implement the sendv op
  2018-12-20  8:33       ` Jan Beulich
@ 2019-01-04  8:13         ` Christopher Clark
  2019-01-04  8:43           ` Roger Pau Monné
  2019-01-04 13:37           ` Jan Beulich
  0 siblings, 2 replies; 111+ messages in thread
From: Christopher Clark @ 2019-01-04  8:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, George Dunlap, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	eric chanudet, Roger Pau Monné

On Thu, Dec 20, 2018 at 12:33 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 20.12.18 at 06:58, <christopher.w.clark@gmail.com> wrote:
> > On Wed, Dec 12, 2018 at 3:53 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 01.12.18 at 02:32, <christopher.w.clark@gmail.com> wrote:
> >> > +static struct argo_ring_info *
> >> > +argo_ring_find_info_by_match(const struct domain *d, uint32_t port,
> >> > +                             domid_t partner_id, uint64_t partner_cookie)
> >> > +{
> >> > +    argo_ring_id_t id;
> >> > +    struct argo_ring_info *ring_info;
> >> > +
> >> > +    ASSERT(rw_is_locked(&d->argo->lock));
> >> > +
> >> > +    id.addr.port = port;
> >> > +    id.addr.domain_id = d->domain_id;
> >> > +    id.partner = partner_id;
> >> > +
> >> > +    ring_info = argo_ring_find_info(d, &id);
> >> > +    if ( ring_info && (partner_cookie == ring_info->partner_cookie) )
> >> > +        return ring_info;
> >>
> >> Such a cookie makes mismatches unlikely, but it doesn't exclude
> >> them. If there are other checks, is the cookie useful at all?
> >
> > Yes, I think so and it's proved useful elsewhere in the second
> > version of the series: it helps avoid sending signals to incorrect
> > domains that may not be argo-enabled.
>
> "It helps avoid" still isn't "it allows to avoid", i.e. it still sounds like
> an approach reducing likelihood instead of one excluding mistakes
> altogether.

ok, I'm at the point where I'm close to having a version three of the
series to post that addresses all the feedback so far, plus some
additional improvements, with the following two items remaining to
discuss:

1) the domain_cookie, with Jan's question about a) its exclusion of
mismatches and b) its utility.

Given the expressed concern that the timer-based cookie initialization
does not necessarily exclude mismatches, I've reimplemented it as a
simple 128-bit counter protected by the L1 lock: this does now exclude
mismatches.

The utility of the cookie follows from this:

domid, despite its name, is not a unique domain identifier; it's a
temporally unique id: Xen will ensure that no two domains that execute
concurrently have the same domid. Domain authentication needs to take
this into account.

With Argo, it affects these points:

* ring registration: when the partner domain domid is specified, argo
finds the currently executing domain with that domid, and needs to
be able to confirm that it is the same domain later when a sendv is
issued.

* sendv: needs to confirm that the domain sending a message is the same
as the single domain authorized to transmit when the ring was first
registered.

* notify: the querying domain asks about free space, and if there's not
enough then a record is kept internal to the hypervisor, and a signal
will be sent to the caller later when sufficient space becomes
available.  Before sending the signal, Xen needs to confirm that the
current domain with the domid it remembered is the same as the one that
issued the query, otherwise Xen is sending spurious signals to domains
that are not expecting it (and unless it checks, may not even be
argo-enabled).

* domain teardown: in the absence of the domain cookie, or an
alternative data structure that achieves the same ability to
distinguish a reincarnated domain, all the rings that are registered
that authorize the dying domid to send need to be torn down with
suitable notification to their owners, and all the pending signals for
that domain about available free space need to be nullified, to prevent
a later domain inheriting these credentials and signals.

Doing so either entails a potentially-expensive walk of all rings of all
domains, plus all the pending notifications on all rings the domain can
access, or additional complexity with new data structures storing
further metadata on the authorized domain on ring registration, etc.
The domain cookie which enables identity confirmation on a domid is
a reasonable alternative solution.

So: if the switch to a simple counter is sufficient to mitigate the
mismatch concern, and the utility of the cookie is potentially
acceptable, I'll post a v3 series for review with that present.
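
To make the identity check concrete, here is a minimal sketch in plain
C. The toy_* names are hypothetical and are not the code in the series;
a 64-bit counter is used only to keep the sketch short, and in Xen the
counter would be advanced under a lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for domid_t. */
typedef uint16_t toy_domid_t;

struct toy_domain {
    toy_domid_t domid;   /* may be reused once the domain is gone */
    uint64_t cookie;     /* taken from a counter, so never reused */
};

struct toy_ring {
    toy_domid_t partner_id;
    uint64_t partner_cookie;  /* recorded at ring registration */
};

/* Assumption: accesses are serialized by the caller (a lock in Xen). */
static uint64_t toy_next_cookie = 1;

static void toy_domain_init(struct toy_domain *d, toy_domid_t domid)
{
    assert(d != NULL);
    d->domid = domid;
    d->cookie = toy_next_cookie++;
}

/* Registration remembers both the partner's domid and its cookie. */
static void toy_ring_register(struct toy_ring *r,
                              const struct toy_domain *partner)
{
    assert(r != NULL && partner != NULL);
    r->partner_id = partner->domid;
    r->partner_cookie = partner->cookie;
}

/* sendv-time check: the domid alone is not enough to identify a domain. */
static bool toy_sender_is_authorized(const struct toy_ring *r,
                                     const struct toy_domain *sender)
{
    return sender->domid == r->partner_id &&
           sender->cookie == r->partner_cookie;
}
```

A later domain that happens to be given the same domid gets a new
cookie value, so it fails the check without any walk of existing rings
at teardown.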


2) the p2m type of the guest-supplied memory for the ring.

Roger raised a query about not pinning the p2m type of memory
used for the ring, and my response on 21st December is here:

https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02204.html

At the moment, I haven't changed that code. If the p2m type is changed
after ring registration, is it a problem? If not, then I think the code is
OK; but if so then a pointer to what makes it problematic would be
helpful to determine an appropriate next step.

thanks,

Christopher


* Re: [PATCH 15/25] argo: implement the sendv op
  2019-01-04  8:13         ` Christopher Clark
@ 2019-01-04  8:43           ` Roger Pau Monné
  2019-01-04 13:37           ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-04  8:43 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, George Dunlap, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	eric chanudet

On Fri, Jan 04, 2019 at 12:13:09AM -0800, Christopher Clark wrote:
> 2) the p2m type of the guest-supplied memory for the ring.
> 
> Roger raised a query about not pinning the p2m type of memory
> used for the ring, and my response on 21st December is here:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02204.html
> 
> At the moment, I haven't changed that code. If the p2m type is changed
> after ring registration, is it a problem? If not, then I think the code is
> OK; but if so then a pointer to what makes it problematic would be
> helpful to determine an appropriate next step.

My point was that you don't need to check every time that the gfn ->
mfn translations for the ring are the same (as is done in
argo_find_ring_mfns). AFAICT you take a reference to each page in the
ring, so there's no need to check that the p2m mapping is still the
same.

I think the p2m type checks that you do are correct, and you should
only allow p2m_ram_rw pages to be used for the ring.

Roger.


* Re: [PATCH 13/25] argo: implement the register op
  2018-12-21 23:05         ` Christopher Clark
@ 2019-01-04  8:57           ` Roger Pau Monné
  2019-01-04 13:22             ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-04  8:57 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, James McKenzie, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet

On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > Then I wonder why you need such check in any case if the code can
> > handle such cases, the more than the check itself is racy.
> 
> OK, so at the root of the question here is: does it matter what the p2m
> type of the memory is at these points:
> 
> 1) when the gfn is translated to mfn, at the time of ring registration

This is the important check, because that's where you should take a
reference to the page. In this case you should check that the page is
of ram_rw type.

> 2) when the hypervisor writes into guest memory:
>     - where the tx_ptr index is initialized in the register op
>     - where ringbuf data is written in sendv
>     - where ring description data is written in notify

As long as you keep a reference to the pages that are part of the ring
you don't need to do any checks when writing to or reading from them.
If the guest messes up its p2m and changes the gfn -> mfn mappings for
pages that are part of the ring, that's the guest's problem: the
hypervisor still holds a reference to those pages, so they won't be
reused.

> or is having PGT_writable_page type and ownership by the domain
> sufficient?
> 
> For 1), I think there's some use in saying no to a guest that has
> supplied a region that appears misconfigured.

Sure, when you set up the ring in Xen you should execute all the
checks, make sure the ring pages are of type ram_rw, and take a
reference to each page.

> For 2), input would be appreciated. It currently works under the
> assumption that a p2m type check is unnecessary, which is why the
> put_gfn is where it is.

Right, and this should go away since it's confusing and unnecessary
AFAICT.

> For further background context, here's my understanding of this section:
> 
> When the guest invokes the hypercall operation to register a ring, it
> identifies the memory that it owns and wants the hypervisor to use by
> supplying an array of gfns (or in v2, addresses which are shifted to
> extract their gfns).
> 
> The hypervisor translates from gfns to mfns, using the translation that
> exists at that time, and then refers to that memory internally by mfn
> from there on out. This find_ring_mfn function is where the gfn->mfn
> translation happens.  (The variable name does need renaming from pfn,
> as you noted - thanks.)
> 
> To do the translation from gfn to mfn, (on x86) it's using
> get_gfn_unshare. That's doing three things:
> * returns the mfn, if there is one.
> * returns the p2m type of that memory.
> * acquires a reference to that gfn, which needs to be dropped at some
>   point.
> 
> The p2m type check on the gfn in find_ring_mfn at that time is
> possibly conservative, rejecting more types than perhaps it needs to,
> but the type that it accepts (p2m_ram_rw) is sane. It is a validation of
> the p2m type at that instant, intended to detect if the guest has
> supplied memory to the ring register op that does not make sense for it
> to use as a ring, as indicated by the current p2m type, and if so, fail
> early, or indicate that a retry later is needed.
> 
> Then the get_page_and_type call is where the memory identified by the
> mfn that was just obtained, gets locked to PGT_writable_page type, and
> ownership fixed to its current owner domain, by adding to its reference
> count.

Here you should make sure the get_page is performed while the gfn is
still locked, and once you have a reference to the page you can unlock
the gfn. As said above, if the guest then wants to mess with the
gfn -> mfn mapping that's fine: the hypervisor already has a
reference to the underlying original memory page.
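
As a rough model of that ordering, here is a self-contained sketch in
plain C. All toy_* names are hypothetical stand-ins (the lock flag
models get_gfn / put_gfn, the counter models get_page / put_page); it
is not the Xen API, only the sequence of steps:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of p2m types. */
enum toy_p2m_type { TOY_P2M_RAM_RW, TOY_P2M_RAM_RO, TOY_P2M_MMIO };

struct toy_page {
    unsigned int refcount;
};

/* Hypothetical gfn -> page mapping entry. */
struct toy_p2m_entry {
    enum toy_p2m_type type;
    struct toy_page *page;
    bool locked;
};

/*
 * Registration-time pin: check the type and take the page reference
 * while the entry is locked, then unlock. Returns NULL on rejection.
 */
static struct toy_page *toy_ring_pin_page(struct toy_p2m_entry *e)
{
    struct toy_page *pg = NULL;

    assert(e != NULL && !e->locked);

    e->locked = true;                    /* models get_gfn */
    if ( e->type == TOY_P2M_RAM_RW && e->page != NULL )
    {
        e->page->refcount++;             /* models get_page */
        pg = e->page;
    }
    e->locked = false;                   /* models put_gfn */

    return pg;
}

/* Ring teardown: drop the reference taken at registration. */
static void toy_ring_unpin_page(struct toy_page *pg)
{
    assert(pg != NULL && pg->refcount > 0);
    pg->refcount--;                      /* models put_page */
}
```

After the pin, later writes go through the returned page pointer and
never consult the mapping entry again, so a guest changing the entry
does not affect them.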

> Then the gfn reference count is dropped with the put_gfn call. This
> means that the guest can elect to change the p2m type afterwards, if it
> wants; (any change needs to be consistent with its domain ownership and
> PGT_writable_page type though -- not sure if that constrains possible types).

Hm, not really: get_gfn / put_gfn don't actually take references, but
rather take the p2m lock. What takes references is get_page.

> That memory can have guest-supplied data written into it, either by the
> domain owning the page itself, or in response to argo sendv operations
> by other domains that are authorized to transmit into the ring.
> 
> Your note that the "check itself is racy": ie. that a change of p2m type
> could occur immediately afterwards is true.
> 
> So: Do you think that a check on the current p2m type of the pages in
> the ring is needed at the points where the hypervisor issues writes into
> that ring memory?

No, as long as you have a reference to the page (note: not the gfn)
that you are writing to, it should be fine.

Roger.


* Re: [PATCH 13/25] argo: implement the register op
  2019-01-04  8:57           ` Roger Pau Monné
@ 2019-01-04 13:22             ` Jan Beulich
  2019-01-04 15:35               ` Roger Pau Monné
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2019-01-04 13:22 UTC (permalink / raw)
  To: Roger Pau Monne, Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet

>>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
> On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
>> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> >
>> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
>> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> > > >
>> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> > Then I wonder why you need such check in any case if the code can
>> > handle such cases, the more than the check itself is racy.
>> 
>> OK, so at the root of the question here is: does it matter what the p2m
>> type of the memory is at these points:
>> 
>> 1) when the gfn is translated to mfn, at the time of ring registration
> 
> This is the important check, because that's where you should take a
> reference to the page. In this case you should check that the page is
> of ram_rw type.
> 
>> 2) when the hypervisor writes into guest memory:
>>     - where the tx_ptr index is initialized in the register op
>>     - where ringbuf data is written in sendv
>>     - where ring description data is written in notify
> 
> As long as you keep a reference to the pages that are part of the ring
> you don't need to do any checks when writing/reading from them. If the
> guest messes up it's p2m and does change the gfn -> mfn mappings for
> pages that are part of the ring that's the guest problem, the
> hypervisor still has a reference to those pages so they won't be
> reused.

For use cases like introspection this may not be fully correct,
but it may also be that my understanding there isn't fully
correct. If introspection agents care about _any_ writes to
a page, hypervisor ones (which in most cases are merely
writes on behalf of the guest) might matter as well. I think
to decide whether page accesses need to be accompanied
by any checks (and if so, which ones) one needs to
- establish what p2m type transitions are possible for a
  given page,
- verify what restrictions may occur "behind the back" of
  the entity wanting to do the accesses,
- explore whether doing the extra checking at p2m type
  change time wouldn't be better than at the time of access.

Jan



* Re: [PATCH 15/25] argo: implement the sendv op
  2019-01-04  8:13         ` Christopher Clark
  2019-01-04  8:43           ` Roger Pau Monné
@ 2019-01-04 13:37           ` Jan Beulich
  2019-01-07 20:54             ` Christopher Clark
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2019-01-04 13:37 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 04.01.19 at 09:13, <christopher.w.clark@gmail.com> wrote:
> ok, I'm at the point where I'm close to having a version three of the
> series to post that addresses all the feedback so far, plus some
> additional improvements, with the following two items remaining to
> discuss:
> 
> 1) the domain_cookie, with Jan's question about a) its exclusion of
> mismatches and b) its utility.
> 
> Given the expressed concern that the timer-based cookie initialization
> does not necessarily exclude mismatches, I've reimplemented it as a
> simple 128-bit counter protected by the L1 lock: this does now exclude
> mismatches.

... for all practical purposes, I assume you mean. In which case
I'd then immediately ask whether a 64-bit counter wouldn't do
as well.

> The utility of the cookie follows from this:
> 
> domid, despite its name, is not a unique domain identifier; it's a
> temporally unique id: Xen will ensure that no two domains that execute
> concurrently have the same domid. Domain authentication needs to take
> this into account.

Correct, at which point the question arises whether domain IDs
aren't too narrow. After all this isn't the first time we run into such
a restriction - see the opt_ibpb related code in context_switch().

> With Argo, it affects these points:
> 
> * ring registration: when the partner domain domid is specified, argo
> finds the currently executing domain with that domid, and needs to
> be able to confirm that it is the same domain later when a sendv is
> issued.
> 
> * sendv: needs to confirm that the domain sending a message is the same
> as the single domain authorized to transmit when the ring was first
> registered.
> 
> * notify: the querying domain asks about free space, and if there's not
> enough then a record is kept internal to the hypervisor, and a signal
> will be sent to the caller later when sufficient space becomes
> available.  Before sending the signal, Xen needs to confirm that the
> current domain with the domid it remembered is the same as the one that
> issued the query, otherwise Xen is sending spurious signals to domains
> that are not expecting it (and unless it checks, may not even be
> argo-enabled).
> 
> * domain teardown: in the absence of the domain cookie, or an
> alternative data structure that achieves the same ability to
> distinguish a reincarnated domain, all the rings that are registered
> that authorize the dying domid to send need to be torn down with
> suitable notification to their owners, and all the pending signals for
> that domain about available free space need to be nullified, to prevent
> a later domain inheriting these credentials and signals.
> 
> Doing so either entails a potentially-expensive walk of all rings of all
> domains, plus all the pending notifications on all rings the domain can
> access, or additional complexity with new data structures storing
> further metadata on the authorized domain on ring registration, etc.
> The domain cookie which enables identity confirmation on a domid is
> a reasonable alternative solution.

For all of these the question then is whether holding a reference
to the other domain (which has been looked up during ring
registration) wouldn't help. Furthermore this isn't a new problem,
see e.g. how event channel code deals with the ECS_INTERDOMAIN
case - without acquiring extra references, but instead with suitable
(and mutual) cleanup during domain destruction.
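
To illustrate what I mean by mutual cleanup, here is a generic sketch
in plain C. The toy_* names are hypothetical and this is not the event
channel code; it only shows two linked endpoints where tearing down
either side also unlinks the survivor:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one side of a two-party binding. */
struct toy_endpoint {
    struct toy_endpoint *peer;  /* NULL when not connected */
    bool open;
};

static void toy_connect(struct toy_endpoint *a, struct toy_endpoint *b)
{
    assert(a != NULL && b != NULL && a != b);
    a->peer = b;
    b->peer = a;
    a->open = true;
    b->open = true;
}

/*
 * Teardown of one side (for example on domain destruction): the
 * survivor is unlinked too, so it can never name a later domain that
 * reuses the same identifier.
 */
static void toy_close(struct toy_endpoint *e)
{
    assert(e != NULL);
    if ( e->peer != NULL )
    {
        e->peer->peer = NULL;
        e->peer = NULL;
    }
    e->open = false;
}

/* A signal may only be sent across a binding that is still linked. */
static bool toy_can_signal(const struct toy_endpoint *e)
{
    return e->open && e->peer != NULL;
}
```

The cost is that each side must be reachable from the other at
teardown time, which is the extra bookkeeping being weighed against a
cookie.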

Jan




* Re: [PATCH 13/25] argo: implement the register op
  2019-01-04 13:22             ` Jan Beulich
@ 2019-01-04 15:35               ` Roger Pau Monné
  2019-01-04 15:47                 ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-04 15:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet

On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> >>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> >
> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> > > >
> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> >> > Then I wonder why you need such check in any case if the code can
> >> > handle such cases, the more than the check itself is racy.
> >> 
> >> OK, so at the root of the question here is: does it matter what the p2m
> >> type of the memory is at these points:
> >> 
> >> 1) when the gfn is translated to mfn, at the time of ring registration
> > 
> > This is the important check, because that's where you should take a
> > reference to the page. In this case you should check that the page is
> > of ram_rw type.
> > 
> >> 2) when the hypervisor writes into guest memory:
> >>     - where the tx_ptr index is initialized in the register op
> >>     - where ringbuf data is written in sendv
> >>     - where ring description data is written in notify
> > 
> > As long as you keep a reference to the pages that are part of the ring
> > you don't need to do any checks when writing/reading from them. If the
> > guest messes up it's p2m and does change the gfn -> mfn mappings for
> > pages that are part of the ring that's the guest problem, the
> > hypervisor still has a reference to those pages so they won't be
> > reused.
> 
> For use cases like introspection this may not be fully correct,
> but it may also be that my understanding there isn't fully
> correct. If introspection agents care about _any_ writes to
> a page, hypervisor ones (which in most cases are merely
> writes on behalf of the guest) might matter as well. I think
> to decide whether page accesses need to be accompanied
> by any checks (and if so, which ones) one needs to
> - establish what p2m type transitions are possible for a
>   given page,
> - verify what restrictions may occur "behind the back" of
>   the entity wanting to do the accesses,
> - explore whether doing the extra checking at p2m type
>   change time wouldn't be better than at the time of access.

Maybe this use case is different, but how does introspection handle
accesses to the shared info page or the runstate info, for example?

I would consider argo to be the same in this regard.

Roger.


* Re: [PATCH 13/25] argo: implement the register op
  2019-01-04 15:35               ` Roger Pau Monné
@ 2019-01-04 15:47                 ` Jan Beulich
  2019-01-07  9:00                   ` Roger Pau Monné
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2019-01-04 15:47 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet

>>> On 04.01.19 at 16:35, <roger.pau@citrix.com> wrote:
> On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
>> >>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
>> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
>> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> >> >
>> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
>> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> >> > > >
>> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> >> > Then I wonder why you need such check in any case if the code can
>> >> > handle such cases, the more than the check itself is racy.
>> >> 
>> >> OK, so at the root of the question here is: does it matter what the p2m
>> >> type of the memory is at these points:
>> >> 
>> >> 1) when the gfn is translated to mfn, at the time of ring registration
>> > 
>> > This is the important check, because that's where you should take a
>> > reference to the page. In this case you should check that the page is
>> > of ram_rw type.
>> > 
>> >> 2) when the hypervisor writes into guest memory:
>> >>     - where the tx_ptr index is initialized in the register op
>> >>     - where ringbuf data is written in sendv
>> >>     - where ring description data is written in notify
>> > 
>> > As long as you keep a reference to the pages that are part of the ring
>> > you don't need to do any checks when writing/reading from them. If the
>> > guest messes up it's p2m and does change the gfn -> mfn mappings for
>> > pages that are part of the ring that's the guest problem, the
>> > hypervisor still has a reference to those pages so they won't be
>> > reused.
>> 
>> For use cases like introspection this may not be fully correct,
>> but it may also be that my understanding there isn't fully
>> correct. If introspection agents care about _any_ writes to
>> a page, hypervisor ones (which in most cases are merely
>> writes on behalf of the guest) might matter as well. I think
>> to decide whether page accesses need to be accompanied
>> by any checks (and if so, which ones) one needs to
>> - establish what p2m type transitions are possible for a
>>   given page,
>> - verify what restrictions may occur "behind the back" of
>>   the entity wanting to do the accesses,
>> - explore whether doing the extra checking at p2m type
>>   change time wouldn't be better than at the time of access.
> 
> Maybe this is use-case is different, but how does introspection handle
> accesses to the shared info page or the runstate info for example?
> 
> I would consider argo to be the same in this regard.

Not exactly: The shared info page is special in any event. For
runstate info (and the like - there's also struct vcpu_time_info)
I'd question the correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.

Jan



* Re: [PATCH 13/25] argo: implement the register op
  2019-01-04 15:47                 ` Jan Beulich
@ 2019-01-07  9:00                   ` Roger Pau Monné
  2019-01-09 16:15                     ` Tamas K Lengyel
  0 siblings, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-07  9:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	ross.philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, Tamas K Lengyel, xen-devel, eric chanudet

Adding the introspection guys.

On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> >>> On 04.01.19 at 16:35, <roger.pau@citrix.com> wrote:
> > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
> >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> >> >
> >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> >> > > >
> >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> >> >> > Then I wonder why you need such check in any case if the code can
> >> >> > handle such cases, the more than the check itself is racy.
> >> >> 
> >> >> OK, so at the root of the question here is: does it matter what the p2m
> >> >> type of the memory is at these points:
> >> >> 
> >> >> 1) when the gfn is translated to mfn, at the time of ring registration
> >> > 
> >> > This is the important check, because that's where you should take a
> >> > reference to the page. In this case you should check that the page is
> >> > of ram_rw type.
> >> > 
> >> >> 2) when the hypervisor writes into guest memory:
> >> >>     - where the tx_ptr index is initialized in the register op
> >> >>     - where ringbuf data is written in sendv
> >> >>     - where ring description data is written in notify
> >> > 
> >> > As long as you keep a reference to the pages that are part of the ring
> >> > you don't need to do any checks when writing/reading from them. If the
> >> > guest messes up it's p2m and does change the gfn -> mfn mappings for
> >> > pages that are part of the ring that's the guest problem, the
> >> > hypervisor still has a reference to those pages so they won't be
> >> > reused.
> >> 
> >> For use cases like introspection this may not be fully correct,
> >> but it may also be that my understanding there isn't fully
> >> correct. If introspection agents care about _any_ writes to
> >> a page, hypervisor ones (which in most cases are merely
> >> writes on behalf of the guest) might matter as well. I think
> >> to decide whether page accesses need to be accompanied
> >> by any checks (and if so, which ones) one needs to
> >> - establish what p2m type transitions are possible for a
> >>   given page,
> >> - verify what restrictions may occur "behind the back" of
> >>   the entity wanting to do the accesses,
> >> - explore whether doing the extra checking at p2m type
> >>   change time wouldn't be better than at the time of access.
> > 
> > Maybe this is use-case is different, but how does introspection handle
> > accesses to the shared info page or the runstate info for example?
> > 
> > I would consider argo to be the same in this regard.
> 
> Not exactly: The shared info page is special in any event. For
> runstate info (and alike - there's also struct vcpu_time_info)
> I'd question correctness of the current handling. If that's
> wrong already, I'd prefer if the issue wasn't spread.

There are also grants, which, when used together with another guest on
the same host, could allow bypassing introspection AFAICT (unless
there's some policy applied that limits grant sharing to trusted
domains).

TBH I'm not sure how to handle hypervisor accesses with
introspection.  My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 15/25] argo: implement the sendv op
  2019-01-04 13:37           ` Jan Beulich
@ 2019-01-07 20:54             ` Christopher Clark
  0 siblings, 0 replies; 111+ messages in thread
From: Christopher Clark @ 2019-01-07 20:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Fri, Jan 4, 2019 at 5:37 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 04.01.19 at 09:13, <christopher.w.clark@gmail.com> wrote:
> > ok, I'm at the point where I'm close to having a version three of the
> > series to post that addresses all the feedback so far, plus some
> > additional improvements, with the following two items remaining to
> > discuss:
> >
> > 1) the domain_cookie, with Jan's question about a) its exclusion of
> > mismatches and b) its utility.
> >
> > Given the expressed concern that the timer-based cookie initialization
> > does not necessarily exclude mismatches, I've reimplemented it as a
> > simple 128-bit counter protected by the L1 lock: this does now exclude
> > mismatches.
>
> ... for all practical purposes, I assume you mean. In which case
> I'd then immediately ask whether a 64-bit counter wouldn't do
> as well.
>
> > The utility of the cookie follows from this:
> >
> > domid, despite its name, is not a unique domain identifier; it's a
> > temporally unique id: Xen will ensure that no two domains that execute
> > concurrently have the same domid. Domain authentication needs to take
> > this into account.
>
> Correct, at which point the question arises whether domain IDs
> aren't too narrow. After all this isn't the first time we run into such
> a restriction - see the opt_ibpb related code in context_switch().
>
> > With Argo, it affects these points:
> >
> > * ring registration: when the partner domain domid is specified, argo
> > finds the currently executing domain with that domid, and needs to
> > be able to confirm that it is the same domain later when a sendv is
> > issued.
> >
> > * sendv: needs to confirm that the domain sending a message is the same
> > as the single domain authorized to transmit when the ring was first
> > registered.
> >
> > * notify: the querying domain asks about free space, and if there's not
> > enough then a record is kept internal to the hypervisor, and a signal
> > will be sent to the caller later when sufficient space becomes
> > available.  Before sending the signal, Xen needs to confirm that the
> > current domain with the domid it remembered is the same as the one that
> > issued the query, otherwise Xen is sending spurious signals to domains
> > that are not expecting it (and unless it checks, may not even be
> > argo-enabled).
> >
> > * domain teardown: in the absence of the domain cookie, or an
> > alternative data structure that achieves the same ability to
> > distinguish a reincarnated domain, all the rings that are registered
> > that authorize the dying domid to send need to be torn down with
> > suitable notification to their owners, and all the pending signals for
> > that domain about available free space need to be nullified, to prevent
> > a later domain inheriting these credentials and signals.
> >
> > Doing so either entails a potentially-expensive walk of all rings of all
> > domains, plus all the pending notifications on all rings the domain can
> > access, or additional complexity with new data structures storing
> > further metadata on the authorized domain on ring registration, etc.
> > The domain cookie which enables identity confirmation on a domid is
> > a reasonable alternative solution.
>
> For all of these the question then is whether holding a reference
> to the other domain (which has been looked up during ring
> registration) wouldn't help. Furthermore this isn't a new problem,
> see e.g. how event channel code deals with the ECS_INTERDOMAIN
> case - without acquiring extra references, but instead with suitable
> (and mutual) cleanup during domain destruction.

Just to close on this thread: the v3 series posted last night has
added state teardown and mutual cleanup as requested, with the cookie
removed. Thanks for the pointers.

Christopher
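
A minimal sketch of the "mutual cleanup at domain destruction" idea
discussed above, as an alternative to a per-domain cookie. This is a toy
model, not the v3 Argo code: every name here (toy_domain, toy_ring,
toy_ring_bind, toy_may_send, toy_domain_destroy) is hypothetical, and
only the partner-side teardown direction is modelled.

```c
#include <stddef.h>

#define TOY_MAX_RINGS 4

struct toy_domain;

/* A ring owned by one domain, with a single authorised sender. */
struct toy_ring {
    struct toy_domain *owner;
    struct toy_domain *partner;   /* NULL once the partner has gone away */
};

/* Each domain tracks the rings it is authorised to send to. */
struct toy_domain {
    struct toy_ring *send_rings[TOY_MAX_RINGS];
    unsigned int nr_send_rings;
};

/* Registration: record the partner on the ring and the ring on the partner. */
static int toy_ring_bind(struct toy_ring *ring, struct toy_domain *owner,
                         struct toy_domain *partner)
{
    if ( partner->nr_send_rings >= TOY_MAX_RINGS )
        return -1;

    ring->owner = owner;
    ring->partner = partner;
    partner->send_rings[partner->nr_send_rings++] = ring;

    return 0;
}

/* sendv-time check: only the recorded, still-live partner may send. */
static int toy_may_send(const struct toy_ring *ring,
                        const struct toy_domain *src)
{
    return ring->partner != NULL && ring->partner == src;
}

/*
 * Teardown: revoke the dying domain's authorisation on every ring it was
 * bound to, so that a later domain cannot inherit it.
 */
static void toy_domain_destroy(struct toy_domain *d)
{
    for ( unsigned int i = 0; i < d->nr_send_rings; i++ )
        d->send_rings[i]->partner = NULL;

    d->nr_send_rings = 0;
}
```

Because each domain keeps its own list of rings it may send to, teardown
only visits those rings rather than walking all rings of all domains.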

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-07  9:00                   ` Roger Pau Monné
@ 2019-01-09 16:15                     ` Tamas K Lengyel
  2019-01-09 16:23                       ` Razvan Cojocaru
  2019-01-09 16:34                       ` Roger Pau Monné
  0 siblings, 2 replies; 111+ messages in thread
From: Tamas K Lengyel @ 2019-01-09 16:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	ross.philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, eric chanudet

On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> Adding the introspection guys.
>
> On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> > >>> On 04.01.19 at 16:35, <roger.pau@citrix.com> wrote:
> > > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> > >> >>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
> > >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> > >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >> >> >
> > >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com>
> > > wrote:
> > >> >> > > >
> > >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > >> >> > Then I wonder why you need such check in any case if the code can
> > >> >> > handle such cases, the more than the check itself is racy.
> > >> >>
> > >> >> OK, so at the root of the question here is: does it matter what the p2m
> > >> >> type of the memory is at these points:
> > >> >>
> > >> >> 1) when the gfn is translated to mfn, at the time of ring registration
> > >> >
> > >> > This is the important check, because that's where you should take a
> > >> > reference to the page. In this case you should check that the page is
> > >> > of ram_rw type.
> > >> >
> > >> >> 2) when the hypervisor writes into guest memory:
> > >> >>     - where the tx_ptr index is initialized in the register op
> > >> >>     - where ringbuf data is written in sendv
> > >> >>     - where ring description data is written in notify
> > >> >
> > >> > As long as you keep a reference to the pages that are part of the ring
> > >> > you don't need to do any checks when writing/reading from them. If the
> > >> > guest messes up it's p2m and does change the gfn -> mfn mappings for
> > >> > pages that are part of the ring that's the guest problem, the
> > >> > hypervisor still has a reference to those pages so they won't be
> > >> > reused.
> > >>
> > >> For use cases like introspection this may not be fully correct,
> > >> but it may also be that my understanding there isn't fully
> > >> correct. If introspection agents care about _any_ writes to
> > >> a page, hypervisor ones (which in most cases are merely
> > >> writes on behalf of the guest) might matter as well. I think
> > >> to decide whether page accesses need to be accompanied
> > >> by any checks (and if so, which ones) one needs to
> > >> - establish what p2m type transitions are possible for a
> > >>   given page,
> > >> - verify what restrictions may occur "behind the back" of
> > >>   the entity wanting to do the accesses,
> > >> - explore whether doing the extra checking at p2m type
> > >>   change time wouldn't be better than at the time of access.
> > >
> > > Maybe this is use-case is different, but how does introspection handle
> > > accesses to the shared info page or the runstate info for example?
> > >
> > > I would consider argo to be the same in this regard.
> >
> > Not exactly: The shared info page is special in any event. For
> > runstate info (and alike - there's also struct vcpu_time_info)
> > I'd question correctness of the current handling. If that's
> > wrong already, I'd prefer if the issue wasn't spread.
>
> There are also grants, which when used together with another guest on
> the same host could allow to bypass introspection AFAICT? (unless
> there's some policy applied that limit grant sharing to trusted
> domains)
>
> TBH I'm not sure how to handle hypoervisor accesses with
> introspection.  My knowledge of introspection is fairly limited, but
> it pauses the guest and sends a notification to an in guest agent. I'm
> not sure this is applicable to hypervisor writes, since it's not
> possible to pause hypervisor execution and wait for a response from a
> guest agent.
>

Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring. This is actually a feature: when the emulator in Xen
is used to continue guest execution, the hypervisor ignores the EPT
memory permissions that trip the guest for introspection. So neither
the hypervisor accessing memory nor a grant-shared page being accessed
in another domain is a problem for introspection.

Tamas

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:15                     ` Tamas K Lengyel
@ 2019-01-09 16:23                       ` Razvan Cojocaru
  2019-01-09 16:34                       ` Roger Pau Monné
  1 sibling, 0 replies; 111+ messages in thread
From: Razvan Cojocaru @ 2019-01-09 16:23 UTC (permalink / raw)
  To: Tamas K Lengyel, Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, eric chanudet

On 1/9/19 6:15 PM, Tamas K Lengyel wrote:
> On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> Adding the introspection guys.
>>
>> On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
>>>>>> On 04.01.19 at 16:35, <roger.pau@citrix.com> wrote:
>>>> On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
>>>>>>>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
>>>>>> On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
>>>>>>> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
>>>>>>>>> On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com>
>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>>>>>>>> Then I wonder why you need such check in any case if the code can
>>>>>>>> handle such cases, the more than the check itself is racy.
>>>>>>>
>>>>>>> OK, so at the root of the question here is: does it matter what the p2m
>>>>>>> type of the memory is at these points:
>>>>>>>
>>>>>>> 1) when the gfn is translated to mfn, at the time of ring registration
>>>>>>
>>>>>> This is the important check, because that's where you should take a
>>>>>> reference to the page. In this case you should check that the page is
>>>>>> of ram_rw type.
>>>>>>
>>>>>>> 2) when the hypervisor writes into guest memory:
>>>>>>>      - where the tx_ptr index is initialized in the register op
>>>>>>>      - where ringbuf data is written in sendv
>>>>>>>      - where ring description data is written in notify
>>>>>>
>>>>>> As long as you keep a reference to the pages that are part of the ring
>>>>>> you don't need to do any checks when writing/reading from them. If the
>>>>>> guest messes up it's p2m and does change the gfn -> mfn mappings for
>>>>>> pages that are part of the ring that's the guest problem, the
>>>>>> hypervisor still has a reference to those pages so they won't be
>>>>>> reused.
>>>>>
>>>>> For use cases like introspection this may not be fully correct,
>>>>> but it may also be that my understanding there isn't fully
>>>>> correct. If introspection agents care about _any_ writes to
>>>>> a page, hypervisor ones (which in most cases are merely
>>>>> writes on behalf of the guest) might matter as well. I think
>>>>> to decide whether page accesses need to be accompanied
>>>>> by any checks (and if so, which ones) one needs to
>>>>> - establish what p2m type transitions are possible for a
>>>>>    given page,
>>>>> - verify what restrictions may occur "behind the back" of
>>>>>    the entity wanting to do the accesses,
>>>>> - explore whether doing the extra checking at p2m type
>>>>>    change time wouldn't be better than at the time of access.
>>>>
>>>> Maybe this is use-case is different, but how does introspection handle
>>>> accesses to the shared info page or the runstate info for example?
>>>>
>>>> I would consider argo to be the same in this regard.
>>>
>>> Not exactly: The shared info page is special in any event. For
>>> runstate info (and alike - there's also struct vcpu_time_info)
>>> I'd question correctness of the current handling. If that's
>>> wrong already, I'd prefer if the issue wasn't spread.
>>
>> There are also grants, which when used together with another guest on
>> the same host could allow to bypass introspection AFAICT? (unless
>> there's some policy applied that limit grant sharing to trusted
>> domains)
>>
>> TBH I'm not sure how to handle hypoervisor accesses with
>> introspection.  My knowledge of introspection is fairly limited, but
>> it pauses the guest and sends a notification to an in guest agent. I'm
>> not sure this is applicable to hypervisor writes, since it's not
>> possible to pause hypervisor execution and wait for a response from a
>> guest agent.
>>
> 
> Introspection applications only care about memory accesses performed
> by the guest. Hypervisor accesses to monitored pages are not included
> when monitoring - it is actually a feature when using the emulator in
> Xen to continue guest execution because the hypervisor ignores EPT
> memory permissions that trip the guest for introspection. So having
> the hypervisor access memory or a grant-shared page being accessed in
> another domain are not a problem for introspection.

Indeed, that's how it goes.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:15                     ` Tamas K Lengyel
  2019-01-09 16:23                       ` Razvan Cojocaru
@ 2019-01-09 16:34                       ` Roger Pau Monné
  2019-01-09 16:48                         ` Razvan Cojocaru
  1 sibling, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-09 16:34 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	ross.philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Tim Deegan, Christopher Clark,
	James McKenzie, George Dunlap, Rich Persaud, Paul Durrant,
	Jan Beulich, eric chanudet, xen-devel, Ian Jackson,
	Roger Pau Monné

On Wed, Jan 9, 2019 at 5:17 PM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > Adding the introspection guys.
> >
> > On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> > > >>> On 04.01.19 at 16:35, <roger.pau@citrix.com> wrote:
> > > > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> > > >> >>> On 04.01.19 at 09:57, <roger.pau@citrix.com> wrote:
> > > >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> > > >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >> >> >
> > > >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > > >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@citrix.com>
> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > >> >> > Then I wonder why you need such check in any case if the code can
> > > >> >> > handle such cases, the more than the check itself is racy.
> > > >> >>
> > > >> >> OK, so at the root of the question here is: does it matter what the p2m
> > > >> >> type of the memory is at these points:
> > > >> >>
> > > >> >> 1) when the gfn is translated to mfn, at the time of ring registration
> > > >> >
> > > >> > This is the important check, because that's where you should take a
> > > >> > reference to the page. In this case you should check that the page is
> > > >> > of ram_rw type.
> > > >> >
> > > >> >> 2) when the hypervisor writes into guest memory:
> > > >> >>     - where the tx_ptr index is initialized in the register op
> > > >> >>     - where ringbuf data is written in sendv
> > > >> >>     - where ring description data is written in notify
> > > >> >
> > > >> > As long as you keep a reference to the pages that are part of the ring
> > > >> > you don't need to do any checks when writing/reading from them. If the
> > > >> > guest messes up it's p2m and does change the gfn -> mfn mappings for
> > > >> > pages that are part of the ring that's the guest problem, the
> > > >> > hypervisor still has a reference to those pages so they won't be
> > > >> > reused.
> > > >>
> > > >> For use cases like introspection this may not be fully correct,
> > > >> but it may also be that my understanding there isn't fully
> > > >> correct. If introspection agents care about _any_ writes to
> > > >> a page, hypervisor ones (which in most cases are merely
> > > >> writes on behalf of the guest) might matter as well. I think
> > > >> to decide whether page accesses need to be accompanied
> > > >> by any checks (and if so, which ones) one needs to
> > > >> - establish what p2m type transitions are possible for a
> > > >>   given page,
> > > >> - verify what restrictions may occur "behind the back" of
> > > >>   the entity wanting to do the accesses,
> > > >> - explore whether doing the extra checking at p2m type
> > > >>   change time wouldn't be better than at the time of access.
> > > >
> > > > Maybe this is use-case is different, but how does introspection handle
> > > > accesses to the shared info page or the runstate info for example?
> > > >
> > > > I would consider argo to be the same in this regard.
> > >
> > > Not exactly: The shared info page is special in any event. For
> > > runstate info (and alike - there's also struct vcpu_time_info)
> > > I'd question correctness of the current handling. If that's
> > > wrong already, I'd prefer if the issue wasn't spread.
> >
> > There are also grants, which when used together with another guest on
> > the same host could allow to bypass introspection AFAICT? (unless
> > there's some policy applied that limit grant sharing to trusted
> > domains)
> >
> > TBH I'm not sure how to handle hypoervisor accesses with
> > introspection.  My knowledge of introspection is fairly limited, but
> > it pauses the guest and sends a notification to an in guest agent. I'm
> > not sure this is applicable to hypervisor writes, since it's not
> > possible to pause hypervisor execution and wait for a response from a
> > guest agent.
> >
>
> Introspection applications only care about memory accesses performed
> by the guest. Hypervisor accesses to monitored pages are not included
> when monitoring - it is actually a feature when using the emulator in
> Xen to continue guest execution because the hypervisor ignores EPT
> memory permissions that trip the guest for introspection. So having
> the hypervisor access memory or a grant-shared page being accessed in
> another domain are not a problem for introspection.

Couldn't two guests running on the same host then completely bypass
introspection? I guess you prevent this by limiting which guests
pages can be shared with?

If that's the case, and introspection doesn't care about hypervisor
accesses to guest pages, then just getting a reference to the
underlying page when the ring is set up should be enough. There's no
need to check the gfn -> mfn relation every time there's a hypervisor
access to the ring.

Roger.
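
A minimal sketch of the approach described above: check the page type
once and take a reference at ring registration, then write to the page
later without repeating the check. This is a toy model under stated
assumptions, not Xen code: the types and functions (toy_page, toy_ring,
toy_ring_register, toy_ring_write, toy_ring_unregister) are hypothetical
stand-ins, and a plain counter stands in for the page reference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a p2m type; only TOY_RAM_RW may back a ring. */
enum toy_p2m_type { TOY_RAM_RW, TOY_RAM_RO, TOY_MMIO };

struct toy_page {
    enum toy_p2m_type type;
    unsigned int refcount;        /* page is not reused while > 0 */
    unsigned char data[64];
};

struct toy_ring {
    struct toy_page *page;        /* reference held while non-NULL */
};

/* Registration: the only place where the type is checked. */
static bool toy_ring_register(struct toy_ring *ring, struct toy_page *pg)
{
    if ( pg->type != TOY_RAM_RW )
        return false;

    pg->refcount++;               /* keeps the page alive for the ring */
    ring->page = pg;

    return true;
}

/* Later writes rely on the held reference and do only a bounds check. */
static bool toy_ring_write(struct toy_ring *ring, size_t off,
                           const void *src, size_t len)
{
    if ( !ring->page || off > sizeof(ring->page->data) ||
         len > sizeof(ring->page->data) - off )
        return false;

    memcpy(ring->page->data + off, src, len);

    return true;
}

/* Unregistration: drop the reference taken at registration. */
static void toy_ring_unregister(struct toy_ring *ring)
{
    if ( ring->page )
    {
        ring->page->refcount--;
        ring->page = NULL;
    }
}
```

In this model, a later change to the page's type by the guest does not
affect writes through the ring, which matches the "that's the guest's
problem" position taken earlier in the thread.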

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:34                       ` Roger Pau Monné
@ 2019-01-09 16:48                         ` Razvan Cojocaru
  2019-01-09 16:50                           ` Tamas K Lengyel
  0 siblings, 1 reply; 111+ messages in thread
From: Razvan Cojocaru @ 2019-01-09 16:48 UTC (permalink / raw)
  To: Roger Pau Monné, Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Tim Deegan, Christopher Clark,
	James McKenzie, George Dunlap, Rich Persaud, Paul Durrant,
	Jan Beulich, eric chanudet, xen-devel, Ian Jackson,
	Roger Pau Monné

On 1/9/19 6:34 PM, Roger Pau Monné wrote:
>>>>> Maybe this is use-case is different, but how does introspection handle
>>>>> accesses to the shared info page or the runstate info for example?
>>>>>
>>>>> I would consider argo to be the same in this regard.
>>>>
>>>> Not exactly: The shared info page is special in any event. For
>>>> runstate info (and alike - there's also struct vcpu_time_info)
>>>> I'd question correctness of the current handling. If that's
>>>> wrong already, I'd prefer if the issue wasn't spread.
>>>
>>> There are also grants, which when used together with another guest on
>>> the same host could allow to bypass introspection AFAICT? (unless
>>> there's some policy applied that limit grant sharing to trusted
>>> domains)
>>>
>>> TBH I'm not sure how to handle hypoervisor accesses with
>>> introspection.  My knowledge of introspection is fairly limited, but
>>> it pauses the guest and sends a notification to an in guest agent. I'm
>>> not sure this is applicable to hypervisor writes, since it's not
>>> possible to pause hypervisor execution and wait for a response from a
>>> guest agent.
>>>
>>
>> Introspection applications only care about memory accesses performed
>> by the guest. Hypervisor accesses to monitored pages are not included
>> when monitoring - it is actually a feature when using the emulator in
>> Xen to continue guest execution because the hypervisor ignores EPT
>> memory permissions that trip the guest for introspection. So having
>> the hypervisor access memory or a grant-shared page being accessed in
>> another domain are not a problem for introspection.
> 
> Can't then two guests running on the same host be able to completely
> bypass introspection? I guess you prevent this by limiting to which
> guests pages can be shared?

Would these two guests be HVM guests? Introspection only works for HVM 
guests. I'm not sure I follow your scenario though. How would these 
guests collaborate to escape introspection via grants?

> If that's the case, and introspection doesn't care about hypervisor
> accesses to guest pages, then just getting a reference to the
> underlying page when the ring is setup should be enough. There's no
> need to check the gfn -> mfn relation every time there's an hypervisor
> access to the ring.

I think so, but I might be missing something.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:48                         ` Razvan Cojocaru
@ 2019-01-09 16:50                           ` Tamas K Lengyel
  2019-01-09 16:59                             ` Roger Pau Monné
  2019-01-09 17:03                             ` Razvan Cojocaru
  0 siblings, 2 replies; 111+ messages in thread
From: Tamas K Lengyel @ 2019-01-09 16:50 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, James McKenzie,
	ross.philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Roger Pau Monné,
	Tim Deegan, Christopher Clark, Konrad Rzeszutek Wilk,
	George Dunlap, Rich Persaud, Paul Durrant, Jan Beulich,
	eric chanudet, xen-devel, Ian Jackson, Roger Pau Monné

On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
>
> On 1/9/19 6:34 PM, Roger Pau Monné wrote:
> >>>>> Maybe this is use-case is different, but how does introspection handle
> >>>>> accesses to the shared info page or the runstate info for example?
> >>>>>
> >>>>> I would consider argo to be the same in this regard.
> >>>>
> >>>> Not exactly: The shared info page is special in any event. For
> >>>> runstate info (and alike - there's also struct vcpu_time_info)
> >>>> I'd question correctness of the current handling. If that's
> >>>> wrong already, I'd prefer if the issue wasn't spread.
> >>>
> >>> There are also grants, which when used together with another guest on
> >>> the same host could allow to bypass introspection AFAICT? (unless
> >>> there's some policy applied that limit grant sharing to trusted
> >>> domains)
> >>>
> >>> TBH I'm not sure how to handle hypoervisor accesses with
> >>> introspection.  My knowledge of introspection is fairly limited, but
> >>> it pauses the guest and sends a notification to an in guest agent. I'm
> >>> not sure this is applicable to hypervisor writes, since it's not
> >>> possible to pause hypervisor execution and wait for a response from a
> >>> guest agent.
> >>>
> >>
> >> Introspection applications only care about memory accesses performed
> >> by the guest. Hypervisor accesses to monitored pages are not included
> >> when monitoring - it is actually a feature when using the emulator in
> >> Xen to continue guest execution because the hypervisor ignores EPT
> >> memory permissions that trip the guest for introspection. So having
> >> the hypervisor access memory or a grant-shared page being accessed in
> >> another domain are not a problem for introspection.
> >
> > Can't then two guests running on the same host be able to completely
> > bypass introspection? I guess you prevent this by limiting to which
> > guests pages can be shared?
>
> Would these two guests be HVM guests? Introspection only works for HVM
> guests. I'm not sure I follow your scenario though. How would these
> guests collaborate to escape introspection via grants?

If there are two domains acting maliciously in concert to bypass
monitoring of memory writes, they could achieve that with grants, yes.
Say a write-monitored page is grant-shared to another domain, which
then does the write on behalf of the first. I wouldn't say that's
"completely bypassing introspection" though: there are many types of
events that can be monitored, and write accesses are only one. I'm not
aware of any mechanism that can be used to limit which pages can be
shared, but you can use XSM to restrict which domains can share pages
to begin with. That's normally enough.

Tamas

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:50                           ` Tamas K Lengyel
@ 2019-01-09 16:59                             ` Roger Pau Monné
  2019-01-09 17:03                               ` Fwd: " Roger Pau Monné
  2019-01-09 17:03                             ` Razvan Cojocaru
  1 sibling, 1 reply; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-09 16:59 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Ross Philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Tim Deegan, Christopher Clark,
	James McKenzie, George Dunlap, Rich Persaud, Paul Durrant,
	Jan Beulich, eric chanudet, xen-devel, Ian Jackson,
	Roger Pau Monné

On Wed, Jan 9, 2019 at 5:51 PM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
> >
> > On 1/9/19 6:34 PM, Roger Pau Monné wrote:
> > >>>>> Maybe this is use-case is different, but how does introspection handle
> > >>>>> accesses to the shared info page or the runstate info for example?
> > >>>>>
> > >>>>> I would consider argo to be the same in this regard.
> > >>>>
> > >>>> Not exactly: The shared info page is special in any event. For
> > >>>> runstate info (and alike - there's also struct vcpu_time_info)
> > >>>> I'd question correctness of the current handling. If that's
> > >>>> wrong already, I'd prefer if the issue wasn't spread.
> > >>>
> > >>> There are also grants, which when used together with another guest on
> > >>> the same host could allow to bypass introspection AFAICT? (unless
> > >>> there's some policy applied that limit grant sharing to trusted
> > >>> domains)
> > >>>
> > >>> TBH I'm not sure how to handle hypoervisor accesses with
> > >>> introspection.  My knowledge of introspection is fairly limited, but
> > >>> it pauses the guest and sends a notification to an in guest agent. I'm
> > >>> not sure this is applicable to hypervisor writes, since it's not
> > >>> possible to pause hypervisor execution and wait for a response from a
> > >>> guest agent.
> > >>>
> > >>
> > >> Introspection applications only care about memory accesses performed
> > >> by the guest. Hypervisor accesses to monitored pages are not included
> > >> when monitoring - it is actually a feature when using the emulator in
> > >> Xen to continue guest execution because the hypervisor ignores EPT
> > >> memory permissions that trip the guest for introspection. So having
> > >> the hypervisor access memory or a grant-shared page being accessed in
> > >> another domain are not a problem for introspection.
> > >
> > > Can't two guests running on the same host then completely
> > > bypass introspection? I guess you prevent this by limiting which
> > > guests pages can be shared with?
> >
> > Would these two guests be HVM guests? Introspection only works for HVM
> > guests. I'm not sure I follow your scenario though. How would these
> > guests collaborate to escape introspection via grants?
>
> If there are two domains acting maliciously in concert to bypass
> monitoring of memory writes they could achieve that with grants, yes.
> Say a write-monitored page is grant-shared to another domain, which
> then does the write on behalf of the first. I wouldn't say that's
> "completely bypassing introspection" though, there are many types of
> events that can be monitored, write-accesses are only one. I'm not
> aware of any mechanism that can be used to limit which pages can be
> shared but you can use XSM to restrict which domains can share pages
> to begin with. That's normally enough.

Yes, I assumed that would be the way to protect against such attacks,
i.e. limiting which guests pages can be shared with. I think just making
sure the right access checks are placed in XSM (just like they are for
grants) should be enough.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Fwd:  [PATCH 13/25] argo: implement the register op
  2019-01-09 16:59                             ` Roger Pau Monné
@ 2019-01-09 17:03                               ` Roger Pau Monné
  0 siblings, 0 replies; 111+ messages in thread
From: Roger Pau Monné @ 2019-01-09 17:03 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, Razvan Cojocaru,
	Ross Philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Tim Deegan, Christopher Clark,
	James McKenzie, George Dunlap, Rich Persaud, Paul Durrant,
	Jan Beulich, eric chanudet, xen-devel, Ian Jackson,
	Roger Pau Monné



* Re: [PATCH 13/25] argo: implement the register op
  2019-01-09 16:50                           ` Tamas K Lengyel
  2019-01-09 16:59                             ` Roger Pau Monné
@ 2019-01-09 17:03                             ` Razvan Cojocaru
  1 sibling, 0 replies; 111+ messages in thread
From: Razvan Cojocaru @ 2019-01-09 17:03 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, James McKenzie,
	ross.philipson, Jason Andryuk, Daniel Smith, Andrew Cooper,
	Roger Pau Monné,
	Tim Deegan, Christopher Clark, Konrad Rzeszutek Wilk,
	George Dunlap, Rich Persaud, Paul Durrant, Jan Beulich,
	eric chanudet, xen-devel, Ian Jackson, Roger Pau Monné

On 1/9/19 6:50 PM, Tamas K Lengyel wrote:
> On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>>
>> On 1/9/19 6:34 PM, Roger Pau Monné wrote:
>>>>>>> Maybe this use-case is different, but how does introspection handle
>>>>>>> accesses to the shared info page or the runstate info for example?
>>>>>>>
>>>>>>> I would consider argo to be the same in this regard.
>>>>>>
>>>>>> Not exactly: The shared info page is special in any event. For
>>>>>> runstate info (and alike - there's also struct vcpu_time_info)
>>>>>> I'd question correctness of the current handling. If that's
>>>>>> wrong already, I'd prefer if the issue wasn't spread.
>>>>>
>>>>> There are also grants, which when used together with another guest on
>>>>> the same host could allow bypassing introspection AFAICT? (unless
>>>>> there's some policy applied that limits grant sharing to trusted
>>>>> domains)
>>>>>
>>>>> TBH I'm not sure how to handle hypervisor accesses with
>>>>> introspection.  My knowledge of introspection is fairly limited, but
>>>>> it pauses the guest and sends a notification to an in guest agent. I'm
>>>>> not sure this is applicable to hypervisor writes, since it's not
>>>>> possible to pause hypervisor execution and wait for a response from a
>>>>> guest agent.
>>>>>
>>>>
>>>> Introspection applications only care about memory accesses performed
>>>> by the guest. Hypervisor accesses to monitored pages are not included
>>>> when monitoring - it is actually a feature when using the emulator in
>>>> Xen to continue guest execution because the hypervisor ignores EPT
>>>> memory permissions that trip the guest for introspection. So having
>>>> the hypervisor access memory or a grant-shared page being accessed in
>>>> another domain are not a problem for introspection.
>>>
>>> Can't two guests running on the same host then completely
>>> bypass introspection? I guess you prevent this by limiting which
>>> guests pages can be shared with?
>>
>> Would these two guests be HVM guests? Introspection only works for HVM
>> guests. I'm not sure I follow your scenario though. How would these
>> guests collaborate to escape introspection via grants?
> 
> If there are two domains acting maliciously in concert to bypass
> monitoring of memory writes they could achieve that with grants, yes.
> Say a write-monitored page is grant-shared to another domain, which
> then does the write on behalf of the first. I wouldn't say that's
> "completely bypassing introspection" though, there are many types of
> events that can be monitored, write-accesses are only one. I'm not
> aware of any mechanism that can be used to limit which pages can be
> shared but you can use XSM to restrict which domains can share pages
> to begin with. That's normally enough.

Right, I agree. We're not currently dealing with that case and assume 
that XSM (or a similar mechanism) will be used in scenarios where this 
level of access is possible.


Thanks,
Razvan



end of thread, other threads:[~2019-01-09 17:03 UTC | newest]

Thread overview: 111+ messages
2018-12-01  1:32 [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Christopher Clark
2018-12-01  1:32 ` [PATCH 01/25] xen/evtchn: expose evtchn_bind_ipi_vcpu0_domain for use within Xen Christopher Clark
2018-12-03 16:20   ` Jan Beulich
2018-12-04  9:17     ` Christopher Clark
2018-12-01  1:32 ` [PATCH 02/25] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
2018-12-03 15:51   ` Jan Beulich
2018-12-04  9:12     ` Christopher Clark
2018-12-01  1:32 ` [PATCH 03/25] argo: introduce the argo_message_op hypercall boilerplate Christopher Clark
2018-12-04  9:44   ` Paul Durrant
2018-12-20  5:13     ` Christopher Clark
2018-12-01  1:32 ` [PATCH 04/25] argo: define argo_dprintk for subsystem debugging Christopher Clark
2018-12-03 15:59   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 05/25] argo: Add initial argo_init and argo_destroy Christopher Clark
2018-12-04  9:12   ` Paul Durrant
2018-12-13 13:16   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 06/25] argo: Xen command line parameter 'argo': bool to enable/disable Christopher Clark
2018-12-04  9:18   ` Paul Durrant
2018-12-04 11:35   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 07/25] xen (ARM, x86): add errno-returning functions for copy Christopher Clark
2018-12-04  9:35   ` Paul Durrant
2018-12-12 16:01   ` Roger Pau Monné
2018-12-20  5:16     ` Christopher Clark
2018-12-20  8:45       ` Jan Beulich
2018-12-20 12:57       ` Roger Pau Monné
2018-12-01  1:32 ` [PATCH 08/25] xen: define XEN_GUEST_HANDLE_NULL as null XEN_GUEST_HANDLE Christopher Clark
2018-12-04 11:39   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 09/25] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
2018-12-03 15:42   ` Jan Beulich
2018-12-04  9:10     ` Christopher Clark
2018-12-04 10:04       ` Jan Beulich
2018-12-01  1:32 ` [PATCH 10/25] arm: introduce guest_handle_for_field() Christopher Clark
2018-12-04  9:46   ` Paul Durrant
2018-12-01  1:32 ` [PATCH 11/25] xsm, argo: XSM control for argo register operation, argo_mac bootparam Christopher Clark
2018-12-04  9:52   ` Paul Durrant
2018-12-20  5:19     ` Christopher Clark
2018-12-01  1:32 ` [PATCH 12/25] xsm, argo: XSM control for argo message send operation Christopher Clark
2018-12-04  9:53   ` Paul Durrant
2018-12-01  1:32 ` [PATCH 13/25] argo: implement the register op Christopher Clark
2018-12-02 20:10   ` Julien Grall
2018-12-04  9:08     ` Christopher Clark
2018-12-05 17:20       ` Julien Grall
2018-12-05 22:35         ` Christopher Clark
2018-12-11 13:51           ` Julien Grall
2018-12-04 10:57   ` Paul Durrant
2018-12-12  9:48   ` Jan Beulich
2018-12-20  5:29     ` Christopher Clark
2018-12-20  8:29       ` Jan Beulich
2018-12-21  1:25         ` Christopher Clark
2018-12-21  7:28           ` Jan Beulich
2018-12-21  8:16             ` Christopher Clark
2018-12-21  8:53               ` Jan Beulich
2018-12-21 23:28                 ` Christopher Clark
2018-12-12 16:47   ` Roger Pau Monné
2018-12-20  5:41     ` Christopher Clark
2018-12-20  8:51       ` Jan Beulich
2018-12-20 12:52       ` Roger Pau Monné
2018-12-21 23:05         ` Christopher Clark
2019-01-04  8:57           ` Roger Pau Monné
2019-01-04 13:22             ` Jan Beulich
2019-01-04 15:35               ` Roger Pau Monné
2019-01-04 15:47                 ` Jan Beulich
2019-01-07  9:00                   ` Roger Pau Monné
2019-01-09 16:15                     ` Tamas K Lengyel
2019-01-09 16:23                       ` Razvan Cojocaru
2019-01-09 16:34                       ` Roger Pau Monné
2019-01-09 16:48                         ` Razvan Cojocaru
2019-01-09 16:50                           ` Tamas K Lengyel
2019-01-09 16:59                             ` Roger Pau Monné
2019-01-09 17:03                               ` Fwd: " Roger Pau Monné
2019-01-09 17:03                             ` Razvan Cojocaru
2018-12-01  1:32 ` [PATCH 14/25] argo: implement the unregister op Christopher Clark
2018-12-04 11:10   ` Paul Durrant
2018-12-12  9:51   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 15/25] argo: implement the sendv op Christopher Clark
2018-12-04 11:22   ` Paul Durrant
2018-12-12 11:52   ` Jan Beulich
2018-12-20  5:58     ` Christopher Clark
2018-12-20  8:33       ` Jan Beulich
2019-01-04  8:13         ` Christopher Clark
2019-01-04  8:43           ` Roger Pau Monné
2019-01-04 13:37           ` Jan Beulich
2019-01-07 20:54             ` Christopher Clark
2018-12-01  1:32 ` [PATCH 16/25] argo: implement the notify op Christopher Clark
2018-12-13 14:06   ` Jan Beulich
2018-12-20  6:12     ` Christopher Clark
2018-12-20  8:39       ` Jan Beulich
2018-12-01  1:32 ` [PATCH 17/25] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
2018-12-01  1:32 ` [PATCH 18/25] argo: limit the max number of rings that a domain may register Christopher Clark
2018-12-13 14:08   ` Jan Beulich
2018-12-01  1:32 ` [PATCH 19/25] argo: limit the max number of notify requests in a single operation Christopher Clark
2018-12-01  1:32 ` [PATCH 20/25] argo, xsm: notify: don't describe rings that cannot be sent to Christopher Clark
2018-12-01  1:33 ` [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func Christopher Clark
2018-12-13 14:10   ` Jan Beulich
2018-12-01  1:33 ` [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen Christopher Clark
2018-12-13 14:12   ` Jan Beulich
2018-12-01  1:33 ` [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ Christopher Clark
2018-12-02 19:55   ` Julien Grall
2018-12-04  9:03     ` Christopher Clark
2018-12-04  9:16       ` Paul Durrant
2018-12-12 14:49         ` James
2018-12-11 14:15       ` Julien Grall
2018-12-13 14:16   ` Jan Beulich
2018-12-20  6:20     ` Christopher Clark
2018-12-01  1:33 ` [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume Christopher Clark
2018-12-13 14:26   ` Jan Beulich
2018-12-20  6:25     ` Christopher Clark
2018-12-01  1:33 ` [PATCH 25/25] argo: implement the get_config op to query notification config Christopher Clark
2018-12-13 14:32   ` Jan Beulich
2018-12-03 16:49 ` [PATCH 00/25] Argo: hypervisor-mediated interdomain communication Chris Patterson
2018-12-04  9:00   ` Christopher Clark
2018-12-11 22:13     ` Chris Patterson
