* [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication
@ 2019-01-07  7:42 Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
                   ` (14 more replies)
  0 siblings, 15 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, James McKenzie, Eric Chanudet, Roger Pau Monne

Version three of this patch series:

* Teardown of rings and pending notifications is implemented for
  domain destroy, removing the need to avoid stale state left by defunct
  domains. Data structures are added to track the rings for which a domain
  is the single send partner, and the pending notifications about
  wildcard-sender rings.

* Register and unregister ops take dedicated argument structs
  rather than a handle to the ring struct in ring memory as
  a simpler interface for this upstreaming effort.
  Ring data structure now has fewer member fields.
  Interface may need revision later with development of support
  for communication in L0/L1 nested hypervisor configuration.

* Added constraints to the notify op: the number of pending notifications
  on a ring is limited to a simple threshold value, and the space query is
  validated to ensure it is within achievable bounds.

* Disallows resizing of existing rings via re-registration.
  This could be added later; it needs work to handle pending notifications
  where a resized ring would make the requested space availability
  unachievable.

* Reordered series: XSM patches after main implementation.

* Improved hypercall arg validation; using faster __copy ops where ok.

* Removed guest memory region validation via fixed constant value fields.

Christopher Clark (15):
  argo: Introduce the Kconfig option to govern inclusion of Argo
  argo: introduce the argo_op hypercall boilerplate
  argo: define argo_dprintk for subsystem debugging
  argo: init, destroy and soft-reset, with enable command line opt
  errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  xen/arm: introduce guest_handle_for_field()
  argo: implement the register op
  argo: implement the unregister op
  argo: implement the sendv op; evtchn: expose send_guest_global_virq
  argo: implement the notify op
  xsm, argo: XSM control for argo register
  xsm, argo: XSM control for argo message send operation
  xsm, argo: XSM control for any access to argo by a domain
  xsm, argo: notify: don't describe rings that cannot be sent to
  argo: validate hypercall arg structures via compat machinery

 docs/misc/xen-command-line.pandoc     |   26 +
 xen/arch/x86/guest/hypercall_page.S   |    2 +-
 xen/arch/x86/hvm/hypercall.c          |    3 +
 xen/arch/x86/hypercall.c              |    3 +
 xen/arch/x86/pv/hypercall.c           |    3 +
 xen/common/Kconfig                    |   19 +
 xen/common/Makefile                   |    3 +-
 xen/common/argo.c                     | 2214 +++++++++++++++++++++++++++++++++
 xen/common/compat/argo.c              |   61 +
 xen/common/domain.c                   |   20 +
 xen/common/event_channel.c            |    2 +-
 xen/include/Makefile                  |    1 +
 xen/include/asm-arm/guest_access.h    |    5 +
 xen/include/asm-x86/guest_access.h    |    2 +
 xen/include/public/argo.h             |  277 +++++
 xen/include/public/errno.h            |    2 +
 xen/include/public/xen.h              |    4 +-
 xen/include/xen/argo.h                |   23 +
 xen/include/xen/event.h               |    7 +
 xen/include/xen/hypercall.h           |    9 +
 xen/include/xen/sched.h               |    6 +
 xen/include/xlat.lst                  |    8 +
 xen/include/xsm/dummy.h               |   26 +
 xen/include/xsm/xsm.h                 |   31 +
 xen/xsm/dummy.c                       |    6 +
 xen/xsm/flask/hooks.c                 |   41 +-
 xen/xsm/flask/policy/access_vectors   |   16 +
 xen/xsm/flask/policy/security_classes |    1 +
 28 files changed, 2813 insertions(+), 8 deletions(-)
 create mode 100644 xen/common/argo.c
 create mode 100644 xen/common/compat/argo.c
 create mode 100644 xen/include/public/argo.h
 create mode 100644 xen/include/xen/argo.h

-- 
2.7.4


* [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-08 15:46   ` Jan Beulich
  2019-01-07  7:42 ` [PATCH v3 02/15] argo: introduce the argo_op hypercall boilerplate Christopher Clark
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Defines CONFIG_ARGO when enabled. Default: disabled.

When the Kconfig option is enabled, the Argo hypercall implementation
will be included, allowing use of the hypervisor-mediated interdomain
communication mechanism.

Argo is implemented for x86 and ARM hardware platforms.

Availability of the option depends on EXPERT and Argo is currently an
experimental feature.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 #01 feedback, Jan: replace def_bool/prompt with bool
v1 #02 feedback, Jan: default Kconfig off, use EXPERT, fix whitespace

 xen/common/Kconfig | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 37f8505..5e1251e 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -200,6 +200,25 @@ config LATE_HWDOM
 
 	  If unsure, say N.
 
+config ARGO
+	bool "Argo: hypervisor-mediated interdomain communication" if EXPERT = "y"
+	---help---
+	  Enables a hypercall for domains to ask the hypervisor to perform
+	  data transfer of messages between domains.
+
+	  This allows communication channels to be established that do not
+	  require any shared memory between domains; the hypervisor is the
+	  entity that each domain interacts with. The hypervisor is able to
+	  enforce Mandatory Access Control policy over the communication.
+
+	  If XSM_FLASK is enabled, XSM policy can govern which domains may
+	  communicate via the Argo system.
+
+	  This feature does nothing if the "argo" boot parameter is not present.
+	  Argo is disabled at runtime by default.
+
+	  If unsure, say N.
+
 menu "Schedulers"
 	visible if EXPERT = "y"
 
-- 
2.7.4


* [PATCH v3 02/15] argo: introduce the argo_op hypercall boilerplate
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Presence is gated upon CONFIG_ARGO.

Registers the hypercall previously reserved for this.
Takes 5 arguments, does nothing and returns -ENOSYS.

A compat ABI is avoided by using fixed-size types in the hypercall ops, so
HYPERCALL, rather than COMPAT_CALL, is the correct macro for the hypercall
tables.

Even though handles will be used for (up to) two of the arguments to the
hypercall, there will be no need for any XLAT_* translation functions
because the referenced data structures have been constructed to be exactly
the same size and bit pattern on both 32-bit and 64-bit guests, and padded
to be integer multiples of 32 bits in size. This means that the same
copy_to_guest and copy_from_guest logic can be relied upon to perform as
required without any further intervention. Testing communication with 32
and 64 bit guests has confirmed this works as intended.
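
For illustration, the xen_argo_addr structure introduced later in this series
(patch 04) is built this way: only fixed-width fields plus explicit padding,
giving an identical layout for 32-bit and 64-bit guests, so no XLAT_*
translation is needed:

    typedef struct xen_argo_addr
    {
        uint32_t port;       /* fixed-width field */
        domid_t domain_id;   /* domid_t is uint16_t in the public ABI */
        uint16_t pad;        /* explicit padding up to a 32-bit multiple */
    } xen_argo_addr_t;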

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
v2 Copyright line: add 2019
v2 feedback #3 Jan: drop "message" from argo_message_op
v2 feedback #3 Jan: add Acked-by
v1 feedback #15 Jan: handle upper-halves of hypercall args
v1 feedback #15 Jan: use unsigned where negative values impossible

 xen/arch/x86/guest/hypercall_page.S |  2 +-
 xen/arch/x86/hvm/hypercall.c        |  3 +++
 xen/arch/x86/hypercall.c            |  3 +++
 xen/arch/x86/pv/hypercall.c         |  3 +++
 xen/common/Makefile                 |  1 +
 xen/common/argo.c                   | 28 ++++++++++++++++++++++++++++
 xen/include/public/xen.h            |  2 +-
 xen/include/xen/hypercall.h         |  9 +++++++++
 8 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 xen/common/argo.c

diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S
index fdd2e72..26afabf 100644
--- a/xen/arch/x86/guest/hypercall_page.S
+++ b/xen/arch/x86/guest/hypercall_page.S
@@ -59,7 +59,7 @@ DECLARE_HYPERCALL(sysctl)
 DECLARE_HYPERCALL(domctl)
 DECLARE_HYPERCALL(kexec_op)
 DECLARE_HYPERCALL(tmem_op)
-DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(argo_op)
 DECLARE_HYPERCALL(xenpmu_op)
 
 DECLARE_HYPERCALL(arch_0)
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 19d1263..b4eaac3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -134,6 +134,9 @@ static const hypercall_table_t hvm_hypercall_table[] = {
 #ifdef CONFIG_TMEM
     HYPERCALL(tmem_op),
 #endif
+#ifdef CONFIG_ARGO
+    HYPERCALL(argo_op),
+#endif
     COMPAT_CALL(platform_op),
 #ifdef CONFIG_PV
     COMPAT_CALL(mmuext_op),
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index 032de8f..93e7860 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -64,6 +64,9 @@ const hypercall_args_t hypercall_args_table[NR_hypercalls] =
     ARGS(domctl, 1),
     ARGS(kexec_op, 2),
     ARGS(tmem_op, 1),
+#ifdef CONFIG_ARGO
+    ARGS(argo_op, 5),
+#endif
     ARGS(xenpmu_op, 2),
 #ifdef CONFIG_HVM
     ARGS(hvm_op, 2),
diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index 5d11911..ed75053 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -77,6 +77,9 @@ const hypercall_table_t pv_hypercall_table[] = {
 #ifdef CONFIG_TMEM
     HYPERCALL(tmem_op),
 #endif
+#ifdef CONFIG_ARGO
+    HYPERCALL(argo_op),
+#endif
     HYPERCALL(xenpmu_op),
 #ifdef CONFIG_HVM
     HYPERCALL(hvm_op),
diff --git a/xen/common/Makefile b/xen/common/Makefile
index ffdfb74..8c65c6f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,3 +1,4 @@
+obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
 obj-y += bsearch.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
diff --git a/xen/common/argo.c b/xen/common/argo.c
new file mode 100644
index 0000000..d69ad7c
--- /dev/null
+++ b/xen/common/argo.c
@@ -0,0 +1,28 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018-2019 BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/errno.h>
+#include <xen/guest_access.h>
+
+long
+do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
+           XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
+           unsigned long arg4)
+{
+    return -ENOSYS;
+}
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 1a56871..b3f6491 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -118,7 +118,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_domctl               36
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
-#define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_argo_op              39
 #define __HYPERVISOR_xenpmu_op            40
 #define __HYPERVISOR_dm_op                41
 
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea..e2f61d6 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -136,6 +136,15 @@ do_tmem_op(
     XEN_GUEST_HANDLE_PARAM(tmem_op_t) uops);
 #endif
 
+#ifdef CONFIG_ARGO
+extern long do_argo_op(
+    unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg1,
+    XEN_GUEST_HANDLE_PARAM(void) arg2,
+    unsigned long arg3,
+    unsigned long arg4);
+#endif
+
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
-- 
2.7.4


* [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 02/15] argo: introduce the argo_op hypercall boilerplate Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-08 15:50   ` Jan Beulich
  2019-01-10  9:28   ` Roger Pau Monné
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

A convenience for working on development of the argo subsystem:
setting a #define variable enables additional debug messages.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 #03 feedback, Jan: fix ifdef/define confusion error
v1 #04 feedback, Jan: fix dprintk implementation

 xen/common/argo.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index d69ad7c..6f782f7 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -19,6 +19,15 @@
 #include <xen/errno.h>
 #include <xen/guest_access.h>
 
+/* Change this to #define ARGO_DEBUG here to enable more debug messages */
+#undef ARGO_DEBUG
+
+#ifdef ARGO_DEBUG
+#define argo_dprintk(format, args...) printk("argo: " format, ## args )
+#else
+#define argo_dprintk(format, ... ) ((void)0)
+#endif
+
 long
 do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
            XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
-- 
2.7.4


* [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (2 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-08 22:08   ` Ross Philipson
                     ` (6 more replies)
  2019-01-07  7:42 ` [PATCH v3 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
                   ` (10 subsequent siblings)
  14 siblings, 7 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Initialises basic data structures and performs teardown of argo state
for domain shutdown.

Inclusion of the Argo implementation is dependent on CONFIG_ARGO.

Introduces a new Xen command line parameter 'argo': bool to enable/disable
the argo hypercall. Defaults to disabled.

New headers:
  public/argo.h: with definitions of addresses and ring structure, including
  indexes for atomic update for communication between domain and hypervisor.

  xen/argo.h: to expose the hooks for integration into domain lifecycle:
    argo_init: per-domain init of argo data structures for domain_create.
    argo_destroy: teardown for domain_destroy and the error exit
                  path of domain_create.
    argo_soft_reset: reset of domain state for domain_soft_reset.

Adds two new fields to struct domain:
    rwlock_t argo_lock;
    struct argo_domain *argo;

In accordance with recent work on _domain_destroy, argo_destroy is
idempotent. It will tear down: all rings registered by this domain, all
rings where this domain is the single sender (ie. specified partner,
non-wildcard rings), and all pending notifications where this domain is
awaiting signal about available space in the rings of other domains.

A count is maintained of the number of rings that a domain has registered,
in order to keep it below the fixed maximum limit defined here.
The software license on the public header is the BSD license, as is standard
procedure for the public Xen headers. The public header was originally
posted under a GPL license at [1]:
https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html

The following ACK by Lars Kurth is to confirm that only people who were
employees of Citrix contributed to the header files in the series posted at
[1], and that the copyright of the files in question is thus fully owned by
Citrix. The ACK also confirms that Citrix is happy for the header files to
be published under a BSD license in this series (which is based on [1]).

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
---
v2 rewrite locking explanation comment
v2 header copyright line now includes 2019
v2 self: use ring_info backpointer in pending_ent to maintain npending
v2 self: rename all_rings_remove_info to domain_rings_remove_all
v2 feedback Jan: drop cookie, implement teardown
v2 self: add npending to track number of pending entries per ring
v2 self: amend comment on locking; drop section comments
v2 cookie_eq: test low bits first and use likely on high bits
v2 self: OVERHAUL
v2 self: s/argo_pending_ent/pending_ent/g
v2 self: drop pending_remove_ent, inline at single call site
v1 feedback Roger, Jan: drop argo prefix on static functions
v2 #4 Lars: add Acked-by and details to commit message.
v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
v2 bugfix: xsm use in soft-reset prior to introduction
v2 feedback #9 Jan: drop 'message' from do_argo_message_op
v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
v1 #5 feedback Paul: init/destroy : use currd
v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
v1 #6 feedback Paul: Folded patch 6 into patch 5.
v1 #6 feedback Jan: drop opt_argo_enabled initializer
v1 $6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
v1. #5 feedback Paul: change the license on public header to BSD
- ack from Lars at Citrix.
v1. self, Jan: drop unnecessary xen include from sched.h
v1. self, Jan: drop inclusion of public argo.h in private one
v1. self, Jan: add include of public argo.h to argo.c
v1. self, Jan: drop fwd decl of argo_domain in priv header
v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
v1. self: removed allocation of event channel since switching to VIRQ
v1. self: drop types.h include from private argo.h
v1: reorder public argo include position
v1: #13 feedback Jan: public namespace: prefix with xen
v1: self: rename pending ent "id" to "domain_id"
v1: self: add domain_cookie to ent struct
v1. #15 feedback Jan: make cmd unsigned
v1. #15 feedback Jan: make i loop variable unsigned
v1: self: adjust dprintks in init, destroy
v1: #18 feedback Jan: meld max ring count limit
v1: self: use type not struct in public defn, affects compat gen header
v1: feedback #15 Jan: handle upper-halves of hypercall args
v1: add comment explaining the 'magic' field
v1: self + Jan feedback: implement soft reset
v1: feedback #13 Roger: use ASSERT_UNREACHABLE

 docs/misc/xen-command-line.pandoc |  11 +
 xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
 xen/common/domain.c               |  20 ++
 xen/include/Makefile              |   1 +
 xen/include/public/argo.h         |  59 +++++
 xen/include/xen/argo.h            |  23 ++
 xen/include/xen/sched.h           |   6 +
 xen/include/xlat.lst              |   2 +
 8 files changed, 582 insertions(+), 1 deletion(-)
 create mode 100644 xen/include/public/argo.h
 create mode 100644 xen/include/xen/argo.h

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a755a67..aea13eb 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
 in combination with cpuidle.  This option is only expected to be useful for
 developers wishing Xen to fall back to older timing methods on newer hardware.
 
+### argo
+> `= <boolean>`
+
+> Default: `false`
+
+Enable the Argo hypervisor-mediated interdomain communication mechanism.
+
+This allows domains access to the Argo hypercall, which supports registration
+of memory rings with the hypervisor to receive messages, sending messages to
+other domains by hypercall and querying the ring status of other domains.
+
 ### asid (x86)
 > `= <boolean>`
 
diff --git a/xen/common/argo.c b/xen/common/argo.c
index 6f782f7..86195d3 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -17,7 +17,177 @@
  */
 
 #include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/argo.h>
+#include <xen/event.h>
+#include <xen/domain_page.h>
 #include <xen/guest_access.h>
+#include <xen/time.h>
+#include <public/argo.h>
+
+DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
+
+/* Xen command line option to enable argo */
+static bool __read_mostly opt_argo_enabled;
+boolean_param("argo", opt_argo_enabled);
+
+typedef struct argo_ring_id
+{
+    uint32_t port;
+    domid_t partner_id;
+    domid_t domain_id;
+} argo_ring_id;
+
+/* Data about a domain's own ring that it has registered */
+struct argo_ring_info
+{
+    /* next node in the hash, protected by L2 */
+    struct hlist_node node;
+    /* this ring's id, protected by L2 */
+    struct argo_ring_id id;
+    /* L3 */
+    spinlock_t lock;
+    /* length of the ring, protected by L3 */
+    uint32_t len;
+    /* number of pages in the ring, protected by L3 */
+    uint32_t npage;
+    /* number of pages translated into mfns, protected by L3 */
+    uint32_t nmfns;
+    /* cached tx pointer location, protected by L3 */
+    uint32_t tx_ptr;
+    /* mapped ring pages protected by L3 */
+    uint8_t **mfn_mapping;
+    /* list of mfns of guest ring, protected by L3 */
+    mfn_t *mfns;
+    /* list of struct pending_ent for this ring, protected by L3 */
+    struct hlist_head pending;
+    /* number of pending entries queued for this ring, protected by L3 */
+    uint32_t npending;
+};
+
+/* Data about a single-sender ring, held by the sender (partner) domain */
+struct argo_send_info
+{
+    /* next node in the hash, protected by Lsend */
+    struct hlist_node node;
+    /* this ring's id, protected by Lsend */
+    struct argo_ring_id id;
+};
+
+/* A space-available notification that is awaiting sufficient space */
+struct pending_ent
+{
+    /* List node within argo_ring_info's pending list */
+    struct hlist_node node;
+    /*
+     * List node within argo_domain's wildcard_pend_list. Only used if the
+     * ring is one with a wildcard partner (ie. that any domain may send to)
+     * to enable cancelling signals on wildcard rings on domain destroy.
+     */
+    struct hlist_node wildcard_node;
+    /*
+     * Pointer to the ring_info that this ent pertains to. Used to ensure that
+     * ring_info->npending is decremented when ents for wildcard rings are
+     * cancelled for domain destroy.
+     * Caution: Must hold the correct locks before accessing ring_info via this.
+     */
+    struct argo_ring_info *ring_info;
+    /* domain to be notified when space is available */
+    domid_t domain_id;
+    uint16_t pad;
+    /* minimum ring space available that this signal is waiting upon */
+    uint32_t len;
+};
+
+/*
+ * The value of the argo element in a struct domain is
+ * protected by the global lock argo_lock: L1
+ */
+#define ARGO_HTABLE_SIZE 32
+struct argo_domain
+{
+    /* L2 */
+    rwlock_t lock;
+    /*
+     * Hash table of argo_ring_info about rings this domain has registered.
+     * Protected by L2.
+     */
+    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
+    /* Counter of rings registered by this domain. Protected by L2. */
+    uint32_t ring_count;
+
+    /* Lsend */
+    spinlock_t send_lock;
+    /*
+     * Hash table of argo_send_info about rings other domains have registered
+     * for this domain to send to. Single partner, non-wildcard rings.
+     * Protected by Lsend.
+     */
+    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
+
+    /* Lwildcard */
+    spinlock_t wildcard_lock;
+    /*
+     * List of pending space-available signals for this domain about wildcard
+     * rings registered by other domains. Protected by Lwildcard.
+     */
+    struct hlist_head wildcard_pend_list;
+};
+
+/*
+ * Locking is organized as follows:
+ *
+ * Terminology: R(<lock>) means taking a read lock on the specified lock;
+ *              W(<lock>) means taking a write lock on it.
+ *
+ * L1 : The global lock: argo_lock
+ * Protects the argo elements of all struct domain *d in the system.
+ * It does not protect any of the elements of d->argo, only their
+ * addresses.
+ *
+ * By extension since the destruction of a domain with a non-NULL
+ * d->argo will need to free the d->argo pointer, holding W(L1)
+ * guarantees that no domains pointers that argo is interested in
+ * become invalid whilst this lock is held.
+ */
+
+static DEFINE_RWLOCK(argo_lock); /* L1 */
+
+/*
+ * L2 : The per-domain ring hash lock: d->argo->lock
+ * Holding a read lock on L2 protects the ring hash table and
+ * the elements in the hash_table d->argo->ring_hash, and
+ * the node and id fields in struct argo_ring_info in the
+ * hash table.
+ * Holding a write lock on L2 protects all of the elements of
+ * struct argo_ring_info.
+ *
+ * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
+ *
+ * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
+ * Protects all the fields within the argo_ring_info, aside from the ones that
+ * L2 already protects: node, id, lock.
+ *
+ * To acquire L3 you must already have R(L2). W(L2) implies L3.
+ *
+ * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
+ * Protects the per-domain send hash table : d->argo->send_hash
+ * and the elements in the hash table, and the node and id fields
+ * in struct argo_send_info in the hash table.
+ *
+ * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
+ * Do not attempt to acquire a L2 on any domain after taking and while
+ * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
+ *
+ * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
+ * Protects the per-domain list of outstanding signals for space availability
+ * on wildcard rings.
+ *
+ * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
+ * No other locks are acquired after obtaining Lwildcard.
+ */
 
 /* Change this to #define ARGO_DEBUG here to enable more debug messages */
 #undef ARGO_DEBUG
@@ -28,10 +198,299 @@
 #define argo_dprintk(format, ... ) ((void)0)
 #endif
 
+static void
+ring_unmap(struct argo_ring_info *ring_info)
+{
+    unsigned int i;
+
+    if ( !ring_info->mfn_mapping )
+        return;
+
+    for ( i = 0; i < ring_info->nmfns; i++ )
+    {
+        if ( !ring_info->mfn_mapping[i] )
+            continue;
+        if ( ring_info->mfns )
+            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
+                         mfn_x(ring_info->mfns[i]),
+                         ring_info->mfn_mapping[i]);
+        unmap_domain_page_global(ring_info->mfn_mapping[i]);
+        ring_info->mfn_mapping[i] = NULL;
+    }
+}
+
+static void
+wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
+{
+    struct domain *d = get_domain_by_id(domain_id);
+    if ( !d )
+        return;
+
+    if ( d->argo )
+    {
+        spin_lock(&d->argo->wildcard_lock);
+        hlist_del(&ent->wildcard_node);
+        spin_unlock(&d->argo->wildcard_lock);
+    }
+    put_domain(d);
+}
+
+static void
+pending_remove_all(struct argo_ring_info *ring_info)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+            wildcard_pending_list_remove(ent->domain_id, ent);
+        hlist_del(&ent->node);
+        xfree(ent);
+    }
+    ring_info->npending = 0;
+}
+
+static void
+wildcard_rings_pending_remove(struct domain *d)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(rw_is_write_locked(&argo_lock));
+
+    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
+                              node)
+    {
+        hlist_del(&ent->node);
+        ent->ring_info->npending--;
+        hlist_del(&ent->wildcard_node);
+        xfree(ent);
+    }
+}
+
+static void
+ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
+{
+    unsigned int i;
+
+    ASSERT(rw_is_write_locked(&d->argo->lock) ||
+           rw_is_write_locked(&argo_lock));
+
+    if ( !ring_info->mfns )
+        return;
+
+    if ( !ring_info->mfn_mapping )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    ring_unmap(ring_info);
+
+    for ( i = 0; i < ring_info->nmfns; i++ )
+        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
+            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
+
+    xfree(ring_info->mfns);
+    ring_info->mfns = NULL;
+    ring_info->npage = 0;
+    xfree(ring_info->mfn_mapping);
+    ring_info->mfn_mapping = NULL;
+    ring_info->nmfns = 0;
+}
+
+static void
+ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
+{
+    ASSERT(rw_is_write_locked(&d->argo->lock) ||
+           rw_is_write_locked(&argo_lock));
+
+    pending_remove_all(ring_info);
+    hlist_del(&ring_info->node);
+    ring_remove_mfns(d, ring_info);
+    xfree(ring_info);
+}
+
+static void
+domain_rings_remove_all(struct domain *d)
+{
+    unsigned int i;
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct argo_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node, next,
+                                  &d->argo->ring_hash[i], node)
+            ring_remove_info(d, ring_info);
+    }
+    d->argo->ring_count = 0;
+}
+
+/*
+ * Tear down all rings of other domains where src_d domain is the partner.
+ * (ie. it is the single domain that can send to those rings.)
+ * This will also cancel any pending notifications about those rings.
+ */
+static void
+partner_rings_remove(struct domain *src_d)
+{
+    unsigned int i;
+
+    ASSERT(rw_is_write_locked(&argo_lock));
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
+    {
+        struct hlist_node *node, *next;
+        struct argo_send_info *send_info;
+
+        hlist_for_each_entry_safe(send_info, node, next,
+                                  &src_d->argo->send_hash[i], node)
+        {
+            struct argo_ring_info *ring_info;
+            struct domain *dst_d;
+
+            dst_d = get_domain_by_id(send_info->id.domain_id);
+            if ( dst_d )
+            {
+                ring_info = ring_find_info(dst_d, &send_info->id);
+                if ( ring_info )
+                {
+                    ring_remove_info(dst_d, ring_info);
+                    dst_d->argo->ring_count--;
+                }
+
+                put_domain(dst_d);
+            }
+
+            hlist_del(&send_info->node);
+            xfree(send_info);
+        }
+    }
+}
+
 long
 do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
            XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
            unsigned long arg4)
 {
-    return -ENOSYS;
+    struct domain *currd = current->domain;
+    long rc = -EFAULT;
+
+    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
+                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
+
+    if ( unlikely(!opt_argo_enabled) )
+    {
+        rc = -EOPNOTSUPP;
+        return rc;
+    }
+
+    domain_lock(currd);
+
+    switch (cmd)
+    {
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    domain_unlock(currd);
+
+    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
+
+    return rc;
+}
+
+static void
+argo_domain_init(struct argo_domain *argo)
+{
+    unsigned int i;
+
+    rwlock_init(&argo->lock);
+    spin_lock_init(&argo->send_lock);
+    spin_lock_init(&argo->wildcard_lock);
+    argo->ring_count = 0;
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
+    {
+        INIT_HLIST_HEAD(&argo->ring_hash[i]);
+        INIT_HLIST_HEAD(&argo->send_hash[i]);
+    }
+    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
+}
+
+int
+argo_init(struct domain *d)
+{
+    struct argo_domain *argo;
+
+    if ( !opt_argo_enabled )
+    {
+        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
+        return 0;
+    }
+
+    argo_dprintk("init: domid: %d\n", d->domain_id);
+
+    argo = xmalloc(struct argo_domain);
+    if ( !argo )
+        return -ENOMEM;
+
+    write_lock(&argo_lock);
+
+    argo_domain_init(argo);
+
+    d->argo = argo;
+
+    write_unlock(&argo_lock);
+
+    return 0;
+}
+
+void
+argo_destroy(struct domain *d)
+{
+    BUG_ON(!d->is_dying);
+
+    write_lock(&argo_lock);
+
+    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
+
+    if ( d->argo )
+    {
+        domain_rings_remove_all(d);
+        partner_rings_remove(d);
+        wildcard_rings_pending_remove(d);
+        xfree(d->argo);
+        d->argo = NULL;
+    }
+    write_unlock(&argo_lock);
+}
+
+void
+argo_soft_reset(struct domain *d)
+{
+    write_lock(&argo_lock);
+
+    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
+
+    if ( d->argo )
+    {
+        domain_rings_remove_all(d);
+        partner_rings_remove(d);
+        wildcard_rings_pending_remove(d);
+
+        if ( !opt_argo_enabled )
+        {
+            xfree(d->argo);
+            d->argo = NULL;
+        }
+        else
+            argo_domain_init(d->argo);
+    }
+
+    write_unlock(&argo_lock);
 }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index c623dae..9596840 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -32,6 +32,7 @@
 #include <xen/grant_table.h>
 #include <xen/xenoprof.h>
 #include <xen/irq.h>
+#include <xen/argo.h>
 #include <asm/debugger.h>
 #include <asm/p2m.h>
 #include <asm/processor.h>
@@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
 
     xfree(d->pbuf);
 
+#ifdef CONFIG_ARGO
+    argo_destroy(d);
+#endif
+
     rangeset_domain_destroy(d);
 
     free_cpumask_var(d->dirty_cpumask);
@@ -376,6 +381,9 @@ struct domain *domain_create(domid_t domid,
     spin_lock_init(&d->hypercall_deadlock_mutex);
     INIT_PAGE_LIST_HEAD(&d->page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
+#ifdef CONFIG_ARGO
+    rwlock_init(&d->argo_lock);
+#endif
 
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
@@ -445,6 +453,11 @@ struct domain *domain_create(domid_t domid,
             goto fail;
         init_status |= INIT_gnttab;
 
+#ifdef CONFIG_ARGO
+        if ( (err = argo_init(d)) != 0 )
+            goto fail;
+#endif
+
         err = -ENOMEM;
 
         d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
@@ -717,6 +730,9 @@ int domain_kill(struct domain *d)
         if ( d->is_dying != DOMDYING_alive )
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
+#ifdef CONFIG_ARGO
+        argo_destroy(d);
+#endif
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem_client);
@@ -1175,6 +1191,10 @@ int domain_soft_reset(struct domain *d)
 
     grant_table_warn_active_grants(d);
 
+#ifdef CONFIG_ARGO
+    argo_soft_reset(d);
+#endif
+
     for_each_vcpu ( d, v )
     {
         set_xen_guest_handle(runstate_guest(v), NULL);
diff --git a/xen/include/Makefile b/xen/include/Makefile
index f7895e4..3d14532 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -5,6 +5,7 @@ ifneq ($(CONFIG_COMPAT),)
 compat-arch-$(CONFIG_X86) := x86_32
 
 headers-y := \
+    compat/argo.h \
     compat/callback.h \
     compat/elfnote.h \
     compat/event_channel.h \
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
new file mode 100644
index 0000000..4818684
--- /dev/null
+++ b/xen/include/public/argo.h
@@ -0,0 +1,59 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018-2019, BAE Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __XEN_PUBLIC_ARGO_H__
+#define __XEN_PUBLIC_ARGO_H__
+
+#include "xen.h"
+
+typedef struct xen_argo_addr
+{
+    uint32_t port;
+    domid_t domain_id;
+    uint16_t pad;
+} xen_argo_addr_t;
+
+typedef struct xen_argo_ring
+{
+    /* Guests should use atomic operations to access rx_ptr */
+    uint32_t rx_ptr;
+    /* Guests should use atomic operations to access tx_ptr */
+    uint32_t tx_ptr;
+    /*
+     * Header space reserved for later use. Align the start of the ring to a
+     * multiple of the message slot size.
+     */
+    uint8_t reserved[56];
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t ring[];
+#elif defined(__GNUC__)
+    uint8_t ring[0];
+#endif
+} xen_argo_ring_t;
+
+#endif
diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
new file mode 100644
index 0000000..29d32a9
--- /dev/null
+++ b/xen/include/xen/argo.h
@@ -0,0 +1,23 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Copyright (c) 2018, BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __XEN_ARGO_H__
+#define __XEN_ARGO_H__
+
+int argo_init(struct domain *d);
+void argo_destroy(struct domain *d);
+void argo_soft_reset(struct domain *d);
+
+#endif
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 4956a77..20418e7 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -490,6 +490,12 @@ struct domain
         unsigned int guest_request_enabled       : 1;
         unsigned int guest_request_sync          : 1;
     } monitor;
+
+#ifdef CONFIG_ARGO
+    /* Argo interdomain communication support */
+    rwlock_t argo_lock;
+    struct argo_domain *argo;
+#endif
 };
 
 /* Protect updates/reads (resp.) of domain_list and domain_hash. */
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 5273320..9f616e4 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -148,3 +148,5 @@
 ?	flask_setenforce		xsm/flask_op.h
 !	flask_sid_context		xsm/flask_op.h
 ?	flask_transition		xsm/flask_op.h
+?	argo_addr			argo.h
+?	argo_ring			argo.h
-- 
2.7.4

* [PATCH v3 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (3 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

EMSGSIZE: Argo's sendv operation will return EMSGSIZE when an excess amount
of data, across all iovs, has been supplied, exceeding either the statically
configured maximum size of a transmittable message, or the (variable) size
of the ring registered by the destination domain.

ECONNREFUSED: Argo's register operation will return ECONNREFUSED if a ring
is being registered to communicate with a specific remote domain that does
exist but is not argo-enabled.

These codes are described by POSIX here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
    EMSGSIZE     : "Message too large"
    ECONNREFUSED : "Connection refused".

The numeric values assigned to each are taken from Linux, as is the case
for the existing error codes.
    EMSGSIZE     : 90
    ECONNREFUSED : 111
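
For reference, a minimal sketch of how such a public errno list is typically
consumed: a caller defines the XEN_ERRNO() macro and then includes the
header, which expands each entry (the header #undefs the macro itself at the
end, as visible in the hunk below). The enum name here is illustrative:

    /* Expand the public errno list into an enum of numeric constants. */
    #define XEN_ERRNO(name, value) name = (value),
    enum xen_errno_values {
    #include <public/errno.h>
    };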

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/include/public/errno.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/public/errno.h b/xen/include/public/errno.h
index 305c112..e1d02fc 100644
--- a/xen/include/public/errno.h
+++ b/xen/include/public/errno.h
@@ -102,6 +102,7 @@ XEN_ERRNO(EILSEQ,	84)	/* Illegal byte sequence */
 XEN_ERRNO(ERESTART,	85)	/* Interrupted system call should be restarted */
 #endif
 XEN_ERRNO(ENOTSOCK,	88)	/* Socket operation on non-socket */
+XEN_ERRNO(EMSGSIZE,	90)	/* Message too large. */
 XEN_ERRNO(EOPNOTSUPP,	95)	/* Operation not supported on transport endpoint */
 XEN_ERRNO(EADDRINUSE,	98)	/* Address already in use */
 XEN_ERRNO(EADDRNOTAVAIL, 99)	/* Cannot assign requested address */
@@ -109,6 +110,7 @@ XEN_ERRNO(ENOBUFS,	105)	/* No buffer space available */
 XEN_ERRNO(EISCONN,	106)	/* Transport endpoint is already connected */
 XEN_ERRNO(ENOTCONN,	107)	/* Transport endpoint is not connected */
 XEN_ERRNO(ETIMEDOUT,	110)	/* Connection timed out */
+XEN_ERRNO(ECONNREFUSED,	111)	/* Connection refused */
 
 #undef XEN_ERRNO
 #endif /* XEN_ERRNO */
-- 
2.7.4

* [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field()
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (4 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-08 22:03   ` Stefano Stabellini
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	Rich Persaud, James McKenzie, Julien Grall, Paul Durrant,
	Jan Beulich, Eric Chanudet, Roger Pau Monne

ARM port of c/s bb544585: "introduce guest_handle_for_field()"

This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.
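
A hypothetical usage sketch (variable names are illustrative): given a
handle to a guest's xen_argo_ring structure, the macro derives a handle
typed to one of its fields:

    XEN_GUEST_HANDLE(xen_argo_ring_t) ring_hnd;   /* obtained from the guest */
    XEN_GUEST_HANDLE(uint32_t) tx_hnd =
        guest_handle_for_field(ring_hnd, uint32_t, tx_ptr);
    /* tx_hnd can now be used with the copy_to/from_guest accessors to
     * access just the tx_ptr field of the guest's ring structure. */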

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/include/asm-arm/guest_access.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 224d2a0..8997a1c 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -63,6 +63,9 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
     _y;                                                     \
 })
 
+#define guest_handle_for_field(hnd, type, fld)          \
+    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
+
 #define guest_handle_from_ptr(ptr, type)        \
     ((XEN_GUEST_HANDLE_PARAM(type)) { (type *)ptr })
 #define const_guest_handle_from_ptr(ptr, type)  \
-- 
2.7.4

* [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (5 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-09 15:55   ` Wei Liu
                     ` (4 more replies)
  2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
                   ` (7 subsequent siblings)
  14 siblings, 5 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

The register op is used by a domain to register a region of memory for
receiving messages from either a specified other domain, or, if specifying a
wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

Wildcard any-sender rings are disabled by default and registration will be
refused with EPERM unless they have been specifically enabled with the
argo-mac boot option introduced here. The reason why the default for
wildcard rings is 'deny' is that there is currently no means to protect a
ring from DoS by a noisy domain spamming the ring, affecting other domains'
ability to send to it. This will be addressed with XSM policy controls in
subsequent work.

Since denying access to any-sender rings is a significant functional
constraint, a new bootparam is provided to enable overriding this:
the "argo-mac" option accepts the values 'permissive' and 'enforcing'.
Even though this is effectively a boolean setting, these descriptive strings
are used in order to make it obvious to an administrator that it has
potential security impact.
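
For example, a Xen boot command line that enables Argo and permits wildcard
rings might include (illustrative placement; other options omitted):

    argo=1 argo-mac=permissive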

The p2m type of the memory supplied by the guest for the ring must be
p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
is registered.
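
A minimal sketch of the kind of check described, assuming the usual Xen
page-reference idiom (simplified; not the exact code from this patch):

    /* 'd' and 'gfn' are assumed given: the guest and one frame of the ring. */
    p2m_type_t p2mt;
    struct page_info *page = get_page_from_gfn(d, gfn, &p2mt, P2M_ALLOC);

    if ( !page )
        return -EINVAL;

    if ( p2mt != p2m_ram_rw || !get_page_type(page, PGT_writable_page) )
    {
        put_page(page);
        return -EINVAL;
    }
    /* The page now holds a PGT_writable_page type reference for the
     * lifetime of the ring; it is dropped again on unregister/teardown. */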

The xen_argo_page_descr_t type is introduced as a page descriptor, to convey
both the physical address of the start of the page and its granularity. The
smallest granularity page is assumed to be 4096 bytes and the lower twelve
bits of the type are used to indicate the size of the page of memory
supplied. The implementation of the hypercall op currently only supports 4K
pages.
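
A sketch of the encoding just described (the type name is from this patch;
the macro and helper names are illustrative assumptions, not necessarily
those used in the patch):

    typedef uint64_t xen_argo_page_descr_t;

    /* Lower 12 bits carry the page granularity; 0 denotes a 4K page. */
    #define ARGO_DESCR_GRAN_MASK  0xfffULL
    #define ARGO_DESCR_GRAN_4K    0x000ULL

    /* Hypothetical helper: build a descriptor for a 4K page at address pa. */
    static inline xen_argo_page_descr_t argo_descr_4k(uint64_t pa)
    {
        return (pa & ~ARGO_DESCR_GRAN_MASK) | ARGO_DESCR_GRAN_4K;
    }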

array_index_nospec is used to guard the result of the ring id hash function.
This is out of an abundance of caution, since this is a very basic hash
function and it operates upon values supplied by the guest just before
being used as an array index.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 self: disallow ring resize via reregister
v2 feedback Jan: drop cookie, implement teardown
v2 feedback Jan: drop message from argo_message_op
v2 self: move hash_index function below locking comment
v2 self: OVERHAUL
v2 self/Jan: remove use of magic verification field and tidy up
v2 self: merge max and min ring size check clauses
v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
v2 feedback #9, Jan: use the argo-mac bootparam at point of introduction
v2 feedback #9, Jan: rename boot opt variable to comply with convention
v2 feedback #9, Jan: rename the argo_mac bootparam to argo-mac
v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
v1 feedback Roger, Jan: drop argo prefix on static functions
v1 feedback Roger: s/pfn/gfn/ and retire always-64-bit type
v2. feedback Jan: document the argo-mac boot opt
v2. feedback Jan: simplify re-register, drop mappings
v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops

v1 #13 feedback, Jan: register op : s/ECONNREFUSED/ESRCH/
v1 #5 (#13) feedback Paul: register op: use currd in do_message_op
v1 #13 feedback, Paul: register op: use mfn_eq comparator
v1 #5 (#13) feedback Paul: register op: use currd in argo_register_ring
v1 #13 feedback Paul: register op: whitespace, unsigned, bounds check
v1 #13 feedback Paul: use of hex in limit constant definition
v1 #13 feedback Paul, register op: set nmfns on loop termination
v1 #13 feedback Paul: register op: do/while -> gotos, reindent
v1 argo_ring_map_page: drop uint32_t for unsigned int
v1. #13 feedback Julien: use page descriptors instead of gpfns.
   - adds ABI support for pages with different granularity.
v1 feedback #13, Paul: adjust log level of message
v1 feedback #13, Paul: use gprintk for guest-triggered warning
v1 feedback #13, Paul: gprintk and XENLOG_DEBUG for ring registration
v1 feedback #13, Paul: use gprintk for errs in argo_ring_map_page
v1 feedback #13, Paul: use ENOMEM if global mapping fails
v1 feedback Paul: overflow check before shift
v1: add define for copy_field_to_guest_errno
v1: fix gprintk use for ARM as its defn dislikes split format strings
v1: use copy_field_to_guest_errno
v1 feedback #13, Jan: argo_hash_fn: no inline, rename, change type
v1 feedback #13, Paul, Jan: EFAULT -> ENOMEM in argo_ring_map_page
v1 feedback #13, Jan: rename page var in argo_ring_map_page
v1 feedback #13, Jan: switch uint8_t* to void* and drop cast
v1 feedback #13, Jan: switch memory barrier to smp_wmb
v1 feedback #13, Jan: make 'ring' comment comply with single-line style
v1 feedback #13, Jan: use xzalloc_array, drop loop NULL init
v1 feedback #13, Jan: init bool with false rather than 0
v1 feedback #13 Jan: use __copy; define and use __copy_field_to_guest_errno
v1 feedback #13, Jan: use xzalloc, drop individual init zeroes
v1 feedback #13, Jan: prefix public namespace with xen
v1 feedback #13, Jan: blank line after op case in do_argo_message_op
v1 self: reflow comment in argo_ring_map_page to within 80 char len
v1 feedback #13, Roger: use true not 1 in assign to update_tx_ptr bool
v1 feedback #21, Jan: fold in the array_index_nospec hash function guards
v1 feedback #18, Jan: fold the max ring count limit into the series
v1 self: use unsigned long type for XEN_ARGO_REGISTER_FLAG_MASK
v1: feedback #15 Jan: handle upper-halves of hypercall args
v1. feedback #13 Jan: add comment re: page alignment
v1. self: confirm ring magic presence in supplied page array
v1. feedback #13 Jan: add comment re: minimum ring size
v1. feedback #13 Roger: use ASSERT_UNREACHABLE
v1. feedback Roger: add comment to hash function

 docs/misc/xen-command-line.pandoc  |  15 +
 xen/common/argo.c                  | 566 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/guest_access.h |   2 +
 xen/include/asm-x86/guest_access.h |   2 +
 xen/include/public/argo.h          |  72 +++++
 xen/include/xlat.lst               |   1 +
 6 files changed, 658 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index aea13eb..68d4415 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
 of memory rings with the hypervisor to receive messages, sending messages to
 other domains by hypercall and querying the ring status of other domains.
 
+### argo-mac
+> `= permissive | enforcing`
+
+> Default: `enforcing`
+
+Constrain the access control applied to the Argo communication mechanism.
+
+When `enforcing`, domains may not register rings that have wildcard specified
+for the sender which would allow messages to be sent to the ring by any domain.
+This is to protect rings and the services that utilize them against DoS by a
+malicious or buggy domain spamming the ring.
+
+When the boot option is set to `permissive`, this constraint is relaxed and
+wildcard any-sender rings are allowed to be registered.
+
 ### asid (x86)
 > `= <boolean>`
 
diff --git a/xen/common/argo.c b/xen/common/argo.c
index 86195d3..11988e7 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -23,16 +23,41 @@
 #include <xen/event.h>
 #include <xen/domain_page.h>
 #include <xen/guest_access.h>
+#include <xen/lib.h>
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <public/argo.h>
 
+#define MAX_RINGS_PER_DOMAIN            128U
+
+/* All messages on the ring are padded to a multiple of the slot size. */
+#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
+
 DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
 
 /* Xen command line option to enable argo */
 static bool __read_mostly opt_argo_enabled;
 boolean_param("argo", opt_argo_enabled);
 
+/* Xen command line option for conservative or relaxed access control */
+bool __read_mostly opt_argo_mac_enforcing = true;
+
+static int __init parse_opt_argo_mac(const char *s)
+{
+    if ( !strcmp(s, "enforcing") )
+        opt_argo_mac_enforcing = true;
+    else if ( !strcmp(s, "permissive") )
+        opt_argo_mac_enforcing = false;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+custom_param("argo-mac", parse_opt_argo_mac);
+
 typedef struct argo_ring_id
 {
     uint32_t port;
@@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
 #define argo_dprintk(format, ... ) ((void)0)
 #endif
 
+/*
+ * This hash function is used to distribute rings within the per-domain
+ * hash tables (d->argo->ring_hash and d->argo->send_hash). The hash table
+ * will provide a struct if a match is found with an 'argo_ring_id' key:
+ * ie. the key is a (domain id, port, partner domain id) tuple.
+ * Since port number varies the most in expected use, and the Linux driver
+ * allocates at both the high and low ends, incorporate high and low bits to
+ * help with distribution.
+ * Apply array_index_nospec as a defensive measure since this operates
+ * on user-supplied input and the array size that it indexes into is known.
+ */
+static unsigned int
+hash_index(const struct argo_ring_id *id)
+{
+    unsigned int hash;
+
+    hash = (uint16_t)(id->port >> 16);
+    hash ^= (uint16_t)id->port;
+    hash ^= id->domain_id;
+    hash ^= id->partner_id;
+    hash &= (ARGO_HTABLE_SIZE - 1);
+
+    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
+}
+
 static void
 ring_unmap(struct argo_ring_info *ring_info)
 {
@@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
     }
 }
 
+static int
+ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
+{
+    if ( i >= ring_info->nmfns )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
+                ring_info->id.domain_id, ring_info->id.port,
+                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
+        return -ENOMEM;
+    }
+
+    if ( !ring_info->mfns || !ring_info->mfn_mapping )
+    {
+        ASSERT_UNREACHABLE();
+        ring_info->len = 0;
+        return -ENOMEM;
+    }
+
+    if ( !ring_info->mfn_mapping[i] )
+    {
+        /*
+         * TODO:
+         * The first page of the ring contains the ring indices, so both read
+         * and write access to the page is required by the hypervisor, but
+         * read-access is not needed for this mapping for the remainder of the
+         * ring.
+         * Since this mapping will remain resident in Xen's address space for
+         * the lifetime of the ring, and following the principle of least
+         * privilege, it could be preferable to:
+         *  # add a XSM check to determine what policy is wanted here
+         *  # depending on the XSM query, optionally create this mapping as
+         *    _write-only_ on platforms that can support it.
+         *    (eg. Intel EPT/AMD NPT).
+         */
+        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
+
+        if ( !ring_info->mfn_mapping[i] )
+        {
+            gprintk(XENLOG_ERR,
+                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
+                    ring_info->id.domain_id, ring_info->id.port,
+                    ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
+            return -ENOMEM;
+        }
+        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
+                     mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
+    }
+
+    if ( out_ptr )
+        *out_ptr = ring_info->mfn_mapping[i];
+
+    return 0;
+}
+
+static void
+update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
+{
+    void *dst;
+    uint32_t *p;
+
+    ASSERT(ring_info->mfn_mapping[0]);
+
+    ring_info->tx_ptr = tx_ptr;
+
+    dst = ring_info->mfn_mapping[0];
+    p = dst + offsetof(xen_argo_ring_t, tx_ptr);
+
+    write_atomic(p, tx_ptr);
+    smp_wmb();
+}
+
 static void
 wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
 {
@@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
     }
 }
 
+static int
+find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
+{
+    p2m_type_t p2mt;
+    int ret = 0;
+
+#ifdef CONFIG_X86
+    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
+#else
+    *mfn = p2m_lookup(d, gfn, &p2mt);
+#endif
+
+    if ( !mfn_valid(*mfn) )
+        ret = -EINVAL;
+#ifdef CONFIG_X86
+    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
+        ret = -EAGAIN;
+#endif
+    else if ( (p2mt != p2m_ram_rw) ||
+              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
+        ret = -EINVAL;
+
+#ifdef CONFIG_X86
+    put_gfn(d, gfn_x(gfn));
+#endif
+
+    return ret;
+}
+
+static int
+find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
+               uint32_t npage,
+               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
+               uint32_t len)
+{
+    unsigned int i;
+    int ret = 0;
+    mfn_t *mfns;
+    uint8_t **mfn_mapping;
+
+    /*
+     * first bounds check on npage here also serves as an overflow check
+     * before left shifting it
+     */
+    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
+         ((npage << PAGE_SHIFT) < len) )
+        return -EINVAL;
+
+    if ( ring_info->mfns )
+    {
+        /* Ring already existed: drop the previous mapping. */
+        gprintk(XENLOG_INFO,
+         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
+                d->domain_id, ring_info->id.domain_id,
+                ring_info->id.port, ring_info->id.partner_id);
+
+        ring_remove_mfns(d, ring_info);
+        ASSERT(!ring_info->mfns);
+    }
+
+    mfns = xmalloc_array(mfn_t, npage);
+    if ( !mfns )
+        return -ENOMEM;
+
+    for ( i = 0; i < npage; i++ )
+        mfns[i] = INVALID_MFN;
+
+    mfn_mapping = xzalloc_array(uint8_t *, npage);
+    if ( !mfn_mapping )
+    {
+        xfree(mfns);
+        return -ENOMEM;
+    }
+
+    ring_info->npage = npage;
+    ring_info->mfns = mfns;
+    ring_info->mfn_mapping = mfn_mapping;
+
+    ASSERT(ring_info->npage == npage);
+
+    if ( ring_info->nmfns == ring_info->npage )
+        return 0;
+
+    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
+    {
+        xen_argo_page_descr_t pg_descr;
+        gfn_t gfn;
+        mfn_t mfn;
+
+        ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
+                -EFAULT : 0;
+        if ( ret )
+            break;
+
+        /* Implementation currently only supports handling 4K pages */
+        if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
+                XEN_ARGO_PAGE_DESCR_SIZE_4K )
+        {
+            ret = -EINVAL;
+            break;
+        }
+        gfn = _gfn(pg_descr >> PAGE_SHIFT);
+
+        ret = find_ring_mfn(d, gfn, &mfn);
+        if ( ret )
+        {
+            gprintk(XENLOG_ERR,
+               "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
+                    d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
+                    ring_info->id.port, ring_info->id.partner_id,
+                    ring_info, i, ring_info->npage);
+            break;
+        }
+
+        ring_info->mfns[i] = mfn;
+
+        argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
+                     i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
+    }
+
+    ring_info->nmfns = i;
+
+    if ( ret )
+        ring_remove_mfns(d, ring_info);
+    else
+    {
+        ASSERT(ring_info->nmfns == ring_info->npage);
+
+        gprintk(XENLOG_DEBUG,
+        "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
+                d->domain_id, ring_info->id.domain_id,
+                ring_info->id.port, ring_info->id.partner_id, ring_info,
+                ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
+    }
+
+    return ret;
+}
+
+static struct argo_ring_info *
+ring_find_info(const struct domain *d, const struct argo_ring_id *id)
+{
+    unsigned int ring_hash_index;
+    struct hlist_node *node;
+    struct argo_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    ring_hash_index = hash_index(id);
+
+    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
+                 d->argo, ring_hash_index,
+                 d->argo->ring_hash[ring_hash_index].first, id);
+    argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
+                 id->port, id->domain_id, id->partner_id);
+
+    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
+                         node)
+    {
+        struct argo_ring_id *cmpid = &ring_info->id;
+
+        if ( cmpid->port == id->port &&
+             cmpid->domain_id == id->domain_id &&
+             cmpid->partner_id == id->partner_id )
+        {
+            argo_dprintk("ring_info=%p\n", ring_info);
+            return ring_info;
+        }
+    }
+    argo_dprintk("no ring_info found\n");
+
+    return NULL;
+}
+
+static long
+register_ring(struct domain *currd,
+              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
+              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
+              uint32_t npage, bool fail_exist)
+{
+    xen_argo_register_ring_t reg;
+    struct argo_ring_id ring_id;
+    void *map_ringp;
+    xen_argo_ring_t *ringp;
+    struct argo_ring_info *ring_info;
+    struct argo_send_info *send_info = NULL;
+    struct domain *dst_d = NULL;
+    int ret = 0;
+    uint32_t private_tx_ptr;
+
+    if ( copy_from_guest(&reg, reg_hnd, 1) )
+    {
+        ret = -EFAULT;
+        goto out;
+    }
+
+    /*
+     * A ring must be large enough to transmit messages, so requires space for:
+     * * 1 message header, plus
+     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
+     *   for the message payload to be written into, plus
+     * * 1 more slot, so that the ring cannot be filled to capacity with a
+     *   single message -- see the logic in ringbuf_insert -- allowing for this
+     *   ensures that there can be space remaining when a message is present.
+     * The above determines the minimum acceptable ring size.
+     */
+    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
+                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
+         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
+         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
+         (reg.pad != 0) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ring_id.partner_id = reg.partner_id;
+    ring_id.port = reg.port;
+    ring_id.domain_id = currd->domain_id;
+
+    read_lock(&argo_lock);
+
+    if ( !currd->argo )
+    {
+        ret = -ENODEV;
+        goto out_unlock;
+    }
+
+    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
+    {
+        if ( opt_argo_mac_enforcing )
+        {
+            ret = -EPERM;
+            goto out_unlock;
+        }
+    }
+    else
+    {
+        dst_d = get_domain_by_id(reg.partner_id);
+        if ( !dst_d )
+        {
+            argo_dprintk("!dst_d, ESRCH\n");
+            ret = -ESRCH;
+            goto out_unlock;
+        }
+
+        if ( !dst_d->argo )
+        {
+            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
+            ret = -ECONNREFUSED;
+            put_domain(dst_d);
+            goto out_unlock;
+        }
+
+        send_info = xzalloc(struct argo_send_info);
+        if ( !send_info )
+        {
+            ret = -ENOMEM;
+            put_domain(dst_d);
+            goto out_unlock;
+        }
+        send_info->id = ring_id;
+    }
+
+    write_lock(&currd->argo->lock);
+
+    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
+    {
+        ret = -ENOSPC;
+        goto out_unlock2;
+    }
+
+    ring_info = ring_find_info(currd, &ring_id);
+    if ( !ring_info )
+    {
+        ring_info = xzalloc(struct argo_ring_info);
+        if ( !ring_info )
+        {
+            ret = -ENOMEM;
+            goto out_unlock2;
+        }
+
+        spin_lock_init(&ring_info->lock);
+
+        ring_info->id = ring_id;
+        INIT_HLIST_HEAD(&ring_info->pending);
+
+        hlist_add_head(&ring_info->node,
+                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
+
+        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.port,
+                ring_id.partner_id);
+    }
+    else
+    {
+        if ( ring_info->len )
+        {
+            /*
+             * If the caller specified that the ring must not already exist,
+             * fail this attempt since a completed ring already exists.
+             */
+            if ( fail_exist )
+            {
+                argo_dprintk("disallowed reregistration of existing ring\n");
+                ret = -EEXIST;
+                goto out_unlock2;
+            }
+
+            if ( ring_info->len != reg.len )
+            {
+                /*
+                 * Change of ring size could result in entries on the pending
+                 * notifications list that will never trigger.
+                 * Simple blunt solution: disallow ring resize for now.
+                 * TODO: investigate enabling ring resize.
+                 */
+                gprintk(XENLOG_ERR,
+                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
+                        currd->domain_id, ring_id.domain_id, ring_id.port,
+                        ring_id.partner_id);
+                /*
+                 * Could return EINVAL here, but if the ring didn't already
+                 * exist then the arguments would have been valid, so: EEXIST.
+                 */
+                ret = -EEXIST;
+                goto out_unlock2;
+            }
+
+            gprintk(XENLOG_DEBUG,
+                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
+                    currd->domain_id, ring_id.domain_id, ring_id.port,
+                    ring_id.partner_id);
+        }
+    }
+
+    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.port,
+                ring_id.partner_id);
+
+        ring_remove_info(currd, ring_info);
+        goto out_unlock2;
+    }
+
+    /*
+     * The first page of the memory supplied for the ring has the xen_argo_ring
+     * structure at its head, which is where the ring indexes reside.
+     */
+    ret = ring_map_page(ring_info, 0, &map_ringp);
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
+                currd->domain_id, ring_id.domain_id, ring_id.port,
+                ring_id.partner_id);
+
+        ring_remove_info(currd, ring_info);
+        goto out_unlock2;
+    }
+    ringp = map_ringp;
+
+    private_tx_ptr = read_atomic(&ringp->tx_ptr);
+
+    if ( (private_tx_ptr >= reg.len) ||
+         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
+    {
+        /*
+         * Since the ring is a mess, attempt to flush the contents of it
+         * here by setting the tx_ptr to the next aligned message slot past
+         * the latest rx_ptr we have observed. Handle ring wrap correctly.
+         */
+        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
+
+        if ( private_tx_ptr >= reg.len )
+            private_tx_ptr = 0;
+
+        update_tx_ptr(ring_info, private_tx_ptr);
+    }
+
+    ring_info->tx_ptr = private_tx_ptr;
+    ring_info->len = reg.len;
+    currd->argo->ring_count++;
+
+    if ( send_info )
+    {
+        spin_lock(&dst_d->argo->send_lock);
+
+        hlist_add_head(&send_info->node,
+                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
+
+        spin_unlock(&dst_d->argo->send_lock);
+    }
+
+ out_unlock2:
+    if ( !ret && send_info )
+        xfree(send_info);
+
+    if ( dst_d )
+        put_domain(dst_d);
+
+    write_unlock(&currd->argo->lock);
+
+ out_unlock:
+    read_unlock(&argo_lock);
+
+ out:
+    return ret;
+}
+
 long
 do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
            XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
@@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
 
     switch (cmd)
     {
+    case XEN_ARGO_OP_register_ring:
+    {
+        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
+            guest_handle_cast(arg1, xen_argo_register_ring_t);
+        XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
+            guest_handle_cast(arg2, xen_argo_page_descr_t);
+        /* arg3 is npage */
+        /* arg4 is flags */
+        bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
+
+        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+        /*
+         * Check access to the whole array here so we can use the faster __copy
+         * operations to read each element later.
+         */
+        if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
+            break;
+        /* arg4: reserve currently-undefined bits, require zero.  */
+        if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 8997a1c..70e9a78 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
 
+#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
+
 /* Offset the given guest handle into the array it refers to. */
 #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
 #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..8dde5d5 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -41,6 +41,8 @@
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
 
+#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
+
 /* Offset the given guest handle into the array it refers to. */
 #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
 #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 4818684..8947230 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -31,6 +31,26 @@
 
 #include "xen.h"
 
+#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
+
+/*
+ * The maximum size of an Argo ring is defined to be: 16MB
+ *  -- which is 0x1000000 bytes.
+ * A byte index into the ring is at most 24 bits.
+ */
+#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
+
+/*
+ * Page descriptor: encoding both page address and size in a 64-bit value.
+ * Intended to allow ABI to support use of different granularity pages.
+ * example of how to populate:
+ * xen_argo_page_descr_t pg_desc =
+ *      (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
+ */
+typedef uint64_t xen_argo_page_descr_t;
+#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
+#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0
+
 typedef struct xen_argo_addr
 {
     uint32_t port;
@@ -56,4 +76,56 @@ typedef struct xen_argo_ring
 #endif
 } xen_argo_ring_t;
 
+typedef struct xen_argo_register_ring
+{
+    uint32_t port;
+    domid_t partner_id;
+    uint16_t pad;
+    uint32_t len;
+} xen_argo_register_ring_t;
+
+/* Messages on the ring are padded to a multiple of this size. */
+#define XEN_ARGO_MSG_SLOT_SIZE 0x10
+
+struct xen_argo_ring_message_header
+{
+    uint32_t len;
+    xen_argo_addr_t source;
+    uint32_t message_type;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    uint8_t data[];
+#elif defined(__GNUC__)
+    uint8_t data[0];
+#endif
+};
+
+/*
+ * Hypercall operations
+ */
+
+/*
+ * XEN_ARGO_OP_register_ring
+ *
+ * Register a ring using the indicated memory.
+ * Also used to reregister an existing ring (eg. after resume from hibernate).
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
+ * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
+ * arg3: unsigned long npages
+ * arg4: unsigned long flags
+ */
+#define XEN_ARGO_OP_register_ring     1
+
+/* Register op flags */
+/*
+ * Fail exist:
+ * If set, reject attempts to (re)register an existing established ring.
+ * If clear, reregistration occurs if the ring exists, with the new ring
+ * taking the place of the old, preserving tx_ptr if it remains valid.
+ */
+#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
+
+/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
+#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
+
 #endif
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9f616e4..9c9d33f 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -150,3 +150,4 @@
 ?	flask_transition		xsm/flask_op.h
 ?	argo_addr			argo.h
 ?	argo_ring			argo.h
+?	argo_register_ring		argo.h
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 08/15] argo: implement the unregister op
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (6 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-10 11:40   ` Roger Pau Monné
  2019-01-14 15:06   ` Jan Beulich
  2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Takes a single argument: a handle to the ring unregistration struct,
which specifies the port and partner domain id or wildcard.

The ring's entry is removed from the hashtable of registered rings;
any entries for pending notifications are removed; and the ring is
unmapped from Xen's address space.

If the ring had been registered to communicate with a single specified
domain (ie. a non-wildcard ring), then the corresponding entry is also
removed from the partner domain's argo send_info hash table.
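
For illustration, a minimal guest-side sketch of issuing this op, assuming a
hypothetical hypercall_argo_op() wrapper that marshals the five hypercall
arguments (the real entry point is architecture- and OS-specific); the struct
layout mirrors xen_argo_unregister_ring_t in the public header:

    /* Hypothetical guest-side sketch; not part of this patch. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint16_t domid_t;

    typedef struct xen_argo_unregister_ring
    {
        uint32_t port;
        domid_t partner_id;
        uint16_t pad;
    } xen_argo_unregister_ring_t;

    #define XEN_ARGO_OP_unregister_ring 2

    /* Assumed wrapper around the guest's argo_op hypercall mechanism. */
    extern long hypercall_argo_op(unsigned int cmd, void *arg1, void *arg2,
                                  unsigned long arg3, unsigned long arg4);

    static long argo_unregister(uint32_t port, domid_t partner_id)
    {
        xen_argo_unregister_ring_t unreg = {
            .port = port,
            .partner_id = partner_id,
            .pad = 0,    /* must be zero: a non-zero pad returns -EINVAL */
        };

        /* arg2 must be NULL and arg3/arg4 zero, per the interface comment. */
        return hypercall_argo_op(XEN_ARGO_OP_unregister_ring, &unreg, NULL,
                                 0, 0);
    }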

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 feedback Jan: drop cookie, implement teardown
v2 feedback Jan: drop message from argo_message_op
v2 self: OVERHAUL
v2 self: reorder logic to shorten critical section
v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops
v1 feedback Roger, Jan: drop argo prefix on static functions
v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
v1 #5 (#14) feedback Paul: use currd in do_argo_message_op
v1 #5 (#14) feedback Paul: full use currd in argo_unregister_ring
v1 #13 (#14) feedback Paul: replace do/while with goto; reindent
v1 self: add blank lines in unregister case in do_argo_message_op
v1: #13 feedback Jan: public namespace: prefix with xen
v1: #13 feedback Jan: blank line after op case in do_argo_message_op
v1: #14 feedback Jan: replace domain id override with validation
v1: #18 feedback Jan: meld the ring count limit into the series
v1: feedback #15 Jan: verify zero in unused hypercall args

 xen/common/argo.c         | 115 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  19 ++++++++
 xen/include/xlat.lst      |   1 +
 3 files changed, 135 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 11988e7..59ce8c4 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -37,6 +37,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
 
 /* Xen command line option to enable argo */
 static bool __read_mostly opt_argo_enabled;
@@ -666,6 +667,105 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
     return NULL;
 }
 
+static struct argo_send_info *
+send_find_info(const struct domain *d, const struct argo_ring_id *id)
+{
+    struct hlist_node *node;
+    struct argo_send_info *send_info;
+
+    hlist_for_each_entry(send_info, node, &d->argo->send_hash[hash_index(id)],
+                         node)
+    {
+        struct argo_ring_id *cmpid = &send_info->id;
+
+        if ( cmpid->port == id->port &&
+             cmpid->domain_id == id->domain_id &&
+             cmpid->partner_id == id->partner_id )
+        {
+            argo_dprintk("send_info=%p\n", send_info);
+            return send_info;
+        }
+    }
+    argo_dprintk("no send_info found\n");
+
+    return NULL;
+}
+
+static long
+unregister_ring(struct domain *currd,
+                XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd)
+{
+    xen_argo_unregister_ring_t unreg;
+    struct argo_ring_id ring_id;
+    struct argo_ring_info *ring_info;
+    struct argo_send_info *send_info;
+    struct domain *dst_d = NULL;
+    int ret;
+
+    ret = copy_from_guest(&unreg, unreg_hnd, 1) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    ret = unreg.pad ? -EINVAL : 0;
+    if ( ret )
+        goto out;
+
+    ring_id.partner_id = unreg.partner_id;
+    ring_id.port = unreg.port;
+    ring_id.domain_id = currd->domain_id;
+
+    read_lock(&argo_lock);
+
+    if ( !currd->argo )
+    {
+        ret = -ENODEV;
+        goto out_unlock;
+    }
+
+    write_lock(&currd->argo->lock);
+
+    ring_info = ring_find_info(currd, &ring_id);
+    if ( ring_info )
+    {
+        ring_remove_info(currd, ring_info);
+        currd->argo->ring_count--;
+    }
+
+    dst_d = get_domain_by_id(ring_id.partner_id);
+    if ( dst_d )
+    {
+        if ( dst_d->argo )
+        {
+            spin_lock(&dst_d->argo->send_lock);
+
+            send_info = send_find_info(dst_d, &ring_id);
+            if ( send_info )
+            {
+                hlist_del(&send_info->node);
+                xfree(send_info);
+            }
+
+            spin_unlock(&dst_d->argo->send_lock);
+        }
+        put_domain(dst_d);
+    }
+
+    write_unlock(&currd->argo->lock);
+
+    if ( !ring_info )
+    {
+        argo_dprintk("ENOENT\n");
+        ret = -ENOENT;
+        goto out_unlock;
+    }
+
+ out_unlock:
+    read_unlock(&argo_lock);
+
+ out:
+    return ret;
+}
+
 static long
 register_ring(struct domain *currd,
               XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
@@ -958,6 +1058,21 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         break;
     }
 
+    case XEN_ARGO_OP_unregister_ring:
+    {
+        XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd =
+            guest_handle_cast(arg1, xen_argo_unregister_ring_t);
+
+        if ( unlikely((!guest_handle_is_null(arg2)) || arg3 || arg4) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = unregister_ring(currd, unreg_hnd);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 8947230..6117bf2 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -84,6 +84,13 @@ typedef struct xen_argo_register_ring
     uint32_t len;
 } xen_argo_register_ring_t;
 
+typedef struct xen_argo_unregister_ring
+{
+    uint32_t port;
+    domid_t partner_id;
+    uint16_t pad;
+} xen_argo_unregister_ring_t;
+
 /* Messages on the ring are padded to a multiple of this size. */
 #define XEN_ARGO_MSG_SLOT_SIZE 0x10
 
@@ -128,4 +135,16 @@ struct xen_argo_ring_message_header
 /* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
 #define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
 
+/*
+ * XEN_ARGO_OP_unregister_ring
+ *
+ * Unregister a previously-registered ring, ending communication.
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_unregister_ring_t)
+ * arg2: NULL
+ * arg3: 0 (ZERO)
+ * arg4: 0 (ZERO)
+ */
+#define XEN_ARGO_OP_unregister_ring     2
+
 #endif
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9d33f..411c661 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -151,3 +151,4 @@
 ?	argo_addr			argo.h
 ?	argo_ring			argo.h
 ?	argo_register_ring		argo.h
+?	argo_unregister_ring		argo.h
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (7 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-09 18:05   ` Jason Andryuk
                     ` (2 more replies)
  2019-01-07  7:42 ` [PATCH v3 10/15] argo: implement the notify op Christopher Clark
                   ` (5 subsequent siblings)
  14 siblings, 3 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

The sendv operation is invoked to perform a synchronous send of the buffers
described by an array of iovs to a remote domain's registered ring.

It takes:
 * A destination address (domid, port) for the ring to send to.
   It performs a most-specific match lookup, to allow for wildcard.
 * A source address, used to inform the destination of where to reply.
 * The address of an array of iovs containing the data to send.
 * The length of that array of iovs.
 * A 32-bit message type, available to communicate message context
   data (eg. kernel-to-kernel, separate from the application data).
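
For illustration, a minimal guest-side sketch of populating these arguments,
assuming a hypothetical hypercall_argo_op() wrapper; struct layouts are
abbreviated from the public header (the real iov_hnd field is a
XEN_GUEST_HANDLE_64(uint8_t), shown here as a plain 64-bit value), and the
XEN_ARGO_OP_sendv numeric value is assumed:

    /* Hypothetical guest-side sketch; not part of this patch. */
    #include <stdint.h>

    typedef uint16_t domid_t;

    #define XEN_ARGO_DOMID_ANY  0x7FF4U  /* DOMID_INVALID */
    #define XEN_ARGO_OP_sendv   5        /* assumed; see the public header */

    typedef struct xen_argo_addr
    {
        uint32_t port;
        domid_t domain_id;
        uint16_t pad;
    } xen_argo_addr_t;

    typedef struct xen_argo_send_addr
    {
        xen_argo_addr_t src;
        xen_argo_addr_t dst;
    } xen_argo_send_addr_t;

    typedef struct xen_argo_iov
    {
        uint64_t iov_hnd;   /* guest handle (pointer) to the data buffer */
        uint32_t iov_len;
        uint32_t pad;       /* must be zero */
    } xen_argo_iov_t;

    extern long hypercall_argo_op(unsigned int cmd, void *arg1, void *arg2,
                                  unsigned long arg3, unsigned long arg4);

    static long argo_send_one(domid_t dst_dom, uint32_t dst_port,
                              uint32_t src_port, void *buf, uint32_t len,
                              uint32_t message_type)
    {
        xen_argo_send_addr_t send_addr = {
            /* DOMID_ANY as the src domain is replaced with the caller's id */
            .src = { .port = src_port, .domain_id = XEN_ARGO_DOMID_ANY },
            .dst = { .port = dst_port, .domain_id = dst_dom },
        };
        xen_argo_iov_t iov = {
            .iov_hnd = (uintptr_t)buf,
            .iov_len = len,
        };

        /* arg3 is niov (1 here), arg4 is the 32-bit message type. */
        return hypercall_argo_op(XEN_ARGO_OP_sendv, &send_addr, &iov, 1,
                                 message_type);
    }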

If insufficient space exists in the destination ring, it will return
-EAGAIN and Xen will notify the caller when sufficient space becomes
available.

Accesses to the ring indices are appropriately atomic. The rings are
mapped into Xen's private address space to write as needed and the
mappings are retained for later use.

Fixed-size types are used in some areas within this code where caution
around avoiding integer overflow is important.

Notifications are sent to guests via VIRQ and send_guest_global_virq is
exposed in the change to enable argo to call it. VIRQ_ARGO_MESSAGE is
claimed from the VIRQ previously reserved for this purpose (#11).

The VIRQ notification method is used rather than sending events using
evtchn functions directly because:

* no current event channel type is an exact fit for the intended
  behaviour. ECS_IPI is closest, but it disallows migration to
  other VCPUs which is not necessarily a requirement for Argo.

* at the point of argo_init, allocation of an event channel is
  complicated by none of the guest VCPUs being initialized yet
  and the event channel logic expects that a valid event channel
  has a present VCPU.

* at the point of signalling a notification, the VIRQ logic is already
  defensive: if d->vcpu[0] is NULL, the notification is just silently
  dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
  must not be NULL, otherwise a null pointer dereference occurs.

Using a VIRQ removes the need for the guest to query which event channel
its notifications will be delivered on. This is also likely to simplify
establishing future L0/L1 nested hypervisor argo communication.
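
As a hedged sketch (not part of this patch) of the receive side: a guest can
bind this VIRQ through the existing event channel interface to learn which
event channel port its Argo notifications will arrive on. The struct and op
values below follow the public event_channel.h interface; the hypercall
wrapper name varies by guest OS:

    /* Hypothetical guest-side sketch: bind VIRQ_ARGO_MESSAGE on vcpu 0. */
    #include <stdint.h>

    #define EVTCHNOP_bind_virq   1
    #define VIRQ_ARGO_MESSAGE   11   /* the previously reserved VIRQ #11 */

    struct evtchn_bind_virq
    {
        uint32_t virq;   /* IN */
        uint32_t vcpu;   /* IN */
        uint32_t port;   /* OUT: allocated event channel port */
    };

    extern long HYPERVISOR_event_channel_op(int cmd, void *arg);

    static long argo_bind_notify_virq(uint32_t *port_out)
    {
        struct evtchn_bind_virq bind = {
            .virq = VIRQ_ARGO_MESSAGE,
            .vcpu = 0,
        };
        long rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &bind);

        if ( !rc )
            *port_out = bind.port;

        return rc;
    }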

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
The previous double-read of iovs from guest memory has been removed.

v2 self: use ring_info backpointer in pending_ent to maintain npending
v2 feedback Jan: drop cookie, implement teardown
v2 self: pending_queue: reap stale ents when in need of space
v2 self: pending_requeue: reclaim ents for stale domains
v2.feedback Jan: only override sender domid if DOMID_ANY
v2 feedback Jan: drop message from argo_message_op
v2 self: check npending vs maximum limit
v2 self: get_sanitized_ring instead of get_rx_ptr
v2 feedback v1#13 Jan: remove double read from ringbuf insert, lower MAX_IOV
v2 self: make iov_count const
v2 self: iov_count : return EMSGSIZE for message too big
v2 self: OVERHAUL
v2 self: s/argo_pending_ent/pending_ent/g
v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
v1 feedback Roger, Jan: drop argo prefix on static functions
v1 feedback #13 Jan: drop guest_handle_okay when using copy_from_guest
    - reorder do_argo_op logic
v2 self: add _hnd suffix to iovs variable name to indicate guest handle type
v2 self: replace use of XEN_GUEST_HANDLE_NULL with two existing macros

v1 #15 feedback, Jan: sendv op : s/ECONNREFUSED/ESRCH/
v1 #5 (#15) feedback Paul: sendv: use currd in do_argo_message_op
v1 #13 (#15) feedback Paul: sendv op: do/while reindent only
v1 #13 (#15) feedback Paul: sendv op: do/while: argo_ringbuf_insert to goto style
v1 #13 (#15) feedback Paul: sendv op: do/while: reindent only again
v1 #13 (#15) feedback Paul: sendv op: do/while : goto
v1 #15 feedback Paul: sendv op: make page var: unsigned
v1 #15 feedback Paul: sendv op: new local var for PAGE_SIZE - offset
v1 #8 feedback Jan: XEN_GUEST_HANDLE : C89 compliance
v1 rebase after switching register op from pfns to page descriptors
v1 self: move iov DEFINE_XEN_GUEST_HANDLE out of public header into argo.c
v1 #13 (#15) feedback Paul: fix loglevel for guest-triggered messages
v1 : add compat xlat.lst entries
v1 self: switched notification to send_guest_global_virq instead of event
v1: fix gprintk use for ARM as its defn dislikes split format strings
v1: init len variable to satisfy ARM compiler initialized checking
v1 #13 feedback Jan: rename page var
v1:#14 feedback Jan: uint8_t* -> void*
v1: #13 feedback Jan: public namespace: prefix with xen
v1: #13 feedback Jan: blank line after case op in do_argo_message_op
v1: #15 feedback Jan: add comments explaining why the writes don't overrun
v1: self: add ASSERT to support comment that overrun cannot happen
v1: self: fail on short writes where guest manipulated the iov_lens
v1: self: rename ent id to domain_id
v1: self: add moan for iov rewrite
v1. feedback #15 Jan: require the pad bits are zero
v1. feedback #15 Jan: drop NULL check in argo_signal_domain as now using VIRQ
v1. self: store domain_cookie in pending ent
v1. feedback #15 Jan: use unsigned where possible
v1. feedback Jan: use handle type for iov_base in public iov interface
v1. self: log whenever visible error occurs
v1 feedback #15, Jan: drop unnecessary mb
v1 self: only update internal tx_ptr if able to return success
         and update the visible tx_ptr
v1 self: log on failure to map ring to update visible tx_ptr
v1 feedback #15 Jan: add comment re: notification size policy
v1 self/Roger? remove errant space after sizeof
v1. feedback #15 Jan: require iov pad be zero
v1. self: rename iov_base to iov_hnd for handle in public iov interface
v1: feedback #15 Jan: handle upper-halves of hypercall args; changes some
    types in function signatures to match.
v1: self: add dprintk to sendv
v1: self: add debug output to argo_iov_count
v1. feedback #14 Jan: blank line before return in argo_iov_count
v1 feedback #15 Jan: verify src id, not override

 xen/common/argo.c          | 653 +++++++++++++++++++++++++++++++++++++++++++++
 xen/common/event_channel.c |   2 +-
 xen/include/public/argo.h  |  60 +++++
 xen/include/public/xen.h   |   2 +-
 xen/include/xen/event.h    |   7 +
 xen/include/xlat.lst       |   2 +
 6 files changed, 724 insertions(+), 2 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 59ce8c4..4548435 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -29,14 +29,21 @@
 #include <public/argo.h>
 
 #define MAX_RINGS_PER_DOMAIN            128U
+#define MAX_PENDING_PER_RING             32U
 
 /* All messages on the ring are padded to a multiple of the slot size. */
 #define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
 
+/* The maximum size of a message that may be sent on the largest Argo ring. */
+#define MAX_ARGO_MESSAGE_SIZE ((XEN_ARGO_MAX_RING_SIZE) - \
+        (sizeof(struct xen_argo_ring_message_header)) - ROUNDUP_MESSAGE(1))
+
 DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_iov_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_send_addr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
 
 /* Xen command line option to enable argo */
@@ -250,6 +257,14 @@ hash_index(const struct argo_ring_id *id)
 }
 
 static void
+signal_domain(struct domain *d)
+{
+    argo_dprintk("signalling domid:%d\n", d->domain_id);
+
+    send_guest_global_virq(d, VIRQ_ARGO_MESSAGE);
+}
+
+static void
 ring_unmap(struct argo_ring_info *ring_info)
 {
     unsigned int i;
@@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
     smp_wmb();
 }
 
+static int
+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
+                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
+                     uint32_t len)
+{
+    unsigned int mfns_index = offset >> PAGE_SHIFT;
+    void *dst;
+    int ret;
+    unsigned int src_offset = 0;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    offset &= ~PAGE_MASK;
+
+    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
+        return -EFAULT;
+
+    while ( (offset + len) > PAGE_SIZE )
+    {
+        unsigned int head_len = PAGE_SIZE - offset;
+
+        ret = ring_map_page(ring_info, mfns_index, &dst);
+        if ( ret )
+            return ret;
+
+        if ( src )
+        {
+            memcpy(dst + offset, src + src_offset, head_len);
+            src_offset += head_len;
+        }
+        else
+        {
+            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
+                    -EFAULT : 0;
+            if ( ret )
+                return ret;
+
+            guest_handle_add_offset(src_hnd, head_len);
+        }
+
+        mfns_index++;
+        len -= head_len;
+        offset = 0;
+    }
+
+    ret = ring_map_page(ring_info, mfns_index, &dst);
+    if ( ret )
+    {
+        argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
+                     " %d of %d\n", ring_info->id.domain_id, ring_info->id.port,
+                     ring_info->id.partner_id, ring_info, mfns_index,
+                     ring_info->nmfns);
+        return ret;
+    }
+
+    if ( src )
+        memcpy(dst + offset, src + src_offset, len);
+    else
+        ret = copy_from_guest(dst + offset, src_hnd, len) ? -EFAULT : 0;
+
+    return ret;
+}
+
+/*
+ * Use this with caution: rx_ptr is under guest control and may be bogus.
+ * See get_sanitized_ring for a safer alternative.
+ */
+static int
+get_rx_ptr(struct argo_ring_info *ring_info, uint32_t *rx_ptr)
+{
+    void *src;
+    xen_argo_ring_t *ringp;
+    int ret;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( !ring_info->nmfns || ring_info->nmfns < ring_info->npage )
+        return -EINVAL;
+
+    ret = ring_map_page(ring_info, 0, &src);
+    if ( ret )
+        return ret;
+
+    ringp = (xen_argo_ring_t *)src;
+
+    *rx_ptr = read_atomic(&ringp->rx_ptr);
+
+    return 0;
+}
+
+/*
+ * get_sanitized_ring creates a modified copy of the ring pointers where
+ * the rx_ptr is rounded up to ensure it is aligned, and then ring
+ * wrap is handled. Simplifies safe use of the rx_ptr for available
+ * space calculation.
+ */
+static int
+get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
+{
+    uint32_t rx_ptr;
+    int ret;
+
+    ret = get_rx_ptr(ring_info, &rx_ptr);
+    if ( ret )
+        return ret;
+
+    ring->tx_ptr = ring_info->tx_ptr;
+
+    rx_ptr = ROUNDUP_MESSAGE(rx_ptr);
+    if ( rx_ptr >= ring_info->len )
+        rx_ptr = 0;
+
+    ring->rx_ptr = rx_ptr;
+    return 0;
+}
+
+/*
+ * iov_count returns its count on success via an out variable to avoid
+ * potential for a negative return value to be used incorrectly
+ * (eg. coerced into an unsigned variable resulting in a large incorrect value)
+ */
+static int
+iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
+{
+    uint32_t sum_iov_lens = 0;
+
+    if ( niov > XEN_ARGO_MAXIOV )
+        return -EINVAL;
+
+    while ( niov-- )
+    {
+        /* valid iovs must have the padding field set to zero */
+        if ( piov->pad )
+        {
+            argo_dprintk("invalid iov: padding is not zero\n");
+            return -EINVAL;
+        }
+
+        /* check each to protect sum against integer overflow */
+        if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )
+        {
+            argo_dprintk("invalid iov_len: too big (%u)>%llu\n",
+                         piov->iov_len, XEN_ARGO_MAX_RING_SIZE);
+            return -EINVAL;
+        }
+
+        sum_iov_lens += piov->iov_len;
+
+        /*
+         * Again protect sum from integer overflow
+         * and ensure total msg size will be within bounds.
+         */
+        if ( sum_iov_lens > MAX_ARGO_MESSAGE_SIZE )
+        {
+            argo_dprintk("invalid iov series: total message too big\n");
+            return -EMSGSIZE;
+        }
+
+        piov++;
+    }
+
+    *count = sum_iov_lens;
+
+    return 0;
+}
+
+static int
+ringbuf_insert(struct domain *d, struct argo_ring_info *ring_info,
+               const struct argo_ring_id *src_id,
+               XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd,
+               unsigned long niov, uint32_t message_type,
+               unsigned long *out_len)
+{
+    xen_argo_ring_t ring;
+    struct xen_argo_ring_message_header mh = { 0 };
+    int32_t sp;
+    int32_t ret;
+    uint32_t len = 0;
+    xen_argo_iov_t iovs[XEN_ARGO_MAXIOV];
+    xen_argo_iov_t *piov;
+    XEN_GUEST_HANDLE(uint8_t) NULL_hnd =
+       guest_handle_from_param(guest_handle_from_ptr(NULL, uint8_t), uint8_t);
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    ret = __copy_from_guest(iovs, iovs_hnd, niov) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    /*
+     * Obtain the total size of data to transmit -- sets the 'len' variable
+     * -- and sanity check that the iovs conform to size and number limits.
+     * Enforced below: no more than 'len' bytes of guest data
+     * (plus the message header) will be sent in this operation.
+     */
+    ret = iov_count(iovs, niov, &len);
+    if ( ret )
+        goto out;
+
+    /*
+     * Size bounds check against ring size and static maximum message limit.
+     * The message must not fill the ring; there must be at least one slot
+     * remaining so we can distinguish a full ring from an empty one.
+     */
+    if ( ((ROUNDUP_MESSAGE(len) +
+            sizeof(struct xen_argo_ring_message_header)) >= ring_info->len) ||
+         (len > MAX_ARGO_MESSAGE_SIZE) )
+    {
+        ret = -EMSGSIZE;
+        goto out;
+    }
+
+    ret = get_sanitized_ring(&ring, ring_info);
+    if ( ret )
+        goto out;
+
+    argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring len=%d"
+                 " ring_info->tx_ptr=%d\n",
+                 ring.tx_ptr, ring.rx_ptr, ring_info->len, ring_info->tx_ptr);
+
+    if ( ring.rx_ptr == ring.tx_ptr )
+        sp = ring_info->len;
+    else
+    {
+        sp = ring.rx_ptr - ring.tx_ptr;
+        if ( sp < 0 )
+            sp += ring_info->len;
+    }
+
+    /*
+     * Size bounds check against currently available space in the ring.
+     * Again: the message must not fill the ring leaving no space remaining.
+     */
+    if ( (ROUNDUP_MESSAGE(len) +
+            sizeof(struct xen_argo_ring_message_header)) >= sp )
+    {
+        argo_dprintk("EAGAIN\n");
+        ret = -EAGAIN;
+        goto out;
+    }
+
+    mh.len = len + sizeof(struct xen_argo_ring_message_header);
+    mh.source.port = src_id->port;
+    mh.source.domain_id = src_id->domain_id;
+    mh.message_type = message_type;
+
+    /*
+     * For this copy to the guest ring, tx_ptr is always 16-byte aligned
+     * and the message header is 16 bytes long.
+     */
+    BUILD_BUG_ON(
+        sizeof(struct xen_argo_ring_message_header) != ROUNDUP_MESSAGE(1));
+
+    /*
+     * First data write into the destination ring: fixed size, message header.
+     * This cannot overrun because the available free space (value in 'sp')
+     * is checked above and must be at least this size.
+     */
+    ret = memcpy_to_guest_ring(ring_info, ring.tx_ptr + sizeof(xen_argo_ring_t),
+                               &mh, NULL_hnd, sizeof(mh));
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: failed to write message header to ring (vm%u:%x vm%d)\n",
+                ring_info->id.domain_id, ring_info->id.port,
+                ring_info->id.partner_id);
+
+        goto out;
+    }
+
+    ring.tx_ptr += sizeof(mh);
+    if ( ring.tx_ptr == ring_info->len )
+        ring.tx_ptr = 0;
+
+    piov = iovs;
+
+    while ( niov-- )
+    {
+        XEN_GUEST_HANDLE_64(uint8_t) buf_hnd = piov->iov_hnd;
+        uint32_t iov_len = piov->iov_len;
+
+        /* If no data is provided in this iov, moan and skip on to the next */
+        if ( !iov_len )
+        {
+            gprintk(XENLOG_ERR,
+                    "argo: no data iov_len=0 iov_hnd=%p ring (vm%u:%x vm%d)\n",
+                    buf_hnd.p, ring_info->id.domain_id, ring_info->id.port,
+                    ring_info->id.partner_id);
+
+            piov++;
+            continue;
+        }
+
+        if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
+        {
+            gprintk(XENLOG_ERR,
+                    "argo: bad iov handle [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
+                    buf_hnd.p, iov_len,
+                    ring_info->id.domain_id, ring_info->id.port,
+                    ring_info->id.partner_id);
+
+            ret = -EFAULT;
+            goto out;
+        }
+
+        sp = ring_info->len - ring.tx_ptr;
+
+        /* Check: iov data size versus free space at the tail of the ring */
+        if ( iov_len > sp )
+        {
+            /*
+             * Second possible data write: ring-tail-wrap-write.
+             * Populate the ring tail and update the internal tx_ptr to handle
+             * wrapping at the end of ring.
+             * Size of data written here: sp
+             * which is the exact full amount of free space available at the
+             * tail of the ring, so this cannot overrun.
+             */
+            ret = memcpy_to_guest_ring(ring_info,
+                                       ring.tx_ptr + sizeof(xen_argo_ring_t),
+                                       NULL, buf_hnd, sp);
+            if ( ret )
+            {
+                gprintk(XENLOG_ERR,
+                        "argo: failed to copy {%p, %"PRIx32"} (vm%u:%x vm%d)\n",
+                        buf_hnd.p, sp,
+                        ring_info->id.domain_id, ring_info->id.port,
+                        ring_info->id.partner_id);
+
+                goto out;
+            }
+
+            ring.tx_ptr = 0;
+            iov_len -= sp;
+            guest_handle_add_offset(buf_hnd, sp);
+
+            ASSERT(iov_len <= ring_info->len);
+        }
+
+        /*
+         * Third possible data write: all data remaining for this iov.
+         * Size of data written here: iov_len
+         *
+         * Case 1: if the ring-tail-wrap-write above was performed, then
+         *         iov_len has been decreased by 'sp' and ring.tx_ptr is zero.
+         *
+         *    We know from checking the result of iov_count:
+         *      len + sizeof(message_header) <= ring_info->len
+         *    We also know that len is the total of summing all iov_lens, so:
+         *       iov_len <= len
+         *    so by transitivity:
+         *       iov_len <= len <= (ring_info->len - sizeof(msgheader))
+         *    and therefore:
+         *       (iov_len + sizeof(msgheader) <= ring_info->len) &&
+         *       (ring.tx_ptr == 0)
+         *    so this write cannot overrun here.
+         *
+         * Case 2: ring-tail-wrap-write above was not performed
+         *    -> so iov_len is the guest-supplied value and: (iov_len <= sp)
+         *    ie. less than available space at the tail of the ring:
+         *        so this write cannot overrun.
+         */
+        ret = memcpy_to_guest_ring(ring_info,
+                                   ring.tx_ptr + sizeof(xen_argo_ring_t),
+                                   NULL, buf_hnd, iov_len);
+        if ( ret )
+        {
+            gprintk(XENLOG_ERR,
+                    "argo: failed to copy [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
+                    buf_hnd.p, iov_len, ring_info->id.domain_id,
+                    ring_info->id.port, ring_info->id.partner_id);
+
+            goto out;
+        }
+
+        ring.tx_ptr += iov_len;
+
+        if ( ring.tx_ptr == ring_info->len )
+            ring.tx_ptr = 0;
+
+        piov++;
+    }
+
+    ring.tx_ptr = ROUNDUP_MESSAGE(ring.tx_ptr);
+
+    if ( ring.tx_ptr >= ring_info->len )
+        ring.tx_ptr -= ring_info->len;
+
+    update_tx_ptr(ring_info, ring.tx_ptr);
+
+ out:
+    /*
+     * At this point it is possible to unmap the ring_info, ie:
+     *   ring_unmap(ring_info);
+     * but performance should be improved by not doing so, and retaining
+     * the mapping.
+     * An XSM policy control over level of confidentiality required
+     * versus performance cost could be added to decide that here.
+     * See the similar comment in ring_map_page re: write-only mappings.
+     */
+
+    if ( !ret )
+        *out_len = len;
+
+    return ret;
+}
+
 static void
 wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
 {
@@ -359,6 +781,22 @@ wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
 }
 
 static void
+wildcard_pending_list_insert(domid_t domain_id, struct pending_ent *ent)
+{
+    struct domain *d = get_domain_by_id(domain_id);
+    if ( !d )
+        return;
+
+    if ( d->argo )
+    {
+        spin_lock(&d->argo->wildcard_lock);
+        hlist_add_head(&ent->wildcard_node, &d->argo->wildcard_pend_list);
+        spin_unlock(&d->argo->wildcard_lock);
+    }
+    put_domain(d);
+}
+
+static void
 pending_remove_all(struct argo_ring_info *ring_info)
 {
     struct hlist_node *node, *next;
@@ -374,6 +812,67 @@ pending_remove_all(struct argo_ring_info *ring_info)
     ring_info->npending = 0;
 }
 
+static int
+pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
+              unsigned int len)
+{
+    struct pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    if ( ring_info->npending >= MAX_PENDING_PER_RING )
+        return -ENOSPC;
+
+    ent = xmalloc(struct pending_ent);
+
+    if ( !ent )
+        return -ENOMEM;
+
+    ent->len = len;
+    ent->domain_id = src_id;
+    ent->ring_info = ring_info;
+
+    if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+        wildcard_pending_list_insert(src_id, ent);
+    hlist_add_head(&ent->node, &ring_info->pending);
+    ring_info->npending++;
+
+    return 0;
+}
+
+static int
+pending_requeue(struct argo_ring_info *ring_info, domid_t src_id,
+                unsigned int len)
+{
+    struct hlist_node *node;
+    struct pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    hlist_for_each_entry(ent, node, &ring_info->pending, node)
+    {
+        if ( ent->domain_id == src_id )
+        {
+            /*
+             * Reuse an existing queue entry for a notification rather than add
+             * another. If the existing entry is waiting for a smaller size than
+             * the current message then adjust the record to wait for the
+             * current (larger) size to be available before triggering a
+             * notification.
+             * This assists the waiting sender by ensuring that whenever a
+             * notification is triggered, there is sufficient space available
+             * for (at least) any one of the messages awaiting transmission.
+             */
+            if ( ent->len < len )
+                ent->len = len;
+
+            return 0;
+        }
+    }
+
+    return pending_queue(ring_info, src_id, len);
+}
+
 static void
 wildcard_rings_pending_remove(struct domain *d)
 {
@@ -667,6 +1166,28 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
     return NULL;
 }
 
+static struct argo_ring_info *
+ring_find_info_by_match(const struct domain *d, uint32_t port,
+                        domid_t partner_id)
+{
+    struct argo_ring_id id;
+    struct argo_ring_info *ring_info;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    id.port = port;
+    id.domain_id = d->domain_id;
+    id.partner_id = partner_id;
+
+    ring_info = ring_find_info(d, &id);
+    if ( ring_info )
+        return ring_info;
+
+    id.partner_id = XEN_ARGO_DOMID_ANY;
+
+    return ring_find_info(d, &id);
+}
+
 static struct argo_send_info *
 send_find_info(const struct domain *d, const struct argo_ring_id *id)
 {
@@ -1005,6 +1526,95 @@ register_ring(struct domain *currd,
     return ret;
 }
 
+static long
+sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
+      const xen_argo_addr_t *dst_addr,
+      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
+      uint32_t message_type)
+{
+    struct domain *dst_d = NULL;
+    struct argo_ring_id src_id;
+    struct argo_ring_info *ring_info;
+    int ret = 0;
+    unsigned long len = 0;
+
+    ASSERT(src_d->domain_id == src_addr->domain_id);
+
+    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
+                 src_addr->domain_id, src_addr->port,
+                 dst_addr->domain_id, dst_addr->port,
+                 niov, iovs_hnd.p, message_type);
+
+    read_lock(&argo_lock);
+
+    if ( !src_d->argo )
+    {
+        ret = -ENODEV;
+        goto out_unlock;
+    }
+
+    src_id.port = src_addr->port;
+    src_id.domain_id = src_d->domain_id;
+    src_id.partner_id = dst_addr->domain_id;
+
+    dst_d = get_domain_by_id(dst_addr->domain_id);
+    if ( !dst_d )
+    {
+        argo_dprintk("!dst_d, ESRCH\n");
+        ret = -ESRCH;
+        goto out_unlock;
+    }
+
+    if ( !dst_d->argo )
+    {
+        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
+        ret = -ECONNREFUSED;
+        goto out_unlock;
+    }
+
+    read_lock(&dst_d->argo->lock);
+
+    ring_info = ring_find_info_by_match(dst_d, dst_addr->port,
+                                        src_addr->domain_id);
+    if ( !ring_info )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: vm%u connection refused, src (vm%u:%x) dst (vm%u:%x)\n",
+                current->domain->domain_id, src_id.domain_id, src_id.port,
+                dst_addr->domain_id, dst_addr->port);
+
+        ret = -ECONNREFUSED;
+        goto out_unlock2;
+    }
+
+    spin_lock(&ring_info->lock);
+
+    ret = ringbuf_insert(dst_d, ring_info, &src_id, iovs_hnd, niov,
+                         message_type, &len);
+    if ( ret == -EAGAIN )
+    {
+        argo_dprintk("argo_ringbuf_sendv failed, EAGAIN\n");
+        /* requeue to issue a notification when space is there */
+        ret = pending_requeue(ring_info, src_addr->domain_id, len);
+    }
+
+    spin_unlock(&ring_info->lock);
+
+    if ( ret >= 0 )
+        signal_domain(dst_d);
+
+ out_unlock2:
+    read_unlock(&dst_d->argo->lock);
+
+ out_unlock:
+    if ( dst_d )
+        put_domain(dst_d);
+
+    read_unlock(&argo_lock);
+
+    return ( ret < 0 ) ? ret : len;
+}
+
 long
 do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
            XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
@@ -1073,6 +1683,52 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         break;
     }
 
+    case XEN_ARGO_OP_sendv:
+    {
+        xen_argo_send_addr_t send_addr;
+
+        XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
+            guest_handle_cast(arg1, xen_argo_send_addr_t);
+        XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
+            guest_handle_cast(arg2, xen_argo_iov_t);
+        /* arg3 is niov */
+        /* arg4 is message_type. Must be a 32-bit value. */
+
+        rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
+        if ( rc )
+            break;
+
+        if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
+            send_addr.src.domain_id = currd->domain_id;
+
+        /* No domain is currently authorized to send on behalf of another */
+        if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
+        {
+            rc = -EPERM;
+            break;
+        }
+
+        /* Reject niov above the limit or a message_type exceeding 32 bits. */
+        if ( unlikely((arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        /*
+         * Check access to the whole array here so we can use the faster __copy
+         * operations to read each element later.
+         */
+        if ( unlikely(!guest_handle_okay(iovs_hnd, arg3)) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        rc = sendv(currd, &send_addr.src, &send_addr.dst, iovs_hnd, arg3, arg4);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index f34d4f0..6fbe346 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -746,7 +746,7 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
     spin_unlock_irqrestore(&v->virq_lock, flags);
 }
 
-static void send_guest_global_virq(struct domain *d, uint32_t virq)
+void send_guest_global_virq(struct domain *d, uint32_t virq)
 {
     unsigned long flags;
     int port;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 6117bf2..8f7d05d 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -41,6 +41,34 @@
 #define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
 
 /*
+ * XEN_ARGO_MAXIOV : maximum number of iovs accepted in a single sendv.
+ * Rationale for the value:
+ * A low value since the full array of iov structs is read onto the hypervisor
+ * stack to work with while processing the message data.
+ * The Linux argo driver never passes more than two iovs.
+ *
+ * This value should not exceed 128 to ensure that the total amount of data
+ * posted in a single Argo sendv operation cannot exceed 2^31 bytes, to reduce
+ * risk of integer overflow defects:
+ * Each argo iov can hold ~ 2^24 bytes, so XEN_ARGO_MAXIOV <= 2^(31-24),
+ * ie. keep XEN_ARGO_MAXIOV <= 128.
+ */
+#define XEN_ARGO_MAXIOV          8U
+
+DEFINE_XEN_GUEST_HANDLE(uint8_t);
+
+typedef struct xen_argo_iov
+{
+#ifdef XEN_GUEST_HANDLE_64
+    XEN_GUEST_HANDLE_64(uint8_t) iov_hnd;
+#else
+    uint64_t iov_hnd;
+#endif
+    uint32_t iov_len;
+    uint32_t pad;
+} xen_argo_iov_t;
+
+/*
  * Page descriptor: encoding both page address and size in a 64-bit value.
  * Intended to allow ABI to support use of different granularity pages.
  * example of how to populate:
@@ -58,6 +86,12 @@ typedef struct xen_argo_addr
     uint16_t pad;
 } xen_argo_addr_t;
 
+typedef struct xen_argo_send_addr
+{
+    xen_argo_addr_t src;
+    xen_argo_addr_t dst;
+} xen_argo_send_addr_t;
+
 typedef struct xen_argo_ring
 {
     /* Guests should use atomic operations to access rx_ptr */
@@ -147,4 +181,30 @@ struct xen_argo_ring_message_header
  */
 #define XEN_ARGO_OP_unregister_ring     2
 
+/*
+ * XEN_ARGO_OP_sendv
+ *
+ * Send a list of buffers contained in iovs.
+ *
+ * The send address struct specifies the source and destination addresses
+ * for the message being sent, which are used to find the destination ring:
+ * Xen first looks for a most-specific match with a registered ring with
+ *  (id.addr == dst) and (id.partner == sending_domain) ;
+ * if that fails, it then looks for a wildcard match (aka multicast receiver)
+ * where (id.addr == dst) and (id.partner == DOMID_ANY).
+ *
+ * Each iov entry sends iov_len bytes from the buffer referenced by iov_hnd.
+ * If insufficient space exists in the destination ring, it will return -EAGAIN
+ * and Xen will notify the caller when sufficient space becomes available.
+ *
+ * The message type is a 32-bit data field available to communicate message
+ * context data (eg. kernel-to-kernel, rather than application layer).
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_send_addr_t) source and dest addresses
+ * arg2: XEN_GUEST_HANDLE(xen_argo_iov_t) iovs
+ * arg3: unsigned long niov
+ * arg4: unsigned long message type
+ */
+#define XEN_ARGO_OP_sendv               3
+
 #endif
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index b3f6491..b650aba 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -178,7 +178,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_CON_RING   8  /* G. (DOM0) Bytes received on console            */
 #define VIRQ_PCPU_STATE 9  /* G. (DOM0) PCPU state changed                   */
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occurred          */
-#define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
+#define VIRQ_ARGO_MESSAGE 11 /* G. Argo interdomain message notification     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
 #define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index ebb879e..4650887 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -29,6 +29,13 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq);
 void send_global_virq(uint32_t virq);
 
 /*
+ * send_guest_global_virq:
+ *  @d:        Domain to which VIRQ should be sent
+ *  @virq:     Virtual IRQ number (VIRQ_*), must be global
+ */
+void send_guest_global_virq(struct domain *d, uint32_t virq);
+
+/*
  * sent_global_virq_handler: Set a global VIRQ handler.
  *  @d:        New target domain for this VIRQ
  *  @virq:     Virtual IRQ number (VIRQ_*), must be global
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 411c661..3723980 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -152,3 +152,5 @@
 ?	argo_ring			argo.h
 ?	argo_register_ring		argo.h
 ?	argo_unregister_ring		argo.h
+?	argo_iov			argo.h
+?	argo_send_addr			argo.h
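
For illustration only (not part of the patch above): a guest-side caller of
the sendv interface documented in argo.h might look roughly like the sketch
below. HYPERVISOR_argo_op() stands in for whatever five-argument hypercall
wrapper the guest OS provides, the plain uint64_t layout of iov_hnd is
assumed, and the helper name is hypothetical.

    /* Hypothetical helper: send one buffer to (dst_domid:dst_port). */
    static long argo_send_one(domid_t dst_domid, uint32_t dst_port,
                              void *buf, uint32_t len, uint32_t msg_type)
    {
        xen_argo_iov_t iov = {
            .iov_hnd = (uint64_t)(uintptr_t)buf, /* non-handle variant assumed */
            .iov_len = len,
        };
        xen_argo_send_addr_t send_addr = {
            /* XEN_ARGO_DOMID_ANY as source means "use the caller's domid" */
            .src = { .domain_id = XEN_ARGO_DOMID_ANY, .port = 0 },
            .dst = { .domain_id = dst_domid, .port = dst_port },
        };

        /* Returns total bytes written on success, or -EAGAIN if the ring
         * currently lacks space for the message. */
        return HYPERVISOR_argo_op(XEN_ARGO_OP_sendv, &send_addr, &iov,
                                  1 /* niov */, msg_type);
    }
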
-- 
2.7.4



* [PATCH v3 10/15] argo: implement the notify op
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (8 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-10 12:21   ` Roger Pau Monné
  2019-01-07  7:42 ` [PATCH v3 11/15] xsm, argo: XSM control for argo register Christopher Clark
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

This op queries for data about space availability in registered rings and
causes a notification to be sent when space has become available.

The hypercall op populates a supplied data structure with information about
ring state, and if insufficient space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.

Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate
(ie. a NULL argument) to trigger such a check and consequent notifications.

Limit the number of notify requests accepted in a single operation to a
fixed maximum of 256.
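
To illustrate the intended guest-side usage (not part of this patch): a
domain that wants to know whether msg_len bytes will fit in a destination
ring could invoke the op roughly as below. HYPERVISOR_argo_op() is an
assumed guest hypercall wrapper, the aligned attribute is GCC-specific, and
the helper name is hypothetical. Passing a NULL first argument instead
performs only the pending space check and consequent notifications
described above.

    /* Hypothetical helper: query space in (dst_domid:dst_port), asking Xen
     * to notify via VIRQ_ARGO_MESSAGE if msg_len bytes do not fit yet. */
    static long argo_query_space(domid_t dst_domid, uint32_t dst_port,
                                 uint32_t msg_len)
    {
        uint8_t buf[sizeof(xen_argo_ring_data_t) +
                    sizeof(xen_argo_ring_data_ent_t)]
            __attribute__((aligned(8)));
        xen_argo_ring_data_t *rd = (xen_argo_ring_data_t *)buf;
        long rc;

        memset(buf, 0, sizeof(buf));
        rd->nent = 1;
        rd->data[0].ring.domain_id = dst_domid;
        rd->data[0].ring.port = dst_port;
        rd->data[0].space_required = msg_len;

        rc = HYPERVISOR_argo_op(XEN_ARGO_OP_notify, rd, NULL, 0, 0);
        if ( rc )
            return rc;

        if ( rd->data[0].flags & XEN_ARGO_RING_DATA_F_SUFFICIENT )
            return 1; /* space is available now: retry the sendv */

        /* Otherwise a notification is pending, or the request was too big:
         * check XEN_ARGO_RING_DATA_F_EMSGSIZE before waiting. */
        return 0;
    }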

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 feedback Jan: drop cookie, implement teardown
v2 notify: add flag to indicate ring is shared
v2 argument name for fill_ring_data arg is now currd
v2 self: check ring size vs request and flag error rather than queue signal
v2 feedback Jan: drop 'message' from 'argo_message_op'
v2 self: simplify signal_domid, drop unnecessary label + goto
v2 self: skip the cookie check in pending_cancel
v2 self: implement npending limit on number of pending entries
v1 feedback #16 Jan: sanitize_ring in ringbuf_payload_space
v2 self: inline fill_ring_data_array
v2 self: avoid retesting dst_d for put_domain
v2 self/Jan: remove use of magic verification field and tidy up
v1 feedback #16 Jan: remove testing of magic in guest-supplied structure
v2 self: s/argo_pending_ent/pending_ent/g
v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
v1 feedback Roger, Jan: drop argo prefix on static functions
v2 self: reduce indentation via goto out if arg NULL
v1 feedback #13 Jan: resolve checking of array handle and use of __copy

v1 #5 (#16) feedback Paul: notify op: use currd in do_argo_message_op
v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify
v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify_check_pending
v1 #5 (#16) feedback Paul: notify op: use currd in argo_fill_ring_data_array
v1 #13 (#16) feedback Paul: notify op: do/while: reindent only
v1 #13 (#16) feedback Paul: notify op: do/while: goto
v1 : add compat xlat.lst entries
v1: add definition for copy_field_from_guest_errno
v1 #13 feedback Jan: make 'ring data' comment comply with single-line style
v1 feedback #13 Jan: use __copy; so define and use __copy_field_to_guest_errno
v1: #13 feedback Jan: public namespace: prefix with xen
v1: #13 feedback Jan: add blank line after case in do_argo_message_op
v1: self: rename ent id to domain_id
v1: self: ent id-> domain_id
v1: self: drop signal if domain_cookie mismatches
v1. feedback #15 Jan: make loop i unsigned
v1. self: drop unnecessary mb() in argo_notify_check_pending
v1. self: add blank line
v1 #16 feedback Jan: const domain arg to +argo_fill_ring_data
v1. feedback #15 Jan: check unusued hypercall args are zero
v1 feedback #16 Jan: add comment on space available signal policy
v1. feedback #16 Jan: move declr, drop braces, lower indent
v1. feedback #18 Jan: meld the resource limits into the main commit
v1. feedback #16 Jan: clarify use of magic field
v1. self: use single copy to read notify ring data struct
v1: argo_fill_ring_data: fix dprintk types for port field
v1: self: use %x for printing port as per other print sites
v1. feedback Jan: add comments explaining ring full vs empty
v1. following Jan: fix argo_ringbuf_payload_space calculation for empty ring

 xen/common/argo.c         | 359 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/argo.h |  67 +++++++++
 xen/include/xlat.lst      |   2 +
 3 files changed, 428 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 4548435..37eb291 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -29,6 +29,7 @@
 #include <public/argo.h>
 
 #define MAX_RINGS_PER_DOMAIN            128U
+#define MAX_NOTIFY_COUNT                256U
 #define MAX_PENDING_PER_RING             32U
 
 /* All messages on the ring are padded to a multiple of the slot size. */
@@ -43,6 +44,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_iov_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_t);
+DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_send_addr_t);
 DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
 
@@ -231,6 +234,13 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
 #define argo_dprintk(format, ... ) ((void)0)
 #endif
 
+static struct argo_ring_info *
+ring_find_info(const struct domain *d, const struct argo_ring_id *id);
+
+static struct argo_ring_info *
+ring_find_info_by_match(const struct domain *d, uint32_t port,
+                        domid_t partner_id);
+
 /*
  * This hash function is used to distribute rings within the per-domain
  * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
@@ -265,6 +275,17 @@ signal_domain(struct domain *d)
 }
 
 static void
+signal_domid(domid_t domain_id)
+{
+    struct domain *d = get_domain_by_id(domain_id);
+    if ( !d )
+        return;
+
+    signal_domain(d);
+    put_domain(d);
+}
+
+static void
 ring_unmap(struct argo_ring_info *ring_info)
 {
     unsigned int i;
@@ -473,6 +494,62 @@ get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
     return 0;
 }
 
+static uint32_t
+ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
+{
+    xen_argo_ring_t ring;
+    uint32_t len;
+    int32_t ret;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    len = ring_info->len;
+    if ( !len )
+        return 0;
+
+    ret = get_sanitized_ring(&ring, ring_info);
+    if ( ret )
+        return 0;
+
+    argo_dprintk("sanitized ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
+                 ring.tx_ptr, ring.rx_ptr);
+
+    /*
+     * rx_ptr == tx_ptr means that the ring has been emptied, so return
+     * the maximum payload size that can be accepted -- see message size
+     * checking logic in the entry to ringbuf_insert which ensures that
+     * there is always one message slot (of size ROUNDUP_MESSAGE(1)) left
+     * available, preventing a ring from being entirely filled. This ensures
+     * that matching ring indexes always indicate an empty ring and not a
+     * full one.
+     * The subtraction here will not underflow due to minimum size constraints
+     * enforced on ring size elsewhere.
+     */
+    if ( ring.rx_ptr == ring.tx_ptr )
+        return len - sizeof(struct xen_argo_ring_message_header)
+                   - ROUNDUP_MESSAGE(1);
+
+    ret = ring.rx_ptr - ring.tx_ptr;
+    if ( ret < 0 )
+        ret += len;
+
+    /*
+     * The maximum size payload for a message that will be accepted is:
+     * (the available space between the ring indexes)
+     *    minus (space for a message header)
+     *    minus (space for one message slot)
+     * since ringbuf_insert requires that one message slot be left
+     * unfilled, to avoid filling the ring to capacity and confusing a full
+     * ring with an empty one.
+     * Since the ring indexes are sanitized, the value in ret is aligned, so
+     * the simple subtraction here works to return the aligned value needed:
+     */
+    ret -= sizeof(struct xen_argo_ring_message_header);
+    ret -= ROUNDUP_MESSAGE(1);
+
+    return (ret < 0) ? 0 : ret;
+}
+
 /*
  * iov_count returns its count on success via an out variable to avoid
  * potential for a negative return value to be used incorrectly
@@ -812,6 +889,61 @@ pending_remove_all(struct argo_ring_info *ring_info)
     ring_info->npending = 0;
 }
 
+static void
+pending_notify(struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    hlist_for_each_entry_safe(ent, node, next, to_notify, node)
+    {
+        hlist_del(&ent->node);
+        signal_domid(ent->domain_id);
+        xfree(ent);
+    }
+}
+
+static void
+pending_find(const struct domain *d, struct argo_ring_info *ring_info,
+             uint32_t payload_space, struct hlist_head *to_notify)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    /*
+     * TODO: Current policy here is to signal _all_ of the waiting domains
+     *       interested in sending a message of size less than payload_space.
+     *
+     * This is likely to be suboptimal, since once one of them has added
+     * their message to the ring, there may well be insufficient room
+     * available for any of the others to transmit, meaning that they were
+     * woken in vain, which created extra work just to requeue their wait.
+     *
+     * Retain this simple policy for now since it at least avoids starving a
+     * domain of available space notifications because of a policy that only
+     * notified other domains instead. Improvement may be possible;
+     * investigation required.
+     */
+
+    spin_lock(&ring_info->lock);
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( payload_space >= ent->len )
+        {
+            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                wildcard_pending_list_remove(ent->domain_id, ent);
+            hlist_del(&ent->node);
+            ring_info->npending--;
+            hlist_add_head(&ent->node, to_notify);
+        }
+    }
+    spin_unlock(&ring_info->lock);
+}
+
 static int
 pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
               unsigned int len)
@@ -874,6 +1006,27 @@ pending_requeue(struct argo_ring_info *ring_info, domid_t src_id,
 }
 
 static void
+pending_cancel(struct argo_ring_info *ring_info, domid_t src_id)
+{
+    struct hlist_node *node, *next;
+    struct pending_ent *ent;
+
+    ASSERT(spin_is_locked(&ring_info->lock));
+
+    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
+    {
+        if ( ent->domain_id == src_id )
+        {
+            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                wildcard_pending_list_remove(ent->domain_id, ent);
+            hlist_del(&ent->node);
+            xfree(ent);
+            ring_info->npending--;
+        }
+    }
+}
+
+static void
 wildcard_rings_pending_remove(struct domain *d)
 {
     struct hlist_node *node, *next;
@@ -994,6 +1147,92 @@ partner_rings_remove(struct domain *src_d)
 }
 
 static int
+fill_ring_data(const struct domain *currd,
+               XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) data_ent_hnd)
+{
+    xen_argo_ring_data_ent_t ent;
+    struct domain *dst_d;
+    struct argo_ring_info *ring_info;
+    int ret;
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    ret = __copy_from_guest(&ent, data_ent_hnd, 1) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
+                 ent.ring.domain_id, ent.ring.port);
+
+    ent.flags = 0;
+
+    dst_d = get_domain_by_id(ent.ring.domain_id);
+    if ( dst_d )
+    {
+        if ( dst_d->argo )
+        {
+            read_lock(&dst_d->argo->lock);
+
+            ring_info = ring_find_info_by_match(dst_d, ent.ring.port,
+                                                currd->domain_id);
+            if ( ring_info )
+            {
+                uint32_t space_avail;
+
+                ent.flags |= XEN_ARGO_RING_DATA_F_EXISTS;
+                ent.max_message_size = ring_info->len -
+                                   sizeof(struct xen_argo_ring_message_header) -
+                                   ROUNDUP_MESSAGE(1);
+
+                if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_SHARED;
+
+                spin_lock(&ring_info->lock);
+
+                space_avail = ringbuf_payload_space(dst_d, ring_info);
+
+                argo_dprintk("fill_ring_data: port=%x space_avail=%u"
+                             " space_wanted=%u\n",
+                             ring_info->id.port, space_avail,
+                             ent.space_required);
+
+                /* Do not queue a notification for an unachievable size */
+                if ( ent.space_required > ent.max_message_size )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_EMSGSIZE;
+                else if ( space_avail >= ent.space_required )
+                {
+                    pending_cancel(ring_info, currd->domain_id);
+                    ent.flags |= XEN_ARGO_RING_DATA_F_SUFFICIENT;
+                }
+                else
+                {
+                    pending_requeue(ring_info, currd->domain_id,
+                                    ent.space_required);
+                    ent.flags |= XEN_ARGO_RING_DATA_F_PENDING;
+                }
+
+                spin_unlock(&ring_info->lock);
+
+                if ( space_avail == ent.max_message_size )
+                    ent.flags |= XEN_ARGO_RING_DATA_F_EMPTY;
+
+            }
+            read_unlock(&dst_d->argo->lock);
+        }
+        put_domain(dst_d);
+    }
+
+    ret = __copy_field_to_guest(data_ent_hnd, &ent, flags) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    ret = __copy_field_to_guest(data_ent_hnd, &ent, max_message_size) ?
+                -EFAULT : 0;
+ out:
+    return ret;
+}
+
+static int
 find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
 {
     p2m_type_t p2mt;
@@ -1526,6 +1765,111 @@ register_ring(struct domain *currd,
     return ret;
 }
 
+static void
+notify_ring(struct domain *d, struct argo_ring_info *ring_info,
+            struct hlist_head *to_notify)
+{
+    uint32_t space;
+
+    ASSERT(rw_is_locked(&argo_lock));
+    ASSERT(rw_is_locked(&d->argo->lock));
+
+    spin_lock(&ring_info->lock);
+
+    if ( ring_info->len )
+        space = ringbuf_payload_space(d, ring_info);
+    else
+        space = 0;
+
+    spin_unlock(&ring_info->lock);
+
+    if ( space )
+        pending_find(d, ring_info, space, to_notify);
+}
+
+static void
+notify_check_pending(struct domain *currd)
+{
+    unsigned int i;
+    HLIST_HEAD(to_notify);
+
+    ASSERT(rw_is_locked(&argo_lock));
+
+    read_lock(&currd->argo->lock);
+
+    for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
+    {
+        struct hlist_node *node, *next;
+        struct argo_ring_info *ring_info;
+
+        hlist_for_each_entry_safe(ring_info, node, next,
+                                  &currd->argo->ring_hash[i], node)
+        {
+            notify_ring(currd, ring_info, &to_notify);
+        }
+    }
+
+    read_unlock(&currd->argo->lock);
+
+    if ( !hlist_empty(&to_notify) )
+        pending_notify(&to_notify);
+}
+
+static long
+notify(struct domain *currd,
+       XEN_GUEST_HANDLE_PARAM(xen_argo_ring_data_t) ring_data_hnd)
+{
+    XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) ent_hnd;
+    xen_argo_ring_data_t ring_data;
+    int ret = 0;
+
+    read_lock(&argo_lock);
+
+    if ( !currd->argo )
+    {
+        argo_dprintk("!d->argo, ENODEV\n");
+        ret = -ENODEV;
+        goto out;
+    }
+
+    notify_check_pending(currd);
+
+    if ( guest_handle_is_null(ring_data_hnd) )
+        goto out;
+
+    ret = copy_from_guest(&ring_data, ring_data_hnd, 1) ? -EFAULT : 0;
+    if ( ret )
+        goto out;
+
+    if ( ring_data.nent > MAX_NOTIFY_COUNT )
+    {
+        gprintk(XENLOG_ERR,
+                "argo: notify entry count(%u) exceeds max(%u)\n",
+                ring_data.nent, MAX_NOTIFY_COUNT);
+        ret = -EACCES;
+        goto out;
+    }
+
+    ent_hnd = guest_handle_for_field(ring_data_hnd,
+                                     xen_argo_ring_data_ent_t, data[0]);
+    if ( unlikely(!guest_handle_okay(ent_hnd, ring_data.nent)) )
+    {
+        ret = -EFAULT;
+        goto out;
+    }
+
+    while ( !ret && ring_data.nent-- )
+    {
+        ret = fill_ring_data(currd, ent_hnd);
+        guest_handle_add_offset(ent_hnd, 1);
+    }
+
+ out:
+    read_unlock(&argo_lock);
+
+    return ret;
+}
+
 static long
 sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
       const xen_argo_addr_t *dst_addr,
@@ -1726,6 +2070,21 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
         break;
     }
 
+    case XEN_ARGO_OP_notify:
+    {
+        XEN_GUEST_HANDLE_PARAM(xen_argo_ring_data_t) ring_data_hnd =
+                   guest_handle_cast(arg1, xen_argo_ring_data_t);
+
+        if ( unlikely((!guest_handle_is_null(arg2)) || arg3 || arg4) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = notify(currd, ring_data_hnd);
+        break;
+    }
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 8f7d05d..25fed82 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -128,6 +128,42 @@ typedef struct xen_argo_unregister_ring
 /* Messages on the ring are padded to a multiple of this size. */
 #define XEN_ARGO_MSG_SLOT_SIZE 0x10
 
+/*
+ * Notify flags
+ */
+/* Ring is empty */
+#define XEN_ARGO_RING_DATA_F_EMPTY       (1U << 0)
+/* Ring exists */
+#define XEN_ARGO_RING_DATA_F_EXISTS      (1U << 1)
+/* Pending interrupt exists. Do not rely on this field - for profiling only */
+#define XEN_ARGO_RING_DATA_F_PENDING     (1U << 2)
+/* Sufficient space to queue space_required bytes exists */
+#define XEN_ARGO_RING_DATA_F_SUFFICIENT  (1U << 3)
+/* Insufficient ring size for space_required bytes */
+#define XEN_ARGO_RING_DATA_F_EMSGSIZE    (1U << 4)
+/* Ring is shared, not unicast */
+#define XEN_ARGO_RING_DATA_F_SHARED      (1U << 5)
+
+typedef struct xen_argo_ring_data_ent
+{
+    xen_argo_addr_t ring;
+    uint16_t flags;
+    uint16_t pad;
+    uint32_t space_required;
+    uint32_t max_message_size;
+} xen_argo_ring_data_ent_t;
+
+typedef struct xen_argo_ring_data
+{
+    uint32_t nent;
+    uint32_t pad;
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+    xen_argo_ring_data_ent_t data[];
+#elif defined(__GNUC__)
+    xen_argo_ring_data_ent_t data[0];
+#endif
+} xen_argo_ring_data_t;
+
 struct xen_argo_ring_message_header
 {
     uint32_t len;
@@ -207,4 +243,35 @@ struct xen_argo_ring_message_header
  */
 #define XEN_ARGO_OP_sendv               3
 
+/*
+ * XEN_ARGO_OP_notify
+ *
+ * Asks Xen for information about other rings in the system.
+ *
+ * ent->ring is the xen_argo_addr_t of the ring you want information on.
+ * Uses the same ring matching rules as XEN_ARGO_OP_sendv.
+ *
+ * ent->space_required : if this field is non-zero then Xen will check
+ * that there is space in the destination ring for this many bytes of payload.
+ * If the ring is too small for the requested space_required, it will set the
+ * XEN_ARGO_RING_DATA_F_EMSGSIZE flag on return.
+ * If sufficient space is available, it will set XEN_ARGO_RING_DATA_F_SUFFICIENT
+ * and CANCEL any pending notification for that ent->ring; otherwise it
+ * will schedule a notification event and the flag will not be set.
+ *
+ * These flags are set by Xen when notify replies:
+ * XEN_ARGO_RING_DATA_F_EMPTY      ring is empty
+ * XEN_ARGO_RING_DATA_F_PENDING    notify event is pending *don't rely on this*
+ * XEN_ARGO_RING_DATA_F_SUFFICIENT sufficient space for space_required is there
+ * XEN_ARGO_RING_DATA_F_EXISTS     ring exists
+ * XEN_ARGO_RING_DATA_F_EMSGSIZE   space_required too large for the ring size
+ * XEN_ARGO_RING_DATA_F_SHARED     ring is registered for wildcard partner
+ *
+ * arg1: XEN_GUEST_HANDLE(xen_argo_ring_data_t) ring_data (may be NULL)
+ * arg2: NULL
+ * arg3: 0 (ZERO)
+ * arg4: 0 (ZERO)
+ */
+#define XEN_ARGO_OP_notify              4
+
 #endif
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 3723980..e45b60e 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -154,3 +154,5 @@
 ?	argo_unregister_ring		argo.h
 ?	argo_iov			argo.h
 ?	argo_send_addr			argo.h
+?	argo_ring_data_ent		argo.h
+?	argo_ring_data			argo.h
-- 
2.7.4



* [PATCH v3 11/15] xsm, argo: XSM control for argo register
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (9 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 10/15] argo: implement the notify op Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 12/15] xsm, argo: XSM control for argo message send operation Christopher Clark
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, James McKenzie, Eric Chanudet, Roger Pau Monne

XSM controls for argo ring registration with two distinct cases, where
the ring being registered is:

1) Single source:  registering a ring for communication to receive messages
                   from a specified single other domain.
   Default policy: allow.

2) Any source:     registering a ring for communication to receive messages
                   from any, or all, other domains (ie. wildcard).
   Default policy: deny, with runtime policy configuration via bootparam.

The existing argo-mac boot parameter indicates administrator preference for
either permissive or strict access control, which will allow or deny
registration of any-sender rings.
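
For reference, with the default (dummy) XSM policy the effective outcome of
a registration request reduces to the sketch below; this only summarises the
behaviour introduced by this patch, and the function name is illustrative.

    /* Summary of the default (dummy XSM) outcome for a registration request;
     * argo_mac_enforcing reflects the existing argo-mac boot parameter. */
    static int default_register_policy(domid_t partner_id,
                                       bool argo_mac_enforcing)
    {
        if ( partner_id == XEN_ARGO_DOMID_ANY )      /* wildcard ring */
            return argo_mac_enforcing ? -EPERM : 0;  /* deny unless permissive */

        return 0;                                    /* single-sender ring: allow */
    }
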

This commit modifies the signature of core XSM hook functions in order to
apply 'const' to arguments, needed in order for 'const' to be accepted in
signature of functions that invoke them.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 feedback #9 Jan: refactor to use argo-mac bootparam at point of introduction
v1 feedback Paul: replace use of strncmp with strcmp
v1 feedback #16 Jan: apply const to function signatures
v1 feedback #14 Jan: add blank line before return in parse_argo_mac_param

 xen/common/argo.c                     | 14 ++++++++++----
 xen/include/xsm/dummy.h               | 15 +++++++++++++++
 xen/include/xsm/xsm.h                 | 19 +++++++++++++++++++
 xen/xsm/dummy.c                       |  4 ++++
 xen/xsm/flask/hooks.c                 | 27 ++++++++++++++++++++++++---
 xen/xsm/flask/policy/access_vectors   | 11 +++++++++++
 xen/xsm/flask/policy/security_classes |  1 +
 7 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 37eb291..1674f18 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -26,6 +26,7 @@
 #include <xen/lib.h>
 #include <xen/nospec.h>
 #include <xen/time.h>
+#include <xsm/xsm.h>
 #include <public/argo.h>
 
 #define MAX_RINGS_PER_DOMAIN            128U
@@ -1582,11 +1583,9 @@ register_ring(struct domain *currd,
 
     if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
     {
-        if ( opt_argo_mac_enforcing )
-        {
-            ret = -EPERM;
+        ret = xsm_argo_register_any_source(currd, opt_argo_mac_enforcing);
+        if ( ret )
             goto out_unlock;
-        }
     }
     else
     {
@@ -1598,6 +1597,13 @@ register_ring(struct domain *currd,
             goto out_unlock;
         }
 
+        ret = xsm_argo_register_single_source(currd, dst_d);
+        if ( ret )
+        {
+            put_domain(dst_d);
+            goto out_unlock;
+        }
+
         if ( !dst_d->argo )
         {
             argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index a29d1ef..55113c3 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -720,6 +720,21 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 
 #endif /* CONFIG_X86 */
 
+#ifdef CONFIG_ARGO
+static XSM_INLINE int xsm_argo_register_single_source(struct domain *d,
+                                                      struct domain *t)
+{
+    return 0;
+}
+
+static XSM_INLINE int xsm_argo_register_any_source(struct domain *d,
+                                                   bool strict)
+{
+    return strict ? -EPERM : 0;
+}
+
+#endif /* CONFIG_ARGO */
+
 #include <public/version.h>
 static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3b192b5..e775a6d 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -181,6 +181,11 @@ struct xsm_operations {
 #endif
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
+#ifdef CONFIG_ARGO
+    int (*argo_register_single_source) (const struct domain *d,
+                                        const struct domain *t);
+    int (*argo_register_any_source) (const struct domain *d);
+#endif
 };
 
 #ifdef CONFIG_XSM
@@ -698,6 +703,20 @@ static inline int xsm_domain_resource_map(xsm_default_t def, struct domain *d)
     return xsm_ops->domain_resource_map(d);
 }
 
+#ifdef CONFIG_ARGO
+static inline int xsm_argo_register_single_source(const struct domain *d,
+                                                  const struct domain *t)
+{
+    return xsm_ops->argo_register_single_source(d, t);
+}
+
+static inline int xsm_argo_register_any_source(const struct domain *d, bool strict)
+{
+    return xsm_ops->argo_register_any_source(d);
+}
+
+#endif /* CONFIG_ARGO */
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 5701047..ed236b0 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -152,4 +152,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
 #endif
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
+#ifdef CONFIG_ARGO
+    set_to_dummy_if_null(ops, argo_register_single_source);
+    set_to_dummy_if_null(ops, argo_register_any_source);
+#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 96d31aa..fcb7487 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -36,13 +36,14 @@
 #include <objsec.h>
 #include <conditional.h>
 
-static u32 domain_sid(struct domain *dom)
+static u32 domain_sid(const struct domain *dom)
 {
     struct domain_security_struct *dsec = dom->ssid;
     return dsec->sid;
 }
 
-static u32 domain_target_sid(struct domain *src, struct domain *dst)
+static u32 domain_target_sid(const struct domain *src,
+                             const struct domain *dst)
 {
     struct domain_security_struct *ssec = src->ssid;
     struct domain_security_struct *dsec = dst->ssid;
@@ -58,7 +59,8 @@ static u32 evtchn_sid(const struct evtchn *chn)
     return chn->ssid.flask_sid;
 }
 
-static int domain_has_perm(struct domain *dom1, struct domain *dom2, 
+static int domain_has_perm(const struct domain *dom1,
+                           const struct domain *dom2,
                            u16 class, u32 perms)
 {
     u32 ssid, tsid;
@@ -1717,6 +1719,21 @@ static int flask_domain_resource_map(struct domain *d)
     return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__RESOURCE_MAP);
 }
 
+#ifdef CONFIG_ARGO
+static int flask_argo_register_single_source(const struct domain *d,
+                                             const struct domain *t)
+{
+    return domain_has_perm(d, t, SECCLASS_ARGO,
+                           ARGO__REGISTER_SINGLE_SOURCE);
+}
+
+static int flask_argo_register_any_source(const struct domain *d)
+{
+    return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
+                        ARGO__REGISTER_ANY_SOURCE, NULL);
+}
+#endif
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1851,6 +1868,10 @@ static struct xsm_operations flask_ops = {
 #endif
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
+#ifdef CONFIG_ARGO
+    .argo_register_single_source = flask_argo_register_single_source,
+    .argo_register_any_source = flask_argo_register_any_source,
+#endif
 };
 
 void __init flask_init(const void *policy_buffer, size_t policy_size)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 6fecfda..fb95c97 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -531,3 +531,14 @@ class version
 # Xen build id
     xen_build_id
 }
+
+# Class argo is used to describe the Argo interdomain communication system.
+class argo
+{
+    # Domain requesting registration of a communication ring
+    # to receive messages from a specific other domain.
+    register_single_source
+    # Domain requesting registration of a communication ring
+    # to receive messages from any other domain.
+    register_any_source
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index cde4e1a..50ecbab 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -19,5 +19,6 @@ class event
 class grant
 class security
 class version
+class argo
 
 # FLASK
-- 
2.7.4



* [PATCH v3 12/15] xsm, argo: XSM control for argo message send operation
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (10 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 11/15] xsm, argo: XSM control for argo register Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 13/15] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, James McKenzie, Eric Chanudet, Roger Pau Monne

Default policy: allow.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2: reordered commit sequence to after sendv implementation
v1 feedback Jan #16: apply const to function signatures
v1 version was:
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

 xen/common/argo.c                   | 8 ++++++++
 xen/include/xsm/dummy.h             | 6 ++++++
 xen/include/xsm/xsm.h               | 6 ++++++
 xen/xsm/dummy.c                     | 1 +
 xen/xsm/flask/hooks.c               | 7 +++++++
 xen/xsm/flask/policy/access_vectors | 2 ++
 6 files changed, 30 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 1674f18..2c0348a 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -1922,6 +1922,14 @@ sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
         goto out_unlock;
     }
 
+    ret = xsm_argo_send(src_d, dst_d);
+    if ( ret )
+    {
+        gprintk(XENLOG_ERR, "argo: XSM REJECTED %i -> %i\n",
+                src_addr->domain_id, dst_addr->domain_id);
+        goto out_unlock;
+    }
+
     read_lock(&dst_d->argo->lock);
 
     ring_info = ring_find_info_by_match(dst_d, dst_addr->port,
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 55113c3..05d10b5 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -733,6 +733,12 @@ static XSM_INLINE int xsm_argo_register_any_source(struct domain *d,
     return strict ? -EPERM : 0;
 }
 
+static XSM_INLINE int xsm_argo_send(const struct domain *d,
+                                    const struct domain *t)
+{
+    return 0;
+}
+
 #endif /* CONFIG_ARGO */
 
 #include <public/version.h>
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index e775a6d..4d4a60c 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -185,6 +185,7 @@ struct xsm_operations {
     int (*argo_register_single_source) (const struct domain *d,
                                         const struct domain *t);
     int (*argo_register_any_source) (const struct domain *d);
+    int (*argo_send) (const struct domain *d, const struct domain *t);
 #endif
 };
 
@@ -715,6 +716,11 @@ static inline int xsm_argo_register_any_source(const struct domain *d, bool strict)
     return xsm_ops->argo_register_any_source(d);
 }
 
+static inline int xsm_argo_send(const struct domain *d, const struct domain *t)
+{
+    return xsm_ops->argo_send(d, t);
+}
+
 #endif /* CONFIG_ARGO */
 
 #endif /* XSM_NO_WRAPPERS */
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index ed236b0..ffac774 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -155,5 +155,6 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
 #ifdef CONFIG_ARGO
     set_to_dummy_if_null(ops, argo_register_single_source);
     set_to_dummy_if_null(ops, argo_register_any_source);
+    set_to_dummy_if_null(ops, argo_send);
 #endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index fcb7487..76c012c 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1732,6 +1732,12 @@ static int flask_argo_register_any_source(const struct domain *d)
     return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
                         ARGO__REGISTER_ANY_SOURCE, NULL);
 }
+
+static int flask_argo_send(const struct domain *d, const struct domain *t)
+{
+    return domain_has_perm(d, t, SECCLASS_ARGO, ARGO__SEND);
+}
+
 #endif
 
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
@@ -1871,6 +1877,7 @@ static struct xsm_operations flask_ops = {
 #ifdef CONFIG_ARGO
     .argo_register_single_source = flask_argo_register_single_source,
     .argo_register_any_source = flask_argo_register_any_source,
+    .argo_send = flask_argo_send,
 #endif
 };
 
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index fb95c97..f6c5377 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -541,4 +541,6 @@ class argo
     # Domain requesting registration of a communication ring
     # to receive messages from any other domain.
     register_any_source
+    # Domain sending a message to another domain.
+    send
 }
-- 
2.7.4



* [PATCH v3 13/15] xsm, argo: XSM control for any access to argo by a domain
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (11 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 12/15] xsm, argo: XSM control for argo message send operation Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 14/15] xsm, argo: notify: don't describe rings that cannot be sent to Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery Christopher Clark
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, James McKenzie, Eric Chanudet, Roger Pau Monne

This control inhibits initialization of the domain's argo data structure,
preventing the domain from receiving any messages or notifications and from
accessing any of the argo hypercall operations.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
v2 self: fix xsm use in soft-reset prior to introduction
v1 #5 (#17) feedback Paul: XSM control for any access: use currd
v1 #16 feedback Jan: apply const to function signatures

 xen/common/argo.c                   | 6 +++---
 xen/include/xsm/dummy.h             | 5 +++++
 xen/include/xsm/xsm.h               | 6 ++++++
 xen/xsm/dummy.c                     | 1 +
 xen/xsm/flask/hooks.c               | 7 +++++++
 xen/xsm/flask/policy/access_vectors | 3 +++
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 2c0348a..31535bd 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -1984,7 +1984,7 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
     argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
                  (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
 
-    if ( unlikely(!opt_argo_enabled) )
+    if ( unlikely(!opt_argo_enabled || xsm_argo_enable(currd)) )
     {
         rc = -EOPNOTSUPP;
         return rc;
@@ -2134,7 +2134,7 @@ argo_init(struct domain *d)
 {
     struct argo_domain *argo;
 
-    if ( !opt_argo_enabled )
+    if ( !opt_argo_enabled || xsm_argo_enable(d) )
     {
         argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
         return 0;
@@ -2190,7 +2190,7 @@ argo_soft_reset(struct domain *d)
         partner_rings_remove(d);
         wildcard_rings_pending_remove(d);
 
-        if ( !opt_argo_enabled )
+        if ( !opt_argo_enabled || xsm_argo_enable(d) )
         {
             xfree(d->argo);
             d->argo = NULL;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 05d10b5..91a21c3 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -721,6 +721,11 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 #endif /* CONFIG_X86 */
 
 #ifdef CONFIG_ARGO
+static XSM_INLINE int xsm_argo_enable(struct domain *d)
+{
+    return 0;
+}
+
 static XSM_INLINE int xsm_argo_register_single_source(struct domain *d,
                                                       struct domain *t)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 4d4a60c..e300ebc 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -182,6 +182,7 @@ struct xsm_operations {
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
+    int (*argo_enable) (const struct domain *d);
     int (*argo_register_single_source) (const struct domain *d,
                                         const struct domain *t);
     int (*argo_register_any_source) (const struct domain *d);
@@ -705,6 +706,11 @@ static inline int xsm_domain_resource_map(xsm_default_t def, struct domain *d)
 }
 
 #ifdef CONFIG_ARGO
+static inline int xsm_argo_enable(const struct domain *d)
+{
+    return xsm_ops->argo_enable(d);
+}
+
 static inline int xsm_argo_register_single_source(const struct domain *d,
                                                   const struct domain *t)
 {
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index ffac774..1fe0e74 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -153,6 +153,7 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
+    set_to_dummy_if_null(ops, argo_enable);
     set_to_dummy_if_null(ops, argo_register_single_source);
     set_to_dummy_if_null(ops, argo_register_any_source);
     set_to_dummy_if_null(ops, argo_send);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 76c012c..3d00c74 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1720,6 +1720,12 @@ static int flask_domain_resource_map(struct domain *d)
 }
 
 #ifdef CONFIG_ARGO
+static int flask_argo_enable(const struct domain *d)
+{
+    return avc_has_perm(domain_sid(d), SECINITSID_XEN, SECCLASS_ARGO,
+                        ARGO__ENABLE, NULL);
+}
+
 static int flask_argo_register_single_source(const struct domain *d,
                                              const struct domain *t)
 {
@@ -1875,6 +1881,7 @@ static struct xsm_operations flask_ops = {
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO
+    .argo_enable = flask_argo_enable,
     .argo_register_single_source = flask_argo_register_single_source,
     .argo_register_any_source = flask_argo_register_any_source,
     .argo_send = flask_argo_send,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index f6c5377..e00448b 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -535,6 +535,9 @@ class version
 # Class argo is used to describe the Argo interdomain communication system.
 class argo
 {
+    # Enable initialization of a domain's argo subsystem and
+    # permission to access the argo hypercall operations.
+    enable
     # Domain requesting registration of a communication ring
     # to receive messages from a specific other domain.
     register_single_source
-- 
2.7.4



* [PATCH v3 14/15] xsm, argo: notify: don't describe rings that cannot be sent to
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (12 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 13/15] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-07  7:42 ` [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery Christopher Clark
  14 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	Daniel De Graaf, James McKenzie, Eric Chanudet, Roger Pau Monne

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/argo.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index 31535bd..0dc064d 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -1172,6 +1172,17 @@ fill_ring_data(const struct domain *currd,
     {
         if ( dst_d->argo )
         {
+            /*
+             * Don't supply information about rings that a guest is not
+             * allowed to send to.
+             */
+            ret = xsm_argo_send(currd, dst_d);
+            if ( ret )
+            {
+                put_domain(dst_d);
+                goto out;
+            }
+
             read_lock(&dst_d->argo->lock);
 
             ring_info = ring_find_info_by_match(dst_d, ent.ring.port,
-- 
2.7.4



* [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
                   ` (13 preceding siblings ...)
  2019-01-07  7:42 ` [PATCH v3 14/15] xsm, argo: notify: don't describe rings that cannot be sent to Christopher Clark
@ 2019-01-07  7:42 ` Christopher Clark
  2019-01-14 12:57   ` Jan Beulich
  14 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-07  7:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Argo does not use the compat hypercall or argument translation machinery, but
it can use some of that infrastructure to validate the hypercall argument
structures, ensuring that struct sizes, offsets and composition do not vary
between 32-bit and 64-bit. Add that validation here, in a new dedicated
source file.

Some of the argo hypercall argument structures contain elements that are
hypercall argument structure types themselves, and the standard compat
structure validation does not handle this, since the types differ in compat
vs. non-compat versions; so for some of the tests the exact-type-match check
is replaced with a weaker, but still sufficient, sizeof check.

There are also hypercall argument structures whose last element is a
variable-length array and so do not have a fixed size; the size check is
disabled as well when validating those structures, while the checking of
element offsets is still retained.
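
As an illustration only: with the weakened macro introduced in the patch
below, the checker generated for (for example) the dst field of
argo_send_addr is roughly equivalent to the hand-expanded function sketched
here, modulo the exact function name produced by the CHECK_* machinery.

    static inline int __maybe_unused
    check_argo_send_addr_dst(struct xen_argo_send_addr *x,
                             struct compat_argo_send_addr *c)
    {
        BUILD_BUG_ON(offsetof(struct xen_argo_send_addr, dst) !=
                     offsetof(struct compat_argo_send_addr, dst));
        return sizeof(x->dst) == sizeof(c->dst);
    }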

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
---
 xen/common/Makefile      |  2 +-
 xen/common/compat/argo.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/compat/argo.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 8c65c6f..88b9b2f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -70,7 +70,7 @@ obj-y += xmalloc_tlsf.o
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
 
-obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
+obj-$(CONFIG_COMPAT) += $(addprefix compat/,argo.o domain.o kernel.o memory.o multicall.o xlat.o)
 
 tmem-y := tmem.o tmem_xen.o tmem_control.o
 tmem-$(CONFIG_COMPAT) += compat/tmem_xen.o
diff --git a/xen/common/compat/argo.c b/xen/common/compat/argo.c
new file mode 100644
index 0000000..68f485d
--- /dev/null
+++ b/xen/common/compat/argo.c
@@ -0,0 +1,61 @@
+/******************************************************************************
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Copyright (c) 2018, BAE Systems
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/types.h>
+#include <xen/lib.h>
+#include <public/argo.h>
+#include <compat/argo.h>
+
+CHECK_argo_addr;
+CHECK_argo_register_ring;
+CHECK_argo_unregister_ring;
+
+/*
+ * Disable strict type checking in this compat validation macro for the
+ * following struct checks because it cannot handle fields within structs that
+ * have types that differ in the compat versus non-compat structs.
+ * Replace it with a field size check which is sufficient here.
+ */
+
+#undef CHECK_FIELD_COMMON_
+#define CHECK_FIELD_COMMON_(k, name, n, f) \
+static inline int __maybe_unused name(k xen_ ## n *x, k compat_ ## n *c) \
+{ \
+    BUILD_BUG_ON(offsetof(k xen_ ## n, f) != \
+                 offsetof(k compat_ ## n, f)); \
+    return sizeof(x->f) == sizeof(c->f); \
+}
+
+CHECK_argo_send_addr;
+CHECK_argo_ring_data_ent;
+CHECK_argo_iov;
+
+/*
+ * Disable sizeof type checking for the following struct checks because
+ * these structs have fields with variable size that the size check
+ * cannot validate.
+ */
+
+#undef CHECK_FIELD_COMMON_
+#define CHECK_FIELD_COMMON_(k, name, n, f) \
+static inline int __maybe_unused name(k xen_ ## n *x, k compat_ ## n *c) \
+{ \
+    BUILD_BUG_ON(offsetof(k xen_ ## n, f) != \
+                 offsetof(k compat_ ## n, f)); \
+    return 1; \
+}
+
+CHECK_argo_ring;
+CHECK_argo_ring_data;
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo
  2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
@ 2019-01-08 15:46   ` Jan Beulich
  0 siblings, 0 replies; 104+ messages in thread
From: Jan Beulich @ 2019-01-08 15:46 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> Defines CONFIG_ARGO when enabled. Default: disabled.
> 
> When the Kconfig option is enabled, the Argo hypercall implementation
> will be included, allowing use of the hypervisor-mediated interdomain
> communication mechanism.
> 
> Argo is implemented for x86 and ARM hardware platforms.
> 
> Availability of the option depends on EXPERT and Argo is currently an
> experimental feature.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
but only for committing together with at least one patch actually
using the symbol.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging
  2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
@ 2019-01-08 15:50   ` Jan Beulich
  2019-01-10  9:28   ` Roger Pau Monné
  1 sibling, 0 replies; 104+ messages in thread
From: Jan Beulich @ 2019-01-08 15:50 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> A convenience for working on development of the argo subsystem:
> setting a #define variable enables additional debug messages.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
with one further remark:

> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -19,6 +19,15 @@
>  #include <xen/errno.h>
>  #include <xen/guest_access.h>
>  
> +/* Change this to #define ARGO_DEBUG here to enable more debug messages */
> +#undef ARGO_DEBUG
> +
> +#ifdef ARGO_DEBUG
> +#define argo_dprintk(format, args...) printk("argo: " format, ## args )
> +#else
> +#define argo_dprintk(format, ... ) ((void)0)

This would be better as an inline function, such that arguments passed in
actually get evaluated. Otherwise you risk overlooking variables used only
for such logging, and in particular the compiler then issuing
unused-variable warnings (breaking the build due to -Werror).
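
A minimal sketch of that suggestion for the disabled case (illustrative only,
not the eventual patch code; the printf-format attribute is an assumption
about how it might be annotated):

    /* No-op when ARGO_DEBUG is off: the call still evaluates (and
     * format-checks) its arguments, so variables used only for logging
     * don't trigger unused-variable warnings under -Werror. */
    static inline void __attribute__((format(printf, 1, 2)))
    argo_dprintk(const char *fmt, ...)
    {
        (void)fmt;
    }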

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field()
  2019-01-07  7:42 ` [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
@ 2019-01-08 22:03   ` Stefano Stabellini
  0 siblings, 0 replies; 104+ messages in thread
From: Stefano Stabellini @ 2019-01-08 22:03 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Ross Philipson, Jason Andryuk, Daniel Smith,
	Rich Persaud, James McKenzie, Julien Grall, Paul Durrant,
	Jan Beulich, xen-devel, Eric Chanudet, Roger Pau Monne

On Sun, 6 Jan 2019, Christopher Clark wrote:
> ARM port of c/s bb544585: "introduce guest_handle_for_field()"
> 
> This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
>  xen/include/asm-arm/guest_access.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 224d2a0..8997a1c 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -63,6 +63,9 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>      _y;                                                     \
>  })
>  
> +#define guest_handle_for_field(hnd, type, fld)          \
> +    ((XEN_GUEST_HANDLE(type)) { &(hnd).p->fld })
> +
>  #define guest_handle_from_ptr(ptr, type)        \
>      ((XEN_GUEST_HANDLE_PARAM(type)) { (type *)ptr })
>  #define const_guest_handle_from_ptr(ptr, type)  \
> -- 
> 2.7.4
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
@ 2019-01-08 22:08   ` Ross Philipson
  2019-01-08 22:23     ` Christopher Clark
  2019-01-08 22:54   ` Jason Andryuk
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 104+ messages in thread
From: Ross Philipson @ 2019-01-08 22:08 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, James McKenzie, Eric Chanudet,
	Roger Pau Monne

On 01/07/2019 02:42 AM, Christopher Clark wrote:
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
> 
> Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
> 
> Introduces a new Xen command line parameter 'argo': bool to enable/disable
> the argo hypercall. Defaults to disabled.
> 
> New headers:
>   public/argo.h: with definitions of addresses and ring structure, including
>   indexes for atomic update for communication between domain and hypervisor.
> 
>   xen/argo.h: to expose the hooks for integration into domain lifecycle:
>     argo_init: per-domain init of argo data structures for domain_create.
>     argo_destroy: teardown for domain_destroy and the error exit
>                   path of domain_create.
>     argo_soft_reset: reset of domain state for domain_soft_reset.
> 
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
> 
> In accordance with recent work on _domain_destroy, argo_destroy is
> idempotent. It will tear down: all rings registered by this domain, all
> rings where this domain is the single sender (ie. specified partner,
> non-wildcard rings), and all pending notifications where this domain is
> awaiting signal about available space in the rings of other domains.
> 
> A count will be maintained of the number of rings that a domain has
> registered in order to limit it below the fixed maximum limit defined here.
> 
> The software license on the public header is the BSD license, standard
> procedure for the public Xen headers. The public header was originally
> posted under a GPL license at: [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
> 
> The following ACK by Lars Kurth is to confirm that only people being
> employees of Citrix contributed to the header files in the series posted at
> [1] and that thus the copyright of the files in question is fully owned by
> Citrix. The ACK also confirms that Citrix is happy for the header files to
> be published under a BSD license in this series (which is based on [1]).
> 
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> Acked-by: Lars Kurth <lars.kurth@citrix.com>

Other than an indentation issue in domain_rings_remove_all, this LGTM.

Reviewed-by: Ross Philipson <ross.philipson@oracle.com>

> ---
> v2 rewrite locking explanation comment
> v2 header copyright line now includes 2019
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 self: rename all_rings_remove_info to domain_rings_remove_all
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: add npending to track number of pending entries per ring
> v2 self: amend comment on locking; drop section comments
> v2 cookie_eq: test low bits first and use likely on high bits
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 self: drop pending_remove_ent, inline at single call site
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v2 #4 Lars: add Acked-by and details to commit message.
> v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> v2 bugfix: xsm use in soft-reset prior to introduction
> v2 feedback #9 Jan: drop 'message' from do_argo_message_op
> v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
> v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
> v1 #5 feedback Paul: init/destroy : use currd
> v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
> v1 #6 feedback Paul: Folded patch 6 into patch 5.
> v1 #6 feedback Jan: drop opt_argo_enabled initializer
> v1 #6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
> v1. #5 feedback Paul: change the license on public header to BSD
> - ack from Lars at Citrix.
> v1. self, Jan: drop unnecessary xen include from sched.h
> v1. self, Jan: drop inclusion of public argo.h in private one
> v1. self, Jan: add include of public argo.h to argo.c
> v1. self, Jan: drop fwd decl of argo_domain in priv header
> v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
> v1. self: removed allocation of event channel since switching to VIRQ
> v1. self: drop types.h include from private argo.h
> v1: reorder public argo include position
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: self: rename pending ent "id" to "domain_id"
> v1: self: add domain_cookie to ent struct
> v1. #15 feedback Jan: make cmd unsigned
> v1. #15 feedback Jan: make i loop variable unsigned
> v1: self: adjust dprintks in init, destroy
> v1: #18 feedback Jan: meld max ring count limit
> v1: self: use type not struct in public defn, affects compat gen header
> v1: feedback #15 Jan: handle upper-halves of hypercall args
> v1: add comment explaining the 'magic' field
> v1: self + Jan feedback: implement soft reset
> v1: feedback #13 Roger: use ASSERT_UNREACHABLE
> 
>  docs/misc/xen-command-line.pandoc |  11 +
>  xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
>  xen/common/domain.c               |  20 ++
>  xen/include/Makefile              |   1 +
>  xen/include/public/argo.h         |  59 +++++
>  xen/include/xen/argo.h            |  23 ++
>  xen/include/xen/sched.h           |   6 +
>  xen/include/xlat.lst              |   2 +
>  8 files changed, 582 insertions(+), 1 deletion(-)
>  create mode 100644 xen/include/public/argo.h
>  create mode 100644 xen/include/xen/argo.h
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a755a67..aea13eb 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>  in combination with cpuidle.  This option is only expected to be useful for
>  developers wishing Xen to fall back to older timing methods on newer hardware.
>  
> +### argo
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> +
> +This allows domains access to the Argo hypercall, which supports registration
> +of memory rings with the hypervisor to receive messages, sending messages to
> +other domains by hypercall and querying the ring status of other domains.
> +
>  ### asid (x86)
>  > `= <boolean>`
>  
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6f782f7..86195d3 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -17,7 +17,177 @@
>   */
>  
>  #include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/argo.h>
> +#include <xen/event.h>
> +#include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/time.h>
> +#include <public/argo.h>
> +
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled;
> +boolean_param("argo", opt_argo_enabled);
> +
> +typedef struct argo_ring_id
> +{
> +    uint32_t port;
> +    domid_t partner_id;
> +    domid_t domain_id;
> +} argo_ring_id;
> +
> +/* Data about a domain's own ring that it has registered */
> +struct argo_ring_info
> +{
> +    /* next node in the hash, protected by L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by L2 */
> +    struct argo_ring_id id;
> +    /* L3 */
> +    spinlock_t lock;
> +    /* length of the ring, protected by L3 */
> +    uint32_t len;
> +    /* number of pages in the ring, protected by L3 */
> +    uint32_t npage;
> +    /* number of pages translated into mfns, protected by L3 */
> +    uint32_t nmfns;
> +    /* cached tx pointer location, protected by L3 */
> +    uint32_t tx_ptr;
> +    /* mapped ring pages protected by L3 */
> +    uint8_t **mfn_mapping;
> +    /* list of mfns of guest ring, protected by L3 */
> +    mfn_t *mfns;
> +    /* list of struct pending_ent for this ring, protected by L3 */
> +    struct hlist_head pending;
> +    /* number of pending entries queued for this ring, protected by L3 */
> +    uint32_t npending;
> +};
> +
> +/* Data about a single-sender ring, held by the sender (partner) domain */
> +struct argo_send_info
> +{
> +    /* next node in the hash, protected by Lsend */
> +    struct hlist_node node;
> +    /* this ring's id, protected by Lsend */
> +    struct argo_ring_id id;
> +};
> +
> +/* A space-available notification that is awaiting sufficient space */
> +struct pending_ent
> +{
> +    /* List node within argo_ring_info's pending list */
> +    struct hlist_node node;
> +    /*
> +     * List node within argo_domain's wildcard_pend_list. Only used if the
> +     * ring is one with a wildcard partner (ie. that any domain may send to)
> +     * to enable cancelling signals on wildcard rings on domain destroy.
> +     */
> +    struct hlist_node wildcard_node;
> +    /*
> +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> +     * ring_info->npending is decremented when ents for wildcard rings are
> +     * cancelled for domain destroy.
> +     * Caution: Must hold the correct locks before accessing ring_info via this.
> +     */
> +    struct argo_ring_info *ring_info;
> +    /* domain to be notified when space is available */
> +    domid_t domain_id;
> +    uint16_t pad;
> +    /* minimum ring space available that this signal is waiting upon */
> +    uint32_t len;
> +};
> +
> +/*
> + * The value of the argo element in a struct domain is
> + * protected by the global lock argo_lock: L1
> + */
> +#define ARGO_HTABLE_SIZE 32
> +struct argo_domain
> +{
> +    /* L2 */
> +    rwlock_t lock;
> +    /*
> +     * Hash table of argo_ring_info about rings this domain has registered.
> +     * Protected by L2.
> +     */
> +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> +    /* Counter of rings registered by this domain. Protected by L2. */
> +    uint32_t ring_count;
> +
> +    /* Lsend */
> +    spinlock_t send_lock;
> +    /*
> +     * Hash table of argo_send_info about rings other domains have registered
> +     * for this domain to send to. Single partner, non-wildcard rings.
> +     * Protected by Lsend.
> +     */
> +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> +
> +    /* Lwildcard */
> +    spinlock_t wildcard_lock;
> +    /*
> +     * List of pending space-available signals for this domain about wildcard
> +     * rings registered by other domains. Protected by Lwildcard.
> +     */
> +    struct hlist_head wildcard_pend_list;
> +};
> +
> +/*
> + * Locking is organized as follows:
> + *
> + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> + *              W(<lock>) means taking a write lock on it.
> + *
> + * L1 : The global lock: argo_lock
> + * Protects the argo elements of all struct domain *d in the system.
> + * It does not protect any of the elements of d->argo, only their
> + * addresses.
> + *
> + * By extension since the destruction of a domain with a non-NULL
> + * d->argo will need to free the d->argo pointer, holding W(L1)
> + * guarantees that no domain pointers that argo is interested in
> + * become invalid whilst this lock is held.
> + */
> +
> +static DEFINE_RWLOCK(argo_lock); /* L1 */
> +
> +/*
> + * L2 : The per-domain ring hash lock: d->argo->lock
> + * Holding a read lock on L2 protects the ring hash table and
> + * the elements in the hash_table d->argo->ring_hash, and
> + * the node and id fields in struct argo_ring_info in the
> + * hash table.
> + * Holding a write lock on L2 protects all of the elements of
> + * struct argo_ring_info.
> + *
> + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> + *
> + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> + * Protects all the fields within the argo_ring_info, aside from the ones that
> + * L2 already protects: node, id, lock.
> + *
> + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> + *
> + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> + * Protects the per-domain send hash table : d->argo->send_hash
> + * and the elements in the hash table, and the node and id fields
> + * in struct argo_send_info in the hash table.
> + *
> + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> + * Do not attempt to acquire a L2 on any domain after taking and while
> + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> + *
> + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> + * Protects the per-domain list of outstanding signals for space availability
> + * on wildcard rings.
> + *
> + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> + * No other locks are acquired after obtaining Lwildcard.
> + */
>  
>  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
>  #undef ARGO_DEBUG
> @@ -28,10 +198,299 @@
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>  
> +static void
> +ring_unmap(struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    if ( !ring_info->mfn_mapping )
> +        return;
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +    {
> +        if ( !ring_info->mfn_mapping[i] )
> +            continue;
> +        if ( ring_info->mfns )
> +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> +                         mfn_x(ring_info->mfns[i]),
> +                         ring_info->mfn_mapping[i]);
> +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> +        ring_info->mfn_mapping[i] = NULL;
> +    }
> +}
> +
> +static void
> +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> +{
> +    struct domain *d = get_domain_by_id(domain_id);
> +    if ( !d )
> +        return;
> +
> +    if ( d->argo )
> +    {
> +        spin_lock(&d->argo->wildcard_lock);
> +        hlist_del(&ent->wildcard_node);
> +        spin_unlock(&d->argo->wildcard_lock);
> +    }
> +    put_domain(d);
> +}
> +
> +static void
> +pending_remove_all(struct argo_ring_info *ring_info)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> +    {
> +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +            wildcard_pending_list_remove(ent->domain_id, ent);
> +        hlist_del(&ent->node);
> +        xfree(ent);
> +    }
> +    ring_info->npending = 0;
> +}
> +
> +static void
> +wildcard_rings_pending_remove(struct domain *d)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> +                              node)
> +    {
> +        hlist_del(&ent->node);
> +        ent->ring_info->npending--;
> +        hlist_del(&ent->wildcard_node);
> +        xfree(ent);
> +    }
> +}
> +
> +static void
> +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    if ( !ring_info->mfns )
> +        return;
> +
> +    if ( !ring_info->mfn_mapping )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    ring_unmap(ring_info);
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> +
> +    xfree(ring_info->mfns);
> +    ring_info->mfns = NULL;
> +    ring_info->npage = 0;
> +    xfree(ring_info->mfn_mapping);
> +    ring_info->mfn_mapping = NULL;
> +    ring_info->nmfns = 0;
> +}
> +
> +static void
> +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    pending_remove_all(ring_info);
> +    hlist_del(&ring_info->node);
> +    ring_remove_mfns(d, ring_info);
> +    xfree(ring_info);
> +}
> +
> +static void
> +domain_rings_remove_all(struct domain *d)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_ring_info *ring_info;
> +
> +        hlist_for_each_entry_safe(ring_info, node, next,
> +                                  &d->argo->ring_hash[i], node)
> +            ring_remove_info(d, ring_info);
> +    }
> +    d->argo->ring_count = 0;
> +}
> +
> +/*
> + * Tear down all rings of other domains where src_d domain is the partner.
> + * (ie. it is the single domain that can send to those rings.)
> + * This will also cancel any pending notifications about those rings.
> + */
> +static void
> +partner_rings_remove(struct domain *src_d)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_send_info *send_info;
> +
> +        hlist_for_each_entry_safe(send_info, node, next,
> +                                  &src_d->argo->send_hash[i], node)
> +        {
> +            struct argo_ring_info *ring_info;
> +            struct domain *dst_d;
> +
> +            dst_d = get_domain_by_id(send_info->id.domain_id);
> +            if ( dst_d )
> +            {
> +                ring_info = ring_find_info(dst_d, &send_info->id);
> +                if ( ring_info )
> +                {
> +                    ring_remove_info(dst_d, ring_info);
> +                    dst_d->argo->ring_count--;
> +                }
> +
> +                put_domain(dst_d);
> +            }
> +
> +            hlist_del(&send_info->node);
> +            xfree(send_info);
> +        }
> +    }
> +}
> +
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *currd = current->domain;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> +
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        return rc;
> +    }
> +
> +    domain_lock(currd);
> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    domain_unlock(currd);
> +
> +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> +
> +    return rc;
> +}
> +
> +static void
> +argo_domain_init(struct argo_domain *argo)
> +{
> +    unsigned int i;
> +
> +    rwlock_init(&argo->lock);
> +    spin_lock_init(&argo->send_lock);
> +    spin_lock_init(&argo->wildcard_lock);
> +    argo->ring_count = 0;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> +    }
> +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> +}
> +
> +int
> +argo_init(struct domain *d)
> +{
> +    struct argo_domain *argo;
> +
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("init: domid: %d\n", d->domain_id);
> +
> +    argo = xmalloc(struct argo_domain);
> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    write_lock(&argo_lock);
> +
> +    argo_domain_init(argo);
> +
> +    d->argo = argo;
> +
> +    write_unlock(&argo_lock);
> +
> +    return 0;
> +}
> +
> +void
> +argo_destroy(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +        xfree(d->argo);
> +        d->argo = NULL;
> +    }
> +    write_unlock(&argo_lock);
> +}
> +
> +void
> +argo_soft_reset(struct domain *d)
> +{
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +
> +        if ( !opt_argo_enabled )
> +        {
> +            xfree(d->argo);
> +            d->argo = NULL;
> +        }
> +        else
> +            argo_domain_init(d->argo);
> +    }
> +
> +    write_unlock(&argo_lock);
>  }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index c623dae..9596840 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -32,6 +32,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenoprof.h>
>  #include <xen/irq.h>
> +#include <xen/argo.h>
>  #include <asm/debugger.h>
>  #include <asm/p2m.h>
>  #include <asm/processor.h>
> @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
>  
>      xfree(d->pbuf);
>  
> +#ifdef CONFIG_ARGO
> +    argo_destroy(d);
> +#endif
> +
>      rangeset_domain_destroy(d);
>  
>      free_cpumask_var(d->dirty_cpumask);
> @@ -376,6 +381,9 @@ struct domain *domain_create(domid_t domid,
>      spin_lock_init(&d->hypercall_deadlock_mutex);
>      INIT_PAGE_LIST_HEAD(&d->page_list);
>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
> +#ifdef CONFIG_ARGO
> +    rwlock_init(&d->argo_lock);
> +#endif
>  
>      spin_lock_init(&d->node_affinity_lock);
>      d->node_affinity = NODE_MASK_ALL;
> @@ -445,6 +453,11 @@ struct domain *domain_create(domid_t domid,
>              goto fail;
>          init_status |= INIT_gnttab;
>  
> +#ifdef CONFIG_ARGO
> +        if ( (err = argo_init(d)) != 0 )
> +            goto fail;
> +#endif
> +
>          err = -ENOMEM;
>  
>          d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
> @@ -717,6 +730,9 @@ int domain_kill(struct domain *d)
>          if ( d->is_dying != DOMDYING_alive )
>              return domain_kill(d);
>          d->is_dying = DOMDYING_dying;
> +#ifdef CONFIG_ARGO
> +        argo_destroy(d);
> +#endif
>          evtchn_destroy(d);
>          gnttab_release_mappings(d);
>          tmem_destroy(d->tmem_client);
> @@ -1175,6 +1191,10 @@ int domain_soft_reset(struct domain *d)
>  
>      grant_table_warn_active_grants(d);
>  
> +#ifdef CONFIG_ARGO
> +    argo_soft_reset(d);
> +#endif
> +
>      for_each_vcpu ( d, v )
>      {
>          set_xen_guest_handle(runstate_guest(v), NULL);
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index f7895e4..3d14532 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -5,6 +5,7 @@ ifneq ($(CONFIG_COMPAT),)
>  compat-arch-$(CONFIG_X86) := x86_32
>  
>  headers-y := \
> +    compat/argo.h \
>      compat/callback.h \
>      compat/elfnote.h \
>      compat/event_channel.h \
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> new file mode 100644
> index 0000000..4818684
> --- /dev/null
> +++ b/xen/include/public/argo.h
> @@ -0,0 +1,59 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018-2019, BAE Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __XEN_PUBLIC_ARGO_H__
> +#define __XEN_PUBLIC_ARGO_H__
> +
> +#include "xen.h"
> +
> +typedef struct xen_argo_addr
> +{
> +    uint32_t port;
> +    domid_t domain_id;
> +    uint16_t pad;
> +} xen_argo_addr_t;
> +
> +typedef struct xen_argo_ring
> +{
> +    /* Guests should use atomic operations to access rx_ptr */
> +    uint32_t rx_ptr;
> +    /* Guests should use atomic operations to access tx_ptr */
> +    uint32_t tx_ptr;
> +    /*
> +     * Header space reserved for later use. Align the start of the ring to a
> +     * multiple of the message slot size.
> +     */
> +    uint8_t reserved[56];
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t ring[];
> +#elif defined(__GNUC__)
> +    uint8_t ring[0];
> +#endif
> +} xen_argo_ring_t;
> +
> +#endif
> diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
> new file mode 100644
> index 0000000..29d32a9
> --- /dev/null
> +++ b/xen/include/xen/argo.h
> @@ -0,0 +1,23 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __XEN_ARGO_H__
> +#define __XEN_ARGO_H__
> +
> +int argo_init(struct domain *d);
> +void argo_destroy(struct domain *d);
> +void argo_soft_reset(struct domain *d);
> +
> +#endif
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 4956a77..20418e7 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -490,6 +490,12 @@ struct domain
>          unsigned int guest_request_enabled       : 1;
>          unsigned int guest_request_sync          : 1;
>      } monitor;
> +
> +#ifdef CONFIG_ARGO
> +    /* Argo interdomain communication support */
> +    rwlock_t argo_lock;
> +    struct argo_domain *argo;
> +#endif
>  };
>  
>  /* Protect updates/reads (resp.) of domain_list and domain_hash. */
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 5273320..9f616e4 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -148,3 +148,5 @@
>  ?	flask_setenforce		xsm/flask_op.h
>  !	flask_sid_context		xsm/flask_op.h
>  ?	flask_transition		xsm/flask_op.h
> +?	argo_addr			argo.h
> +?	argo_ring			argo.h
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-08 22:08   ` Ross Philipson
@ 2019-01-08 22:23     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-08 22:23 UTC (permalink / raw)
  To: Ross Philipson
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Tue, Jan 8, 2019 at 2:09 PM Ross Philipson <ross.philipson@gmail.com> wrote:
>
> On 01/07/2019 02:42 AM, Christopher Clark wrote:
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.
> >
> > Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
> >
> > Introduces a new Xen command line parameter 'argo': bool to enable/disable
> > the argo hypercall. Defaults to disabled.
> >
> > New headers:
> >   public/argo.h: with definitions of addresses and ring structure, including
> >   indexes for atomic update for communication between domain and hypervisor.
> >
> >   xen/argo.h: to expose the hooks for integration into domain lifecycle:
> >     argo_init: per-domain init of argo data structures for domain_create.
> >     argo_destroy: teardown for domain_destroy and the error exit
> >                   path of domain_create.
> >     argo_soft_reset: reset of domain state for domain_soft_reset.
> >
> > Adds two new fields to struct domain:
> >     rwlock_t argo_lock;
> >     struct argo_domain *argo;
> >
> > In accordance with recent work on _domain_destroy, argo_destroy is
> > idempotent. It will tear down: all rings registered by this domain, all
> > rings where this domain is the single sender (ie. specified partner,
> > non-wildcard rings), and all pending notifications where this domain is
> > awaiting signal about available space in the rings of other domains.
> >
> > A count will be maintained of the number of rings that a domain has
> > registered in order to limit it below the fixed maximum limit defined here.
> >
> > The software license on the public header is the BSD license, standard
> > procedure for the public Xen headers. The public header was originally
> > posted under a GPL license at: [1]:
> > https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
> >
> > The following ACK by Lars Kurth is to confirm that only people being
> > employees of Citrix contributed to the header files in the series posted at
> > [1] and that thus the copyright of the files in question is fully owned by
> > Citrix. The ACK also confirms that Citrix is happy for the header files to
> > be published under a BSD license in this series (which is based on [1]).
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > Acked-by: Lars Kurth <lars.kurth@citrix.com>
>
> Other than an indentation issue in domain_rings_remove_all, this LGTM.
>
> Reviewed-by: Ross Philipson <ross.philipson@oracle.com>

Thanks for the review, Ross.

I don't think there is an indentation issue there -- it's just the
hlist_for_each_entry_safe macro, whose loop body is the single call to
ring_remove_info, so it should be ok (excerpt below for reference).
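
For reference, the construct in question, trimmed from the patch above -- the
continuation lines indent to the macro's argument list, and the loop body is
the single statement:

    hlist_for_each_entry_safe(ring_info, node, next,
                              &d->argo->ring_hash[i], node)
        ring_remove_info(d, ring_info);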

thanks,

Christopher


>
> > ---
> > v2 rewrite locking explanation comment
> > v2 header copyright line now includes 2019
> > v2 self: use ring_info backpointer in pending_ent to maintain npending
> > v2 self: rename all_rings_remove_info to domain_rings_remove_all
> > v2 feedback Jan: drop cookie, implement teardown
> > v2 self: add npending to track number of pending entries per ring
> > v2 self: amend comment on locking; drop section comments
> > v2 cookie_eq: test low bits first and use likely on high bits
> > v2 self: OVERHAUL
> > v2 self: s/argo_pending_ent/pending_ent/g
> > v2 self: drop pending_remove_ent, inline at single call site
> > v1 feedback Roger, Jan: drop argo prefix on static functions
> > v2 #4 Lars: add Acked-by and details to commit message.
> > v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> > v2 bugfix: xsm use in soft-reset prior to introduction
> > v2 feedback #9 Jan: drop 'message' from do_argo_message_op
> > v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
> > v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
> > v1 #5 feedback Paul: init/destroy : use currd
> > v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
> > v1 #6 feedback Paul: Folded patch 6 into patch 5.
> > v1 #6 feedback Jan: drop opt_argo_enabled initializer
> > v1 #6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
> > v1. #5 feedback Paul: change the license on public header to BSD
> > - ack from Lars at Citrix.
> > v1. self, Jan: drop unnecessary xen include from sched.h
> > v1. self, Jan: drop inclusion of public argo.h in private one
> > v1. self, Jan: add include of public argo.h to argo.c
> > v1. self, Jan: drop fwd decl of argo_domain in priv header
> > v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
> > v1. self: removed allocation of event channel since switching to VIRQ
> > v1. self: drop types.h include from private argo.h
> > v1: reorder public argo include position
> > v1: #13 feedback Jan: public namespace: prefix with xen
> > v1: self: rename pending ent "id" to "domain_id"
> > v1: self: add domain_cookie to ent struct
> > v1. #15 feedback Jan: make cmd unsigned
> > v1. #15 feedback Jan: make i loop variable unsigned
> > v1: self: adjust dprintks in init, destroy
> > v1: #18 feedback Jan: meld max ring count limit
> > v1: self: use type not struct in public defn, affects compat gen header
> > v1: feedback #15 Jan: handle upper-halves of hypercall args
> > v1: add comment explaining the 'magic' field
> > v1: self + Jan feedback: implement soft reset
> > v1: feedback #13 Roger: use ASSERT_UNREACHABLE
> >
> >  docs/misc/xen-command-line.pandoc |  11 +
> >  xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
> >  xen/common/domain.c               |  20 ++
> >  xen/include/Makefile              |   1 +
> >  xen/include/public/argo.h         |  59 +++++
> >  xen/include/xen/argo.h            |  23 ++
> >  xen/include/xen/sched.h           |   6 +
> >  xen/include/xlat.lst              |   2 +
> >  8 files changed, 582 insertions(+), 1 deletion(-)
> >  create mode 100644 xen/include/public/argo.h
> >  create mode 100644 xen/include/xen/argo.h
> >
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index a755a67..aea13eb 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
> >  in combination with cpuidle.  This option is only expected to be useful for
> >  developers wishing Xen to fall back to older timing methods on newer hardware.
> >
> > +### argo
> > +> `= <boolean>`
> > +
> > +> Default: `false`
> > +
> > +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> > +
> > +This allows domains access to the Argo hypercall, which supports registration
> > +of memory rings with the hypervisor to receive messages, sending messages to
> > +other domains by hypercall and querying the ring status of other domains.
> > +
> >  ### asid (x86)
> >  > `= <boolean>`
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 6f782f7..86195d3 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -17,7 +17,177 @@
> >   */
> >
> >  #include <xen/errno.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/argo.h>
> > +#include <xen/event.h>
> > +#include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/time.h>
> > +#include <public/argo.h>
> > +
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +
> > +/* Xen command line option to enable argo */
> > +static bool __read_mostly opt_argo_enabled;
> > +boolean_param("argo", opt_argo_enabled);
> > +
> > +typedef struct argo_ring_id
> > +{
> > +    uint32_t port;
> > +    domid_t partner_id;
> > +    domid_t domain_id;
> > +} argo_ring_id;
> > +
> > +/* Data about a domain's own ring that it has registered */
> > +struct argo_ring_info
> > +{
> > +    /* next node in the hash, protected by L2 */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by L2 */
> > +    struct argo_ring_id id;
> > +    /* L3 */
> > +    spinlock_t lock;
> > +    /* length of the ring, protected by L3 */
> > +    uint32_t len;
> > +    /* number of pages in the ring, protected by L3 */
> > +    uint32_t npage;
> > +    /* number of pages translated into mfns, protected by L3 */
> > +    uint32_t nmfns;
> > +    /* cached tx pointer location, protected by L3 */
> > +    uint32_t tx_ptr;
> > +    /* mapped ring pages protected by L3 */
> > +    uint8_t **mfn_mapping;
> > +    /* list of mfns of guest ring, protected by L3 */
> > +    mfn_t *mfns;
> > +    /* list of struct pending_ent for this ring, protected by L3 */
> > +    struct hlist_head pending;
> > +    /* number of pending entries queued for this ring, protected by L3 */
> > +    uint32_t npending;
> > +};
> > +
> > +/* Data about a single-sender ring, held by the sender (partner) domain */
> > +struct argo_send_info
> > +{
> > +    /* next node in the hash, protected by Lsend */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by Lsend */
> > +    struct argo_ring_id id;
> > +};
> > +
> > +/* A space-available notification that is awaiting sufficient space */
> > +struct pending_ent
> > +{
> > +    /* List node within argo_ring_info's pending list */
> > +    struct hlist_node node;
> > +    /*
> > +     * List node within argo_domain's wildcard_pend_list. Only used if the
> > +     * ring is one with a wildcard partner (ie. that any domain may send to)
> > +     * to enable cancelling signals on wildcard rings on domain destroy.
> > +     */
> > +    struct hlist_node wildcard_node;
> > +    /*
> > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > +     * ring_info->npending is decremented when ents for wildcard rings are
> > +     * cancelled for domain destroy.
> > +     * Caution: Must hold the correct locks before accessing ring_info via this.
> > +     */
> > +    struct argo_ring_info *ring_info;
> > +    /* domain to be notified when space is available */
> > +    domid_t domain_id;
> > +    uint16_t pad;
> > +    /* minimum ring space available that this signal is waiting upon */
> > +    uint32_t len;
> > +};
> > +
> > +/*
> > + * The value of the argo element in a struct domain is
> > + * protected by the global lock argo_lock: L1
> > + */
> > +#define ARGO_HTABLE_SIZE 32
> > +struct argo_domain
> > +{
> > +    /* L2 */
> > +    rwlock_t lock;
> > +    /*
> > +     * Hash table of argo_ring_info about rings this domain has registered.
> > +     * Protected by L2.
> > +     */
> > +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> > +    /* Counter of rings registered by this domain. Protected by L2. */
> > +    uint32_t ring_count;
> > +
> > +    /* Lsend */
> > +    spinlock_t send_lock;
> > +    /*
> > +     * Hash table of argo_send_info about rings other domains have registered
> > +     * for this domain to send to. Single partner, non-wildcard rings.
> > +     * Protected by Lsend.
> > +     */
> > +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> > +
> > +    /* Lwildcard */
> > +    spinlock_t wildcard_lock;
> > +    /*
> > +     * List of pending space-available signals for this domain about wildcard
> > +     * rings registered by other domains. Protected by Lwildcard.
> > +     */
> > +    struct hlist_head wildcard_pend_list;
> > +};
> > +
> > +/*
> > + * Locking is organized as follows:
> > + *
> > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > + *              W(<lock>) means taking a write lock on it.
> > + *
> > + * L1 : The global lock: argo_lock
> > + * Protects the argo elements of all struct domain *d in the system.
> > + * It does not protect any of the elements of d->argo, only their
> > + * addresses.
> > + *
> > + * By extension since the destruction of a domain with a non-NULL
> > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > > + * guarantees that no domain pointers that argo is interested in
> > + * become invalid whilst this lock is held.
> > + */
> > +
> > +static DEFINE_RWLOCK(argo_lock); /* L1 */
> > +
> > +/*
> > + * L2 : The per-domain ring hash lock: d->argo->lock
> > + * Holding a read lock on L2 protects the ring hash table and
> > + * the elements in the hash_table d->argo->ring_hash, and
> > + * the node and id fields in struct argo_ring_info in the
> > + * hash table.
> > + * Holding a write lock on L2 protects all of the elements of
> > + * struct argo_ring_info.
> > + *
> > + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> > + *
> > + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> > + * Protects all the fields within the argo_ring_info, aside from the ones that
> > + * L2 already protects: node, id, lock.
> > + *
> > > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> > + *
> > + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> > + * Protects the per-domain send hash table : d->argo->send_hash
> > + * and the elements in the hash table, and the node and id fields
> > + * in struct argo_send_info in the hash table.
> > + *
> > + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> > + * Do not attempt to acquire a L2 on any domain after taking and while
> > + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> > + *
> > + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> > + * Protects the per-domain list of outstanding signals for space availability
> > + * on wildcard rings.
> > + *
> > + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> > + * No other locks are acquired after obtaining Lwildcard.
> > + */
> >
> >  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
> >  #undef ARGO_DEBUG
> > @@ -28,10 +198,299 @@
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +static void
> > +ring_unmap(struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +        return;
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +    {
> > +        if ( !ring_info->mfn_mapping[i] )
> > +            continue;
> > +        if ( ring_info->mfns )
> > +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> > +                         mfn_x(ring_info->mfns[i]),
> > +                         ring_info->mfn_mapping[i]);
> > +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> > +        ring_info->mfn_mapping[i] = NULL;
> > +    }
> > +}
> > +
> > +static void
> > +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> > +{
> > +    struct domain *d = get_domain_by_id(domain_id);
> > +    if ( !d )
> > +        return;
> > +
> > +    if ( d->argo )
> > +    {
> > +        spin_lock(&d->argo->wildcard_lock);
> > +        hlist_del(&ent->wildcard_node);
> > +        spin_unlock(&d->argo->wildcard_lock);
> > +    }
> > +    put_domain(d);
> > +}
> > +
> > +static void
> > +pending_remove_all(struct argo_ring_info *ring_info)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> > +    {
> > +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> > +            wildcard_pending_list_remove(ent->domain_id, ent);
> > +        hlist_del(&ent->node);
> > +        xfree(ent);
> > +    }
> > +    ring_info->npending = 0;
> > +}
> > +
> > +static void
> > +wildcard_rings_pending_remove(struct domain *d)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> > +                              node)
> > +    {
> > +        hlist_del(&ent->node);
> > +        ent->ring_info->npending--;
> > +        hlist_del(&ent->wildcard_node);
> > +        xfree(ent);
> > +    }
> > +}
> > +
> > +static void
> > +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
> > +
> > +    if ( !ring_info->mfns )
> > +        return;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    ring_unmap(ring_info);
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> > +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> > +
> > +    xfree(ring_info->mfns);
> > +    ring_info->mfns = NULL;
> > +    ring_info->npage = 0;
> > +    xfree(ring_info->mfn_mapping);
> > +    ring_info->mfn_mapping = NULL;
> > +    ring_info->nmfns = 0;
> > +}
> > +
> > +static void
> > +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
> > +
> > +    pending_remove_all(ring_info);
> > +    hlist_del(&ring_info->node);
> > +    ring_remove_mfns(d, ring_info);
> > +    xfree(ring_info);
> > +}
> > +
> > +static void
> > +domain_rings_remove_all(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        struct hlist_node *node, *next;
> > +        struct argo_ring_info *ring_info;
> > +
> > +        hlist_for_each_entry_safe(ring_info, node, next,
> > +                                  &d->argo->ring_hash[i], node)
> > +            ring_remove_info(d, ring_info);
> > +    }
> > +    d->argo->ring_count = 0;
> > +}
> > +
> > +/*
> > + * Tear down all rings of other domains where src_d domain is the partner.
> > + * (ie. it is the single domain that can send to those rings.)
> > + * This will also cancel any pending notifications about those rings.
> > + */
> > +static void
> > +partner_rings_remove(struct domain *src_d)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        struct hlist_node *node, *next;
> > +        struct argo_send_info *send_info;
> > +
> > +        hlist_for_each_entry_safe(send_info, node, next,
> > +                                  &src_d->argo->send_hash[i], node)
> > +        {
> > +            struct argo_ring_info *ring_info;
> > +            struct domain *dst_d;
> > +
> > +            dst_d = get_domain_by_id(send_info->id.domain_id);
> > +            if ( dst_d )
> > +            {
> > +                ring_info = ring_find_info(dst_d, &send_info->id);
> > +                if ( ring_info )
> > +                {
> > +                    ring_remove_info(dst_d, ring_info);
> > +                    dst_d->argo->ring_count--;
> > +                }
> > +
> > +                put_domain(dst_d);
> > +            }
> > +
> > +            hlist_del(&send_info->node);
> > +            xfree(send_info);
> > +        }
> > +    }
> > +}
> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
> > +        return rc;
> > +    }
> > +
> > +    domain_lock(currd);
> > +
> > +    switch (cmd)
> > +    {
> > +    default:
> > +        rc = -EOPNOTSUPP;
> > +        break;
> > +    }
> > +
> > +    domain_unlock(currd);
> > +
> > +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> > +
> > +    return rc;
> > +}
> > +
> > +static void
> > +argo_domain_init(struct argo_domain *argo)
> > +{
> > +    unsigned int i;
> > +
> > +    rwlock_init(&argo->lock);
> > +    spin_lock_init(&argo->send_lock);
> > +    spin_lock_init(&argo->wildcard_lock);
> > +    argo->ring_count = 0;
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> > +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> > +    }
> > +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> > +}
> > +
> > +int
> > +argo_init(struct domain *d)
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
> > +
> > +    d->argo = argo;
> > +
> > +    write_unlock(&argo_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +argo_destroy(struct domain *d)
> > +{
> > +    BUG_ON(!d->is_dying);
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +        xfree(d->argo);
> > +        d->argo = NULL;
> > +    }
> > +    write_unlock(&argo_lock);
> > +}
> > +
> > +void
> > +argo_soft_reset(struct domain *d)
> > +{
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +
> > +        if ( !opt_argo_enabled )
> > +        {
> > +            xfree(d->argo);
> > +            d->argo = NULL;
> > +        }
> > +        else
> > +            argo_domain_init(d->argo);
> > +    }
> > +
> > +    write_unlock(&argo_lock);
> >  }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index c623dae..9596840 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -32,6 +32,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenoprof.h>
> >  #include <xen/irq.h>
> > +#include <xen/argo.h>
> >  #include <asm/debugger.h>
> >  #include <asm/p2m.h>
> >  #include <asm/processor.h>
> > @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
> >
> >      xfree(d->pbuf);
> >
> > +#ifdef CONFIG_ARGO
> > +    argo_destroy(d);
> > +#endif
> > +
> >      rangeset_domain_destroy(d);
> >
> >      free_cpumask_var(d->dirty_cpumask);
> > @@ -376,6 +381,9 @@ struct domain *domain_create(domid_t domid,
> >      spin_lock_init(&d->hypercall_deadlock_mutex);
> >      INIT_PAGE_LIST_HEAD(&d->page_list);
> >      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
> > +#ifdef CONFIG_ARGO
> > +    rwlock_init(&d->argo_lock);
> > +#endif
> >
> >      spin_lock_init(&d->node_affinity_lock);
> >      d->node_affinity = NODE_MASK_ALL;
> > @@ -445,6 +453,11 @@ struct domain *domain_create(domid_t domid,
> >              goto fail;
> >          init_status |= INIT_gnttab;
> >
> > +#ifdef CONFIG_ARGO
> > +        if ( (err = argo_init(d)) != 0 )
> > +            goto fail;
> > +#endif
> > +
> >          err = -ENOMEM;
> >
> >          d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
> > @@ -717,6 +730,9 @@ int domain_kill(struct domain *d)
> >          if ( d->is_dying != DOMDYING_alive )
> >              return domain_kill(d);
> >          d->is_dying = DOMDYING_dying;
> > +#ifdef CONFIG_ARGO
> > +        argo_destroy(d);
> > +#endif
> >          evtchn_destroy(d);
> >          gnttab_release_mappings(d);
> >          tmem_destroy(d->tmem_client);
> > @@ -1175,6 +1191,10 @@ int domain_soft_reset(struct domain *d)
> >
> >      grant_table_warn_active_grants(d);
> >
> > +#ifdef CONFIG_ARGO
> > +    argo_soft_reset(d);
> > +#endif
> > +
> >      for_each_vcpu ( d, v )
> >      {
> >          set_xen_guest_handle(runstate_guest(v), NULL);
> > diff --git a/xen/include/Makefile b/xen/include/Makefile
> > index f7895e4..3d14532 100644
> > --- a/xen/include/Makefile
> > +++ b/xen/include/Makefile
> > @@ -5,6 +5,7 @@ ifneq ($(CONFIG_COMPAT),)
> >  compat-arch-$(CONFIG_X86) := x86_32
> >
> >  headers-y := \
> > +    compat/argo.h \
> >      compat/callback.h \
> >      compat/elfnote.h \
> >      compat/event_channel.h \
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > new file mode 100644
> > index 0000000..4818684
> > --- /dev/null
> > +++ b/xen/include/public/argo.h
> > @@ -0,0 +1,59 @@
> > +/******************************************************************************
> > + * Argo : Hypervisor-Mediated data eXchange
> > + *
> > + * Derived from v4v, the version 2 of v2v.
> > + *
> > + * Copyright (c) 2010, Citrix Systems
> > + * Copyright (c) 2018-2019, BAE Systems
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> > + * sell copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + *
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_ARGO_H__
> > +#define __XEN_PUBLIC_ARGO_H__
> > +
> > +#include "xen.h"
> > +
> > +typedef struct xen_argo_addr
> > +{
> > +    uint32_t port;
> > +    domid_t domain_id;
> > +    uint16_t pad;
> > +} xen_argo_addr_t;
> > +
> > +typedef struct xen_argo_ring
> > +{
> > +    /* Guests should use atomic operations to access rx_ptr */
> > +    uint32_t rx_ptr;
> > +    /* Guests should use atomic operations to access tx_ptr */
> > +    uint32_t tx_ptr;
> > +    /*
> > +     * Header space reserved for later use. Align the start of the ring to a
> > +     * multiple of the message slot size.
> > +     */
> > +    uint8_t reserved[56];
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > +    uint8_t ring[];
> > +#elif defined(__GNUC__)
> > +    uint8_t ring[0];
> > +#endif
> > +} xen_argo_ring_t;
> > +
> > +#endif
> > diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
> > new file mode 100644
> > index 0000000..29d32a9
> > --- /dev/null
> > +++ b/xen/include/xen/argo.h
> > @@ -0,0 +1,23 @@
> > +/******************************************************************************
> > + * Argo : Hypervisor-Mediated data eXchange
> > + *
> > + * Copyright (c) 2018, BAE Systems
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> > + */
> > +
> > +#ifndef __XEN_ARGO_H__
> > +#define __XEN_ARGO_H__
> > +
> > +int argo_init(struct domain *d);
> > +void argo_destroy(struct domain *d);
> > +void argo_soft_reset(struct domain *d);
> > +
> > +#endif
> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index 4956a77..20418e7 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -490,6 +490,12 @@ struct domain
> >          unsigned int guest_request_enabled       : 1;
> >          unsigned int guest_request_sync          : 1;
> >      } monitor;
> > +
> > +#ifdef CONFIG_ARGO
> > +    /* Argo interdomain communication support */
> > +    rwlock_t argo_lock;
> > +    struct argo_domain *argo;
> > +#endif
> >  };
> >
> >  /* Protect updates/reads (resp.) of domain_list and domain_hash. */
> > diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> > index 5273320..9f616e4 100644
> > --- a/xen/include/xlat.lst
> > +++ b/xen/include/xlat.lst
> > @@ -148,3 +148,5 @@
> >  ?    flask_setenforce                xsm/flask_op.h
> >  !    flask_sid_context               xsm/flask_op.h
> >  ?    flask_transition                xsm/flask_op.h
> > +?    argo_addr                       argo.h
> > +?    argo_ring                       argo.h
> >
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
  2019-01-08 22:08   ` Ross Philipson
@ 2019-01-08 22:54   ` Jason Andryuk
  2019-01-09  6:48     ` Christopher Clark
  2019-01-09  9:35     ` Jan Beulich
  2019-01-10 10:19   ` Roger Pau Monné
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 104+ messages in thread
From: Jason Andryuk @ 2019-01-08 22:54 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
>
> Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
>
> Introduces a new Xen command line parameter 'argo': bool to enable/disable
> the argo hypercall. Defaults to disabled.
>
> New headers:
>   public/argo.h: with definitions of addresses and ring structure, including
>   indexes for atomic update for communication between domain and hypervisor.
>
>   xen/argo.h: to expose the hooks for integration into domain lifecycle:
>     argo_init: per-domain init of argo data structures for domain_create.
>     argo_destroy: teardown for domain_destroy and the error exit
>                   path of domain_create.
>     argo_soft_reset: reset of domain state for domain_soft_reset.
>
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
>
> In accordance with recent work on _domain_destroy, argo_destroy is
> idempotent. It will tear down: all rings registered by this domain, all
> rings where this domain is the single sender (ie. specified partner,
> non-wildcard rings), and all pending notifications where this domain is
> awaiting signal about available space in the rings of other domains.
>
> A count will be maintained of the number of rings that a domain has
> registered in order to limit it below the fixed maximum limit defined here.
>
> The software license on the public header is the BSD license, standard
> procedure for the public Xen headers. The public header was originally
> posted under a GPL license at: [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
>
> The following ACK by Lars Kurth is to confirm that only people being
> employees of Citrix contributed to the header files in the series posted at
> [1] and that thus the copyright of the files in question is fully owned by
> Citrix. The ACK also confirms that Citrix is happy for the header files to
> be published under a BSD license in this series (which is based on [1]).
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> Acked-by: Lars Kurth <lars.kurth@citrix.com>
> ---
> v2 rewrite locking explanation comment
> v2 header copyright line now includes 2019
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 self: rename all_rings_remove_info to domain_rings_remove_all
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: add npending to track number of pending entries per ring
> v2 self: amend comment on locking; drop section comments
> v2 cookie_eq: test low bits first and use likely on high bits
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 self: drop pending_remove_ent, inline at single call site
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v2 #4 Lars: add Acked-by and details to commit message.
> v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> v2 bugfix: xsm use in soft-reset prior to introduction
> v2 feedback #9 Jan: drop 'message' from do_argo_message_op
> v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
> v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
> v1 #5 feedback Paul: init/destroy : use currd
> v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
> v1 #6 feedback Paul: Folded patch 6 into patch 5.
> v1 #6 feedback Jan: drop opt_argo_enabled initializer
> v1 $6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
> v1. #5 feedback Paul: change the license on public header to BSD
> - ack from Lars at Citrix.
> v1. self, Jan: drop unnecessary xen include from sched.h
> v1. self, Jan: drop inclusion of public argo.h in private one
> v1. self, Jan: add include of public argo.h to argo.c
> v1. self, Jan: drop fwd decl of argo_domain in priv header
> v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
> v1. self: removed allocation of event channel since switching to VIRQ
> v1. self: drop types.h include from private argo.h
> v1: reorder public argo include position
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: self: rename pending ent "id" to "domain_id"
> v1: self: add domain_cookie to ent struct
> v1. #15 feedback Jan: make cmd unsigned
> v1. #15 feedback Jan: make i loop variable unsigned
> v1: self: adjust dprintks in init, destroy
> v1: #18 feedback Jan: meld max ring count limit
> v1: self: use type not struct in public defn, affects compat gen header
> v1: feedback #15 Jan: handle upper-halves of hypercall args
> v1: add comment explaining the 'magic' field
> v1: self + Jan feedback: implement soft reset
> v1: feedback #13 Roger: use ASSERT_UNREACHABLE
>
>  docs/misc/xen-command-line.pandoc |  11 +
>  xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
>  xen/common/domain.c               |  20 ++
>  xen/include/Makefile              |   1 +
>  xen/include/public/argo.h         |  59 +++++
>  xen/include/xen/argo.h            |  23 ++
>  xen/include/xen/sched.h           |   6 +
>  xen/include/xlat.lst              |   2 +
>  8 files changed, 582 insertions(+), 1 deletion(-)
>  create mode 100644 xen/include/public/argo.h
>  create mode 100644 xen/include/xen/argo.h
>
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a755a67..aea13eb 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>  in combination with cpuidle.  This option is only expected to be useful for
>  developers wishing Xen to fall back to older timing methods on newer hardware.
>
> +### argo
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> +
> +This allows domains access to the Argo hypercall, which supports registration
> +of memory rings with the hypervisor to receive messages, sending messages to
> +other domains by hypercall and querying the ring status of other domains.
> +

Do we want to say it's only available when Xen is compiled with CONFIG_ARGO?

>  ### asid (x86)
>  > `= <boolean>`
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6f782f7..86195d3 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -17,7 +17,177 @@
>   */
>
>  #include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/argo.h>
> +#include <xen/event.h>
> +#include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/time.h>
> +#include <public/argo.h>
> +
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled;
> +boolean_param("argo", opt_argo_enabled);
> +
> +typedef struct argo_ring_id
> +{
> +    uint32_t port;
> +    domid_t partner_id;
> +    domid_t domain_id;
> +} argo_ring_id;
> +
> +/* Data about a domain's own ring that it has registered */
> +struct argo_ring_info
> +{
> +    /* next node in the hash, protected by L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by L2 */
> +    struct argo_ring_id id;
> +    /* L3 */
> +    spinlock_t lock;
> +    /* length of the ring, protected by L3 */
> +    uint32_t len;
> +    /* number of pages in the ring, protected by L3 */
> +    uint32_t npage;
> +    /* number of pages translated into mfns, protected by L3 */
> +    uint32_t nmfns;
> +    /* cached tx pointer location, protected by L3 */
> +    uint32_t tx_ptr;
> +    /* mapped ring pages protected by L3 */
> +    uint8_t **mfn_mapping;
> +    /* list of mfns of guest ring, protected by L3 */
> +    mfn_t *mfns;
> +    /* list of struct pending_ent for this ring, protected by L3 */
> +    struct hlist_head pending;
> +    /* number of pending entries queued for this ring, protected by L3 */
> +    uint32_t npending;
> +};
> +
> +/* Data about a single-sender ring, held by the sender (partner) domain */
> +struct argo_send_info
> +{
> +    /* next node in the hash, protected by Lsend */
> +    struct hlist_node node;
> +    /* this ring's id, protected by Lsend */
> +    struct argo_ring_id id;
> +};
> +
> +/* A space-available notification that is awaiting sufficient space */
> +struct pending_ent
> +{
> +    /* List node within argo_ring_info's pending list */
> +    struct hlist_node node;
> +    /*
> +     * List node within argo_domain's wildcard_pend_list. Only used if the
> +     * ring is one with a wildcard partner (ie. that any domain may send to)
> +     * to enable cancelling signals on wildcard rings on domain destroy.
> +     */
> +    struct hlist_node wildcard_node;
> +    /*
> +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> +     * ring_info->npending is decremented when ents for wildcard rings are
> +     * cancelled for domain destroy.
> +     * Caution: Must hold the correct locks before accessing ring_info via this.

It would be clearer if this stated the correct locks.

> +     */
> +    struct argo_ring_info *ring_info;
> +    /* domain to be notified when space is available */
> +    domid_t domain_id;
> +    uint16_t pad;

Can we order domain_id after len and drop the pad?

> +    /* minimum ring space available that this signal is waiting upon */
> +    uint32_t len;
> +};
> +
> +/*
> + * The value of the argo element in a struct domain is
> + * protected by the global lock argo_lock: L1
> + */
> +#define ARGO_HTABLE_SIZE 32
> +struct argo_domain
> +{
> +    /* L2 */
> +    rwlock_t lock;
> +    /*
> +     * Hash table of argo_ring_info about rings this domain has registered.
> +     * Protected by L2.
> +     */
> +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> +    /* Counter of rings registered by this domain. Protected by L2. */
> +    uint32_t ring_count;
> +
> +    /* Lsend */
> +    spinlock_t send_lock;
> +    /*
> +     * Hash table of argo_send_info about rings other domains have registered
> +     * for this domain to send to. Single partner, non-wildcard rings.
> +     * Protected by Lsend.
> +     */
> +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> +
> +    /* Lwildcard */
> +    spinlock_t wildcard_lock;
> +    /*
> +     * List of pending space-available signals for this domain about wildcard
> +     * rings registered by other domains. Protected by Lwildcard.
> +     */
> +    struct hlist_head wildcard_pend_list;
> +};
> +
> +/*
> + * Locking is organized as follows:
> + *
> + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> + *              W(<lock>) means taking a write lock on it.
> + *
> + * L1 : The global lock: argo_lock
> + * Protects the argo elements of all struct domain *d in the system.
> + * It does not protect any of the elements of d->argo, only their
> + * addresses.
> + *
> + * By extension since the destruction of a domain with a non-NULL
> + * d->argo will need to free the d->argo pointer, holding W(L1)
> + * guarantees that no domains pointers that argo is interested in
> + * become invalid whilst this lock is held.
> + */
> +
> +static DEFINE_RWLOCK(argo_lock); /* L1 */
> +
> +/*
> + * L2 : The per-domain ring hash lock: d->argo->lock
> + * Holding a read lock on L2 protects the ring hash table and
> + * the elements in the hash_table d->argo->ring_hash, and
> + * the node and id fields in struct argo_ring_info in the
> + * hash table.
> + * Holding a write lock on L2 protects all of the elements of
> + * struct argo_ring_info.
> + *
> + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> + *
> + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> + * Protects all the fields within the argo_ring_info, aside from the ones that
> + * L2 already protects: node, id, lock.
> + *
> > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> + *
> + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> + * Protects the per-domain send hash table : d->argo->send_hash
> + * and the elements in the hash table, and the node and id fields
> + * in struct argo_send_info in the hash table.
> + *
> + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> + * Do not attempt to acquire a L2 on any domain after taking and while
> + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> + *
> + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> + * Protects the per-domain list of outstanding signals for space availability
> + * on wildcard rings.
> + *
> + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> + * No other locks are acquired after obtaining Lwildcard.
> + */
>
>  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
>  #undef ARGO_DEBUG
> @@ -28,10 +198,299 @@
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>
> +static void
> +ring_unmap(struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    if ( !ring_info->mfn_mapping )
> +        return;
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +    {
> +        if ( !ring_info->mfn_mapping[i] )
> +            continue;
> +        if ( ring_info->mfns )
> +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> +                         mfn_x(ring_info->mfns[i]),
> +                         ring_info->mfn_mapping[i]);
> +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> +        ring_info->mfn_mapping[i] = NULL;
> +    }
> +}
> +
> +static void
> +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> +{
> +    struct domain *d = get_domain_by_id(domain_id);
> +    if ( !d )
> +        return;
> +
> +    if ( d->argo )
> +    {
> +        spin_lock(&d->argo->wildcard_lock);
> +        hlist_del(&ent->wildcard_node);
> +        spin_unlock(&d->argo->wildcard_lock);
> +    }
> +    put_domain(d);
> +}
> +
> +static void
> +pending_remove_all(struct argo_ring_info *ring_info)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> +    {
> +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +            wildcard_pending_list_remove(ent->domain_id, ent);
> +        hlist_del(&ent->node);
> +        xfree(ent);
> +    }
> +    ring_info->npending = 0;
> +}
> +
> +static void
> +wildcard_rings_pending_remove(struct domain *d)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> +                              node)
> +    {
> +        hlist_del(&ent->node);
> +        ent->ring_info->npending--;
> +        hlist_del(&ent->wildcard_node);
> +        xfree(ent);
> +    }
> +}
> +

Maybe move ring_unmap() here so it's closer to where it is used?

> +static void
> +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    if ( !ring_info->mfns )
> +        return;
> +
> +    if ( !ring_info->mfn_mapping )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    ring_unmap(ring_info);
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> +
> +    xfree(ring_info->mfns);
> +    ring_info->mfns = NULL;
> +    ring_info->npage = 0;
> +    xfree(ring_info->mfn_mapping);
> +    ring_info->mfn_mapping = NULL;
> +    ring_info->nmfns = 0;
> +}
> +
> +static void
> +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    pending_remove_all(ring_info);
> +    hlist_del(&ring_info->node);
> +    ring_remove_mfns(d, ring_info);
> +    xfree(ring_info);
> +}
> +
> +static void
> +domain_rings_remove_all(struct domain *d)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_ring_info *ring_info;
> +
> +        hlist_for_each_entry_safe(ring_info, node, next,
> +                                  &d->argo->ring_hash[i], node)
> +            ring_remove_info(d, ring_info);
> +    }
> +    d->argo->ring_count = 0;
> +}
> +
> +/*
> + * Tear down all rings of other domains where src_d domain is the partner.
> + * (ie. it is the single domain that can send to those rings.)
> + * This will also cancel any pending notifications about those rings.
> + */
> +static void
> +partner_rings_remove(struct domain *src_d)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_send_info *send_info;
> +
> +        hlist_for_each_entry_safe(send_info, node, next,
> +                                  &src_d->argo->send_hash[i], node)
> +        {
> +            struct argo_ring_info *ring_info;
> +            struct domain *dst_d;
> +
> +            dst_d = get_domain_by_id(send_info->id.domain_id);
> +            if ( dst_d )
> +            {
> +                ring_info = ring_find_info(dst_d, &send_info->id);
> +                if ( ring_info )
> +                {
> +                    ring_remove_info(dst_d, ring_info);
> +                    dst_d->argo->ring_count--;
> +                }
> +
> +                put_domain(dst_d);
> +            }
> +
> +            hlist_del(&send_info->node);
> +            xfree(send_info);
> +        }
> +    }
> +}
> +
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *currd = current->domain;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> +
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        return rc;
> +    }
> +
> +    domain_lock(currd);
> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    domain_unlock(currd);
> +
> +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> +
> +    return rc;
> +}
> +
> +static void
> +argo_domain_init(struct argo_domain *argo)
> +{
> +    unsigned int i;
> +
> +    rwlock_init(&argo->lock);
> +    spin_lock_init(&argo->send_lock);
> +    spin_lock_init(&argo->wildcard_lock);
> +    argo->ring_count = 0;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> +    }
> +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> +}
> +
> +int
> +argo_init(struct domain *d)
> +{
> +    struct argo_domain *argo;
> +
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("init: domid: %d\n", d->domain_id);
> +
> +    argo = xmalloc(struct argo_domain);
> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    write_lock(&argo_lock);
> +
> +    argo_domain_init(argo);
> +
> +    d->argo = argo;
> +
> +    write_unlock(&argo_lock);
> +
> +    return 0;
> +}
> +
> +void
> +argo_destroy(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +        xfree(d->argo);
> +        d->argo = NULL;
> +    }
> +    write_unlock(&argo_lock);
> +}
> +
> +void
> +argo_soft_reset(struct domain *d)
> +{
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +
> +        if ( !opt_argo_enabled )

Shouldn't this function just exit early if argo is disabled?

> +        {
> +            xfree(d->argo);
> +            d->argo = NULL;
> +        }
> +        else
> +            argo_domain_init(d->argo);
> +    }
> +
> +    write_unlock(&argo_lock);
>  }

<snip>

Regards,
Jason

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-08 22:54   ` Jason Andryuk
@ 2019-01-09  6:48     ` Christopher Clark
  2019-01-09 14:15       ` Jason Andryuk
  2019-01-09  9:35     ` Jan Beulich
  1 sibling, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-09  6:48 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Tue, Jan 8, 2019 at 2:54 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.
> >
> > Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
> >
> > Introduces a new Xen command line parameter 'argo': bool to enable/disable
> > the argo hypercall. Defaults to disabled.
> >
> > New headers:
> >   public/argo.h: with definitions of addresses and ring structure, including
> >   indexes for atomic update for communication between domain and hypervisor.
> >
> >   xen/argo.h: to expose the hooks for integration into domain lifecycle:
> >     argo_init: per-domain init of argo data structures for domain_create.
> >     argo_destroy: teardown for domain_destroy and the error exit
> >                   path of domain_create.
> >     argo_soft_reset: reset of domain state for domain_soft_reset.
> >
> > Adds two new fields to struct domain:
> >     rwlock_t argo_lock;
> >     struct argo_domain *argo;
> >
> > In accordance with recent work on _domain_destroy, argo_destroy is
> > idempotent. It will tear down: all rings registered by this domain, all
> > rings where this domain is the single sender (ie. specified partner,
> > non-wildcard rings), and all pending notifications where this domain is
> > awaiting signal about available space in the rings of other domains.
> >
> > A count will be maintained of the number of rings that a domain has
> > registered in order to limit it below the fixed maximum limit defined here.
> >
> > The software license on the public header is the BSD license, standard
> > procedure for the public Xen headers. The public header was originally
> > posted under a GPL license at: [1]:
> > https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
> >
> > The following ACK by Lars Kurth is to confirm that only people being
> > employees of Citrix contributed to the header files in the series posted at
> > [1] and that thus the copyright of the files in question is fully owned by
> > Citrix. The ACK also confirms that Citrix is happy for the header files to
> > be published under a BSD license in this series (which is based on [1]).
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > Acked-by: Lars Kurth <lars.kurth@citrix.com>
> > ---
> > v2 rewrite locking explanation comment
> > v2 header copyright line now includes 2019
> > v2 self: use ring_info backpointer in pending_ent to maintain npending
> > v2 self: rename all_rings_remove_info to domain_rings_remove_all
> > v2 feedback Jan: drop cookie, implement teardown
> > v2 self: add npending to track number of pending entries per ring
> > v2 self: amend comment on locking; drop section comments
> > v2 cookie_eq: test low bits first and use likely on high bits
> > v2 self: OVERHAUL
> > v2 self: s/argo_pending_ent/pending_ent/g
> > v2 self: drop pending_remove_ent, inline at single call site
> > v1 feedback Roger, Jan: drop argo prefix on static functions
> > v2 #4 Lars: add Acked-by and details to commit message.
> > v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> > v2 bugfix: xsm use in soft-reset prior to introduction
> > v2 feedback #9 Jan: drop 'message' from do_argo_message_op
> > v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
> > v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
> > v1 #5 feedback Paul: init/destroy : use currd
> > v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
> > v1 #6 feedback Paul: Folded patch 6 into patch 5.
> > v1 #6 feedback Jan: drop opt_argo_enabled initializer
> > v1 $6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
> > v1. #5 feedback Paul: change the license on public header to BSD
> > - ack from Lars at Citrix.
> > v1. self, Jan: drop unnecessary xen include from sched.h
> > v1. self, Jan: drop inclusion of public argo.h in private one
> > v1. self, Jan: add include of public argo.h to argo.c
> > v1. self, Jan: drop fwd decl of argo_domain in priv header
> > v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
> > v1. self: removed allocation of event channel since switching to VIRQ
> > v1. self: drop types.h include from private argo.h
> > v1: reorder public argo include position
> > v1: #13 feedback Jan: public namespace: prefix with xen
> > v1: self: rename pending ent "id" to "domain_id"
> > v1: self: add domain_cookie to ent struct
> > v1. #15 feedback Jan: make cmd unsigned
> > v1. #15 feedback Jan: make i loop variable unsigned
> > v1: self: adjust dprintks in init, destroy
> > v1: #18 feedback Jan: meld max ring count limit
> > v1: self: use type not struct in public defn, affects compat gen header
> > v1: feedback #15 Jan: handle upper-halves of hypercall args
> > v1: add comment explaining the 'magic' field
> > v1: self + Jan feedback: implement soft reset
> > v1: feedback #13 Roger: use ASSERT_UNREACHABLE
> >
> >  docs/misc/xen-command-line.pandoc |  11 +
> >  xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
> >  xen/common/domain.c               |  20 ++
> >  xen/include/Makefile              |   1 +
> >  xen/include/public/argo.h         |  59 +++++
> >  xen/include/xen/argo.h            |  23 ++
> >  xen/include/xen/sched.h           |   6 +
> >  xen/include/xlat.lst              |   2 +
> >  8 files changed, 582 insertions(+), 1 deletion(-)
> >  create mode 100644 xen/include/public/argo.h
> >  create mode 100644 xen/include/xen/argo.h
> >
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index a755a67..aea13eb 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
> >  in combination with cpuidle.  This option is only expected to be useful for
> >  developers wishing Xen to fall back to older timing methods on newer hardware.
> >
> > +### argo
> > +> `= <boolean>`
> > +
> > +> Default: `false`
> > +
> > +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> > +
> > +This allows domains access to the Argo hypercall, which supports registration
> > +of memory rings with the hypervisor to receive messages, sending messages to
> > +other domains by hypercall and querying the ring status of other domains.
> > +
>
> Do we want to say it's only available when Xen is compiled with CONFIG_ARGO?

It's a fair question and I can respin with that added, if needed.
However, other boot options that depend on Kconfig options (altp2m, efi,
ept, flask, hardware_dom, tmem, ...) don't make such a statement, so I'm
not sure it is required.


>
> >  ### asid (x86)
> >  > `= <boolean>`
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 6f782f7..86195d3 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -17,7 +17,177 @@
> >   */
> >
> >  #include <xen/errno.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/argo.h>
> > +#include <xen/event.h>
> > +#include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/time.h>
> > +#include <public/argo.h>
> > +
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +
> > +/* Xen command line option to enable argo */
> > +static bool __read_mostly opt_argo_enabled;
> > +boolean_param("argo", opt_argo_enabled);
> > +
> > +typedef struct argo_ring_id
> > +{
> > +    uint32_t port;
> > +    domid_t partner_id;
> > +    domid_t domain_id;
> > +} argo_ring_id;
> > +
> > +/* Data about a domain's own ring that it has registered */
> > +struct argo_ring_info
> > +{
> > +    /* next node in the hash, protected by L2 */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by L2 */
> > +    struct argo_ring_id id;
> > +    /* L3 */
> > +    spinlock_t lock;
> > +    /* length of the ring, protected by L3 */
> > +    uint32_t len;
> > +    /* number of pages in the ring, protected by L3 */
> > +    uint32_t npage;
> > +    /* number of pages translated into mfns, protected by L3 */
> > +    uint32_t nmfns;
> > +    /* cached tx pointer location, protected by L3 */
> > +    uint32_t tx_ptr;
> > +    /* mapped ring pages protected by L3 */
> > +    uint8_t **mfn_mapping;
> > +    /* list of mfns of guest ring, protected by L3 */
> > +    mfn_t *mfns;
> > +    /* list of struct pending_ent for this ring, protected by L3 */
> > +    struct hlist_head pending;
> > +    /* number of pending entries queued for this ring, protected by L3 */
> > +    uint32_t npending;
> > +};
> > +
> > +/* Data about a single-sender ring, held by the sender (partner) domain */
> > +struct argo_send_info
> > +{
> > +    /* next node in the hash, protected by Lsend */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by Lsend */
> > +    struct argo_ring_id id;
> > +};
> > +
> > +/* A space-available notification that is awaiting sufficient space */
> > +struct pending_ent
> > +{
> > +    /* List node within argo_ring_info's pending list */
> > +    struct hlist_node node;
> > +    /*
> > +     * List node within argo_domain's wildcard_pend_list. Only used if the
> > +     * ring is one with a wildcard partner (ie. that any domain may send to)
> > +     * to enable cancelling signals on wildcard rings on domain destroy.
> > +     */
> > +    struct hlist_node wildcard_node;
> > +    /*
> > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > +     * ring_info->npending is decremented when ents for wildcard rings are
> > +     * cancelled for domain destroy.
> > +     * Caution: Must hold the correct locks before accessing ring_info via this.
>
> It would be clearer if this stated the correct locks.

ok - though it would mean duplicating the statement about which locks
are needed, since that is already explained elsewhere in the file, and
so it would need updating in two places if the locking requirements
change. That was why I worded it the way I did: as a pointer to go and
find where the requirements are already described, to avoid that
duplication.


>
> > +     */
> > +    struct argo_ring_info *ring_info;
> > +    /* domain to be notified when space is available */
> > +    domid_t domain_id;
> > +    uint16_t pad;
>
> Can we order domain_id after len and drop the pad?

I'm not sure it would be right to do that. I think the pad ensures that
len is aligned to a 32-bit boundary.  I was asked to insert a pad field
for a struct like this in an earlier review here:

https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00239.html
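
To illustrate the alignment point (a rough sketch only - the struct and
field names below are stand-ins, not the patch's pending_ent): with
domain_id placed immediately before len, the compiler would insert two
bytes of padding on our ABIs anyway; spelling out 'pad' just makes that
hole explicit and keeps len on a 32-bit boundary by construction.

    #include <stdint.h>

    struct ent_tail               /* illustrative stand-in only */
    {
        uint16_t domain_id;
        uint16_t pad;             /* explicit hole; len stays at offset 4 */
        uint32_t len;             /* 32-bit aligned with or without 'pad',
                                   * but the explicit field documents the
                                   * layout and gives the hole a name */
    };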

>
> > +    /* minimum ring space available that this signal is waiting upon */
> > +    uint32_t len;
> > +};
> > +
> > +/*
> > + * The value of the argo element in a struct domain is
> > + * protected by the global lock argo_lock: L1
> > + */
> > +#define ARGO_HTABLE_SIZE 32
> > +struct argo_domain
> > +{
> > +    /* L2 */
> > +    rwlock_t lock;
> > +    /*
> > +     * Hash table of argo_ring_info about rings this domain has registered.
> > +     * Protected by L2.
> > +     */
> > +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> > +    /* Counter of rings registered by this domain. Protected by L2. */
> > +    uint32_t ring_count;
> > +
> > +    /* Lsend */
> > +    spinlock_t send_lock;
> > +    /*
> > +     * Hash table of argo_send_info about rings other domains have registered
> > +     * for this domain to send to. Single partner, non-wildcard rings.
> > +     * Protected by Lsend.
> > +     */
> > +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> > +
> > +    /* Lwildcard */
> > +    spinlock_t wildcard_lock;
> > +    /*
> > +     * List of pending space-available signals for this domain about wildcard
> > +     * rings registered by other domains. Protected by Lwildcard.
> > +     */
> > +    struct hlist_head wildcard_pend_list;
> > +};
> > +
> > +/*
> > + * Locking is organized as follows:
> > + *
> > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > + *              W(<lock>) means taking a write lock on it.
> > + *
> > + * L1 : The global lock: argo_lock
> > + * Protects the argo elements of all struct domain *d in the system.
> > + * It does not protect any of the elements of d->argo, only their
> > + * addresses.
> > + *
> > + * By extension since the destruction of a domain with a non-NULL
> > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > + * guarantees that no domains pointers that argo is interested in
> > + * become invalid whilst this lock is held.
> > + */
> > +
> > +static DEFINE_RWLOCK(argo_lock); /* L1 */
> > +
> > +/*
> > + * L2 : The per-domain ring hash lock: d->argo->lock
> > + * Holding a read lock on L2 protects the ring hash table and
> > + * the elements in the hash_table d->argo->ring_hash, and
> > + * the node and id fields in struct argo_ring_info in the
> > + * hash table.
> > + * Holding a write lock on L2 protects all of the elements of
> > + * struct argo_ring_info.
> > + *
> > + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> > + *
> > + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> > + * Protects all the fields within the argo_ring_info, aside from the ones that
> > + * L2 already protects: node, id, lock.
> > + *
> > > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> > + *
> > + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> > + * Protects the per-domain send hash table : d->argo->send_hash
> > + * and the elements in the hash table, and the node and id fields
> > + * in struct argo_send_info in the hash table.
> > + *
> > + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> > + * Do not attempt to acquire a L2 on any domain after taking and while
> > + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> > + *
> > + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> > + * Protects the per-domain list of outstanding signals for space availability
> > + * on wildcard rings.
> > + *
> > + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> > + * No other locks are acquired after obtaining Lwildcard.
> > + */
> >
> >  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
> >  #undef ARGO_DEBUG
> > @@ -28,10 +198,299 @@
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +static void
> > +ring_unmap(struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +        return;
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +    {
> > +        if ( !ring_info->mfn_mapping[i] )
> > +            continue;
> > +        if ( ring_info->mfns )
> > +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> > +                         mfn_x(ring_info->mfns[i]),
> > +                         ring_info->mfn_mapping[i]);
> > +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> > +        ring_info->mfn_mapping[i] = NULL;
> > +    }
> > +}
> > +
> > +static void
> > +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> > +{
> > +    struct domain *d = get_domain_by_id(domain_id);
> > +    if ( !d )
> > +        return;
> > +
> > +    if ( d->argo )
> > +    {
> > +        spin_lock(&d->argo->wildcard_lock);
> > +        hlist_del(&ent->wildcard_node);
> > +        spin_unlock(&d->argo->wildcard_lock);
> > +    }
> > +    put_domain(d);
> > +}
> > +
> > +static void
> > +pending_remove_all(struct argo_ring_info *ring_info)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> > +    {
> > +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> > +            wildcard_pending_list_remove(ent->domain_id, ent);
> > +        hlist_del(&ent->node);
> > +        xfree(ent);
> > +    }
> > +    ring_info->npending = 0;
> > +}
> > +
> > +static void
> > +wildcard_rings_pending_remove(struct domain *d)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> > +                              node)
> > +    {
> > +        hlist_del(&ent->node);
> > +        ent->ring_info->npending--;
> > +        hlist_del(&ent->wildcard_node);
> > +        xfree(ent);
> > +    }
> > +}
> > +
>
> Maybe move ring_unmap() here so it's closer to where it is used?

I'm fine with moving it if necessary, but it's located where it is so
that it sits right next to the corresponding ring_map_page function -
the two are really a pair, with one doing map_domain_page_global and
the other undoing it with unmap_domain_page_global. That's how they end
up placed once the full series is applied.
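
For illustration, the counterpart has roughly this shape (a sketch only,
not the actual code from the later patches in the series):

    /*
     * sketch: lazily establish a global mapping of one ring page;
     * ring_unmap() above is what tears these mappings down again.
     */
    static void *
    ring_map_page(struct argo_ring_info *ring_info, unsigned int i)
    {
        if ( !ring_info->mfn_mapping[i] )
            ring_info->mfn_mapping[i] =
                map_domain_page_global(ring_info->mfns[i]);

        return ring_info->mfn_mapping[i];
    }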

>
> > +static void
> > +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
> > +
> > +    if ( !ring_info->mfns )
> > +        return;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    ring_unmap(ring_info);
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> > +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> > +
> > +    xfree(ring_info->mfns);
> > +    ring_info->mfns = NULL;
> > +    ring_info->npage = 0;
> > +    xfree(ring_info->mfn_mapping);
> > +    ring_info->mfn_mapping = NULL;
> > +    ring_info->nmfns = 0;
> > +}
> > +
> > +static void
> > +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
> > +
> > +    pending_remove_all(ring_info);
> > +    hlist_del(&ring_info->node);
> > +    ring_remove_mfns(d, ring_info);
> > +    xfree(ring_info);
> > +}
> > +
> > +static void
> > +domain_rings_remove_all(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        struct hlist_node *node, *next;
> > +        struct argo_ring_info *ring_info;
> > +
> > +        hlist_for_each_entry_safe(ring_info, node, next,
> > +                                  &d->argo->ring_hash[i], node)
> > +            ring_remove_info(d, ring_info);
> > +    }
> > +    d->argo->ring_count = 0;
> > +}
> > +
> > +/*
> > + * Tear down all rings of other domains where src_d domain is the partner.
> > + * (ie. it is the single domain that can send to those rings.)
> > + * This will also cancel any pending notifications about those rings.
> > + */
> > +static void
> > +partner_rings_remove(struct domain *src_d)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        struct hlist_node *node, *next;
> > +        struct argo_send_info *send_info;
> > +
> > +        hlist_for_each_entry_safe(send_info, node, next,
> > +                                  &src_d->argo->send_hash[i], node)
> > +        {
> > +            struct argo_ring_info *ring_info;
> > +            struct domain *dst_d;
> > +
> > +            dst_d = get_domain_by_id(send_info->id.domain_id);
> > +            if ( dst_d )
> > +            {
> > +                ring_info = ring_find_info(dst_d, &send_info->id);
> > +                if ( ring_info )
> > +                {
> > +                    ring_remove_info(dst_d, ring_info);
> > +                    dst_d->argo->ring_count--;
> > +                }
> > +
> > +                put_domain(dst_d);
> > +            }
> > +
> > +            hlist_del(&send_info->node);
> > +            xfree(send_info);
> > +        }
> > +    }
> > +}
> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
> > +        return rc;
> > +    }
> > +
> > +    domain_lock(currd);
> > +
> > +    switch (cmd)
> > +    {
> > +    default:
> > +        rc = -EOPNOTSUPP;
> > +        break;
> > +    }
> > +
> > +    domain_unlock(currd);
> > +
> > +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> > +
> > +    return rc;
> > +}
> > +
> > +static void
> > +argo_domain_init(struct argo_domain *argo)
> > +{
> > +    unsigned int i;
> > +
> > +    rwlock_init(&argo->lock);
> > +    spin_lock_init(&argo->send_lock);
> > +    spin_lock_init(&argo->wildcard_lock);
> > +    argo->ring_count = 0;
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> > +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> > +    }
> > +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> > +}
> > +
> > +int
> > +argo_init(struct domain *d)
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
> > +
> > +    d->argo = argo;
> > +
> > +    write_unlock(&argo_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +argo_destroy(struct domain *d)
> > +{
> > +    BUG_ON(!d->is_dying);
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +        xfree(d->argo);
> > +        d->argo = NULL;
> > +    }
> > +    write_unlock(&argo_lock);
> > +}
> > +
> > +void
> > +argo_soft_reset(struct domain *d)
> > +{
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +
> > +        if ( !opt_argo_enabled )
>
> Shouldn't this function just exit early if argo is disabled?

Support has been added to Xen for a hypercall that makes a subset of
boot parameters modifiable at runtime. The argo enable option isn't
currently one of them, but that may change later, so I did not want to
bake into this function the assumption that the enabled/disabled
configuration cannot change after it was first evaluated when the
domain was launched. That's possibly a conservative choice though.

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=82cf78468e96de1e4d1400bbf5508f8b111650c3
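
(Purely hypothetical, and assuming the boolean_runtime_param() wrapper
that the work above introduces: making the option runtime-settable later
would only be a one-line change, which is why I'd rather not bake the
opposite assumption into the teardown path.)

    /* hypothetical: declare the existing option as runtime-adjustable */
    static bool __read_mostly opt_argo_enabled;
    boolean_runtime_param("argo", opt_argo_enabled);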

Thanks for the review.

Christopher

>
> > +        {
> > +            xfree(d->argo);
> > +            d->argo = NULL;
> > +        }
> > +        else
> > +            argo_domain_init(d->argo);
> > +    }
> > +
> > +    write_unlock(&argo_lock);
> >  }
>
> <snip>
>
> Regards,
> Jason

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-08 22:54   ` Jason Andryuk
  2019-01-09  6:48     ` Christopher Clark
@ 2019-01-09  9:35     ` Jan Beulich
  2019-01-09 14:26       ` Jason Andryuk
  1 sibling, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-09  9:35 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, xen-devel, eric chanudet,
	Roger Pau Monne

>>> On 08.01.19 at 23:54, <jandryuk@gmail.com> wrote:

First of all - please trim your replies.

> On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>>  in combination with cpuidle.  This option is only expected to be useful for
>>  developers wishing Xen to fall back to older timing methods on newer hardware.
>>
>> +### argo
>> +> `= <boolean>`
>> +
>> +> Default: `false`
>> +
>> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
>> +
>> +This allows domains access to the Argo hypercall, which supports registration
>> +of memory rings with the hypervisor to receive messages, sending messages to
>> +other domains by hypercall and querying the ring status of other domains.
>> +
> 
> Do we want to say it's only available when Xen is compiled with CONFIG_ARGO?

We don't do so elsewhere, so I'm with Christopher.

>> +     */
>> +    struct argo_ring_info *ring_info;
>> +    /* domain to be notified when space is available */
>> +    domid_t domain_id;
>> +    uint16_t pad;
> 
> Can we order domain_id after len and drop the pad?

That would still call for a pad field - we prefer to have explicit padding,
and also to check it's zero, the latter to allow for assigning meaning to
the field down the road.
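
(Illustrative only - the usual shape of that check, with 'arg' and 'hnd'
standing in for whichever interface struct and guest handle are being
validated:)

    if ( copy_from_guest(&arg, hnd, 1) )
        return -EFAULT;

    /* Require the hole to be zero now, so it can gain meaning later. */
    if ( arg.pad )
        return -EINVAL;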

Jan




* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-09  6:48     ` Christopher Clark
@ 2019-01-09 14:15       ` Jason Andryuk
  2019-01-09 23:24         ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Andryuk @ 2019-01-09 14:15 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 1:48 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Tue, Jan 8, 2019 at 2:54 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> >
> > On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:

> > > +
> > > +/* A space-available notification that is awaiting sufficient space */
> > > +struct pending_ent
> > > +{
> > > +    /* List node within argo_ring_info's pending list */
> > > +    struct hlist_node node;
> > > +    /*
> > > +     * List node within argo_domain's wildcard_pend_list. Only used if the
> > > +     * ring is one with a wildcard partner (ie. that any domain may send to)
> > > +     * to enable cancelling signals on wildcard rings on domain destroy.
> > > +     */
> > > +    struct hlist_node wildcard_node;
> > > +    /*
> > > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > > +     * ring_info->npending is decremented when ents for wildcard rings are
> > > +     * cancelled for domain destroy.
> > > +     * Caution: Must hold the correct locks before accessing ring_info via this.
> >
> > It would be clearer if this stated the correct locks.
>
> ok - it would mean duplicating the statement about which locks are
> needed though, since it is explained elsewhere in the file, which means
> it will need updating in two places if the locking requirements change.
> That was why I worded it that way, as an indicator to go and find where
> it is already described, to avoid that.

"Caution" made me think *ring_info points from domain A's pending_ent
to domain B's ring_info.  Reading patch 10 (notify op) I see that it
really just points back to domain A's ring_info.  So the "Caution" is
just that you still have to lock ring_info (L3) even though you can
get to the pointer via L2.  Is that correct?

I agree a single location for the locking documentation is better than
splitting or duplicating.  As long as no cross-domain locking is
required, this is fine.

> > > +     */
> > > +    struct argo_ring_info *ring_info;
> > > +    /* domain to be notified when space is available */
> > > +    domid_t domain_id;
> > > +    uint16_t pad;
> >
> > Can we order domain_id after len and drop the pad?
>
> I'm not sure it would be right to do that. I think that the pad
> ensures that len is word-aligned to a 32-bit boundary.  I was asked to
> insert a pad field for a struct like this in an earlier review here:
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00239.html

I'll respond to this in the other email from Jan.

> > > +
> > > +static void
> > > +wildcard_rings_pending_remove(struct domain *d)
> > > +{
> > > +    struct hlist_node *node, *next;
> > > +    struct pending_ent *ent;
> > > +
> > > +    ASSERT(rw_is_write_locked(&argo_lock));
> > > +
> > > +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> > > +                              node)
> > > +    {
> > > +        hlist_del(&ent->node);
> > > +        ent->ring_info->npending--;
> > > +        hlist_del(&ent->wildcard_node);
> > > +        xfree(ent);
> > > +    }
> > > +}
> > > +
> >
> > Maybe move ring_unmap() here so it's closer to where it is used?
>
> I'm fine with moving it if it needs it, but it's located where it is in
> order to put it right next to the corresponding ring_map_page function -
> the two are paired really, with one doing map_domain_page_global and the
> other undoing it with unmap_domain_page_global. That's how it ends up
> when the full series is applied.

Okay.  My comment came from reading only this single patch.
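
A rough sketch of the pairing described above, with the ring_info field
names assumed for illustration (only the map_domain_page_global /
unmap_domain_page_global pairing is taken from the discussion):

    static int ring_map_page(struct argo_ring_info *ring_info, unsigned int i,
                             void **out_ptr)
    {
        /* Reuse an existing global mapping of this ring page, or create one. */
        if ( !ring_info->mfn_mapping[i] )
            ring_info->mfn_mapping[i] =
                map_domain_page_global(ring_info->mfns[i]);

        if ( !ring_info->mfn_mapping[i] )
            return -ENOMEM;

        *out_ptr = ring_info->mfn_mapping[i];

        return 0;
    }

    static void ring_unmap(struct argo_ring_info *ring_info)
    {
        unsigned int i;

        /* Undo every mapping created by ring_map_page. */
        for ( i = 0; i < ring_info->nmfns; i++ )
        {
            if ( !ring_info->mfn_mapping[i] )
                continue;

            unmap_domain_page_global(ring_info->mfn_mapping[i]);
            ring_info->mfn_mapping[i] = NULL;
        }
    }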

> > > +void
> > > +argo_soft_reset(struct domain *d)
> > > +{
> > > +    write_lock(&argo_lock);
> > > +
> > > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > > +
> > > +    if ( d->argo )
> > > +    {
> > > +        domain_rings_remove_all(d);
> > > +        partner_rings_remove(d);
> > > +        wildcard_rings_pending_remove(d);
> > > +
> > > +        if ( !opt_argo_enabled )
> >
> > Shouldn't this function just exit early if argo is disabled?
>
> There has been support added to Xen with a hypercall to make a subset of
> boot parameters modifiable at runtime. Argo-enabled isn't currently one
> of them, but that may be changed later so I did not want to bake into
> this function the assumption that the enabled/disabled configuration
> could not change after being initially evaluated at the time the domain
> was launched. That's possibly a conservative choice though.
>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=82cf78468e96de1e4d1400bbf5508f8b111650c3

Okay.  I was not aware of this functionality.

Regards,
Jason


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-09  9:35     ` Jan Beulich
@ 2019-01-09 14:26       ` Jason Andryuk
  2019-01-09 14:38         ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Andryuk @ 2019-01-09 14:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, xen-devel, eric chanudet,
	Roger Pau Monne

On Wed, Jan 9, 2019 at 4:35 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 08.01.19 at 23:54, <jandryuk@gmail.com> wrote:
>
> First of all - please trim your replies.

Sorry.  Will do.

> > On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> >> --- a/docs/misc/xen-command-line.pandoc
> >> +++ b/docs/misc/xen-command-line.pandoc
> >> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
> >>  in combination with cpuidle.  This option is only expected to be useful for
> >>  developers wishing Xen to fall back to older timing methods on newer hardware.
> >>
> >> +### argo
> >> +> `= <boolean>`
> >> +
> >> +> Default: `false`
> >> +
> >> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> >> +
> >> +This allows domains access to the Argo hypercall, which supports registration
> >> +of memory rings with the hypervisor to receive messages, sending messages to
> >> +other domains by hypercall and querying the ring status of other domains.
> >> +
> >
> > Do we want to say it's only available when Xen is compiled with CONFIG_ARGO?
>
> We don't do so elsewhere, so I'm with Christopher.

Okay.

> >> +     */
> >> +    struct argo_ring_info *ring_info;
> >> +    /* domain to be notified when space is available */
> >> +    domid_t domain_id;
> >> +    uint16_t pad;
> >
> > Can we order domain_id after len and drop the pad?
>
> That would still call for a pad field - we prefer to have explicit padding,
> and also to check it's zero, the latter to allow for assigning meaning to
> the field down the road.

This struct is internal to Xen and argo, so do we still need explicit padding?

There are other public argo structs with padding fields.  I haven't
gotten through all the patches, but I think at least some of those are
missing zero checks.

Regards,
Jason


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-09 14:26       ` Jason Andryuk
@ 2019-01-09 14:38         ` Jan Beulich
  2019-01-10 23:29           ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-09 14:38 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Konrad Rzeszutek Wilk, Daniel Smith, Andrew Cooper, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, xen-devel, eric chanudet,
	Roger Pau Monne

>>> On 09.01.19 at 15:26, <jandryuk@gmail.com> wrote:
> On Wed, Jan 9, 2019 at 4:35 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 08.01.19 at 23:54, <jandryuk@gmail.com> wrote:
>> > On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>> >> +     */
>> >> +    struct argo_ring_info *ring_info;
>> >> +    /* domain to be notified when space is available */
>> >> +    domid_t domain_id;
>> >> +    uint16_t pad;
>> >
>> > Can we order domain_id after len and drop the pad?
>>
>> That would still call for a pad field - we prefer to have explicit padding,
>> and also to check it's zero, the latter to allow for assigning meaning to
>> the field down the road.
> 
> This struct is internal to Xen and argo, so do we still need explicit 
> padding?

Oh, internal structures don't need any explicit padding. Where the
domain_id field gets placed still doesn't matter then, though.

Jan




* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
@ 2019-01-09 15:55   ` Wei Liu
  2019-01-09 16:00     ` Christopher Clark
  2019-01-10 11:24   ` Roger Pau Monné
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 104+ messages in thread
From: Wei Liu @ 2019-01-09 15:55 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
> The register op is used by a domain to register a region of memory for
> receiving messages from either a specified other domain, or, if specifying a
> wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits,
> the hypervisor will use this mapping to copy data from a sending domain into
> this registered ring, making it accessible to the domain that registered the
> ring to receive data.
> 
> Wildcard any-sender rings are default disabled and registration will be
> refused with EPERM unless they have been specifically enabled with the
> argo-mac boot option introduced here. The reason why the default for
> wildcard rings is 'deny' is that there is currently no means to protect the
> ring from DoS by a noisy domain spamming the ring, affecting other domains
> ability to send to it. This will be addressed with XSM policy controls in
> subsequent work.
> 
> Since denying access to any-sender rings is a significant functional
> constraint, a new bootparam is provided to enable overriding this:
>  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> Even though this is a boolean variable, use these descriptive strings in
> order to make it obvious to an administrator that this has potential
> security impact.
> 
> The p2m type of the memory supplied by the guest for the ring must be
> p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
> is registered.
> 
> xen_argo_page_descr_t type is introduced as a page descriptor, to convey
> both the physical address of the start of the page and its granularity. The
> smallest granularity page is assumed to be 4096 bytes and the lower twelve
> bits of the type are used to indicate the size of page of memory supplied.
> The implementation of the hypercall op currently only supports 4K pages.
> 

What is the resolution for the Arm issues mentioned by Julien? I read
the conversation in previous thread. A solution seemed to have been
agreed upon, but the changelog doesn't say anything about it.

Wei.


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 15:55   ` Wei Liu
@ 2019-01-09 16:00     ` Christopher Clark
  2019-01-09 17:02       ` Julien Grall
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-09 16:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Ross Philipson, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
>
> On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
> > The register op is used by a domain to register a region of memory for
> > receiving messages from either a specified other domain, or, if specifying a
> > wildcard, any domain.
> >
> > This operation creates a mapping within Xen's private address space that
> > will remain resident for the lifetime of the ring. In subsequent commits,
> > the hypervisor will use this mapping to copy data from a sending domain into
> > this registered ring, making it accessible to the domain that registered the
> > ring to receive data.
> >
> > Wildcard any-sender rings are default disabled and registration will be
> > refused with EPERM unless they have been specifically enabled with the
> > argo-mac boot option introduced here. The reason why the default for
> > wildcard rings is 'deny' is that there is currently no means to protect the
> > ring from DoS by a noisy domain spamming the ring, affecting other domains
> > ability to send to it. This will be addressed with XSM policy controls in
> > subsequent work.
> >
> > Since denying access to any-sender rings is a significant functional
> > constraint, a new bootparam is provided to enable overriding this:
> >  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> > Even though this is a boolean variable, use these descriptive strings in
> > order to make it obvious to an administrator that this has potential
> > security impact.
> >
> > The p2m type of the memory supplied by the guest for the ring must be
> > p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
> > is registered.
> >
> > xen_argo_page_descr_t type is introduced as a page descriptor, to convey
> > both the physical address of the start of the page and its granularity. The
> > smallest granularity page is assumed to be 4096 bytes and the lower twelve
> > bits of the type are used to indicate the size of page of memory supplied.
> > The implementation of the hypercall op currently only supports 4K pages.
> >
>
> What is the resolution for the Arm issues mentioned by Julien? I read
> the conversation in previous thread. A solution seemed to have been
> agreed upon, but the changelog doesn't say anything about it.

I made the interface changes that Julien had asked for. The register
op now takes arguments that can describe the granularity of the
pages supplied, though only 4K pages are accepted by the
current implementation. I believe it meets Julien's requirements.
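
For illustration, the descriptor encoding described in the commit message
might be handled along these lines; xen_argo_page_descr_t is the type named
by the patch, but the macro names and the choice of 0 as the 4K granularity
code are assumptions:

    #define ARGO_PAGE_DESCR_GRAN_MASK  0xfffULL  /* low 12 bits: granularity */
    #define ARGO_PAGE_DESCR_GRAN_4K    0x0ULL    /* assumed encoding for 4K  */

    static uint64_t page_descr_to_addr(xen_argo_page_descr_t pd)
    {
        /* Address of the start of the page is held in the upper bits. */
        return pd & ~ARGO_PAGE_DESCR_GRAN_MASK;
    }

    static int page_descr_check_gran(xen_argo_page_descr_t pd)
    {
        /* The current implementation only accepts 4K pages. */
        return ((pd & ARGO_PAGE_DESCR_GRAN_MASK) == ARGO_PAGE_DESCR_GRAN_4K)
               ? 0 : -EINVAL;
    }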

thanks,

Christopher


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 16:00     ` Christopher Clark
@ 2019-01-09 17:02       ` Julien Grall
  2019-01-09 17:18         ` Stefano Stabellini
  2019-01-09 17:54         ` Wei Liu
  0 siblings, 2 replies; 104+ messages in thread
From: Julien Grall @ 2019-01-09 17:02 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, Jan Beulich,
	xen-devel, Eric Chanudet, Roger Pau Monne



Hi,

Sorry for the formatting. Sending it from my phone.

On Wed, 9 Jan 2019, 11:03 Christopher Clark, <christopher.w.clark@gmail.com>
wrote:

> On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
> >
> > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
> > > The register op is used by a domain to register a region of memory for
> > > receiving messages from either a specified other domain, or, if
> specifying a
> > > wildcard, any domain.
> > >
> > > This operation creates a mapping within Xen's private address space
> that
> > > will remain resident for the lifetime of the ring. In subsequent
> commits,
> > > the hypervisor will use this mapping to copy data from a sending
> domain into
> > > this registered ring, making it accessible to the domain that
> registered the
> > > ring to receive data.
> > >
> > > Wildcard any-sender rings are default disabled and registration will be
> > > refused with EPERM unless they have been specifically enabled with the
> > > argo-mac boot option introduced here. The reason why the default for
> > > wildcard rings is 'deny' is that there is currently no means to
> protect the
> > > ring from DoS by a noisy domain spamming the ring, affecting other
> domains
> > > ability to send to it. This will be addressed with XSM policy controls
> in
> > > subsequent work.
> > >
> > > Since denying access to any-sender rings is a significant functional
> > > constraint, a new bootparam is provided to enable overriding this:
> > >  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> > > Even though this is a boolean variable, use these descriptive strings
> in
> > > order to make it obvious to an administrator that this has potential
> > > security impact.
> > >
> > > The p2m type of the memory supplied by the guest for the ring must be
> > > p2m_ram_rw and the memory will be pinned as PGT_writable_page while
> the ring
> > > is registered.
> > >
> > > xen_argo_page_descr_t type is introduced as a page descriptor, to
> convey
> > > both the physical address of the start of the page and its
> granularity. The
> > > smallest granularity page is assumed to be 4096 bytes and the lower
> twelve
> > > bits of the type are used to indicate the size of page of memory
> supplied.
> > > The implementation of the hypercall op currently only supports 4K
> pages.
> > >
> >
> > What is the resolution for the Arm issues mentioned by Julien? I read
> > the conversation in previous thread. A solution seemed to have been
> > agreed upon, but the changelog doesn't say anything about it.
>
> I made the interface changes that Julien had asked for. The register
> op now takes arguments that can describe the granularitity of the
> pages supplied, though only support for 4K pages is accepted in the
> current implementation. I believe it meets Julien's requirements.


I still don't think allowing either 4K or 64K is the right solution. You are
adding an unnecessary burden in the hypervisor; it would prevent optimization
in the hypervisor and have unwanted side effects.

For instance a 64K hypervisor will always map 64K even when the guest is
passing 4K. You also can't map everything contiguously in Xen (if you ever
wanted to).

We need to stick to a single chunk size. That could be different between
Arm and x86. For Arm it would need to be 64KB.
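
For illustration, the fixed per-architecture chunk size argued for here
might look like this (the constant name is hypothetical):

    /* One interface chunk size per architecture, fixed at build time. */
    #ifdef CONFIG_ARM
    # define ARGO_CHUNK_SIZE 0x10000UL   /* 64KB */
    #else
    # define ARGO_CHUNK_SIZE  0x1000UL   /*  4KB */
    #endif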

Cheers,


> thanks,
>
> Christopher
>


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 17:02       ` Julien Grall
@ 2019-01-09 17:18         ` Stefano Stabellini
  2019-01-09 18:13           ` Julien Grall
  2019-01-09 17:54         ` Wei Liu
  1 sibling, 1 reply; 104+ messages in thread
From: Stefano Stabellini @ 2019-01-09 17:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, Eric Chanudet,
	Roger Pau Monne


On Wed, 9 Jan 2019, Julien Grall wrote:
> Hi,
> Sorry for the formatting. Sending it from my phone.
> 
> On Wed, 9 Jan 2019, 11:03 Christopher Clark, <christopher.w.clark@gmail.com> wrote:
>       On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
>       >
>       > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
>       > > The register op is used by a domain to register a region of memory for
>       > > receiving messages from either a specified other domain, or, if specifying a
>       > > wildcard, any domain.
>       > >
>       > > This operation creates a mapping within Xen's private address space that
>       > > will remain resident for the lifetime of the ring. In subsequent commits,
>       > > the hypervisor will use this mapping to copy data from a sending domain into
>       > > this registered ring, making it accessible to the domain that registered the
>       > > ring to receive data.
>       > >
>       > > Wildcard any-sender rings are default disabled and registration will be
>       > > refused with EPERM unless they have been specifically enabled with the
>       > > argo-mac boot option introduced here. The reason why the default for
>       > > wildcard rings is 'deny' is that there is currently no means to protect the
>       > > ring from DoS by a noisy domain spamming the ring, affecting other domains
>       > > ability to send to it. This will be addressed with XSM policy controls in
>       > > subsequent work.
>       > >
>       > > Since denying access to any-sender rings is a significant functional
>       > > constraint, a new bootparam is provided to enable overriding this:
>       > >  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
>       > > Even though this is a boolean variable, use these descriptive strings in
>       > > order to make it obvious to an administrator that this has potential
>       > > security impact.
>       > >
>       > > The p2m type of the memory supplied by the guest for the ring must be
>       > > p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
>       > > is registered.
>       > >
>       > > xen_argo_page_descr_t type is introduced as a page descriptor, to convey
>       > > both the physical address of the start of the page and its granularity. The
>       > > smallest granularity page is assumed to be 4096 bytes and the lower twelve
>       > > bits of the type are used to indicate the size of page of memory supplied.
>       > > The implementation of the hypercall op currently only supports 4K pages.
>       > >
>       >
>       > What is the resolution for the Arm issues mentioned by Julien? I read
>       > the conversation in previous thread. A solution seemed to have been
>       > agreed upon, but the changelog doesn't say anything about it.
> 
>       I made the interface changes that Julien had asked for. The register
>       op now takes arguments that can describe the granularitity of the
>       pages supplied, though only support for 4K pages is accepted in the
>       current implementation. I believe it meets Julien's requirements.
> 
> 
> I still don't think allowing 4K or 64K is the right solution to go. You are adding unnecessary burden in the hypervisor and would
> prevent optimization i the hypervisor and unwanted side effect.
> 
> For instance a 64K hypervisor will always map 64K even when the guest is passing 4K. You also can't map everything contiguously
> in Xen (if you ever wanted to).
> 
> We need to stick on a single chunk size. That could be different between Arm and x86. For Arm it would need to be 64KB.

Hi Julien!

I don't think we should force 64K as the only granularity on ARM. It
causes unnecessary overhead and confusion on 4K-only deployments that
are almost all our use-cases today.

One option is to make the granularity configurable on the guest side,
like Christopher did, letting the guest specify the granularity. The
hypervisor could return -ENOSYS if the specified granularity is not
supported.

The other option is having the hypervisor export the granularity it
supports for this interface: Xen would say "I support only 4K".
Tomorrow, it could change and Xen could say "I support only 64K". Then,
it would be up to the guest to pass a page of the right granularity to
the hypervisor. I think this is probably the best option, but it would
require the addition of one hypercall to retrieve the supported
granularity from Xen.
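
A rough sketch of that second option, with the function names and the
choice of error code assumed for illustration:

    /*
     * Hypervisor side: report the single granularity accepted for ring
     * registration.  Today this would report 4K; it could report 64K later.
     */
    static unsigned long argo_query_granularity(void)
    {
        return PAGE_SIZE;
    }

    /* Reject a registration whose pages are not supplied at that granularity. */
    static int check_register_granularity(unsigned long guest_page_size)
    {
        if ( guest_page_size != argo_query_granularity() )
            return -EINVAL;

        return 0;
    }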


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 17:02       ` Julien Grall
  2019-01-09 17:18         ` Stefano Stabellini
@ 2019-01-09 17:54         ` Wei Liu
  2019-01-09 18:28           ` Julien Grall
  1 sibling, 1 reply; 104+ messages in thread
From: Wei Liu @ 2019-01-09 17:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, Eric Chanudet,
	Roger Pau Monne

On Wed, Jan 09, 2019 at 12:02:34PM -0500, Julien Grall wrote:
> Hi,
> 
> Sorry for the formatting. Sending it from my phone.
> 
> On Wed, 9 Jan 2019, 11:03 Christopher Clark, <christopher.w.clark@gmail.com>
> wrote:
> 
> > On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
> > >
> > > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
> > > > The register op is used by a domain to register a region of memory for
> > > > receiving messages from either a specified other domain, or, if
> > specifying a
> > > > wildcard, any domain.
> > > >
> > > > This operation creates a mapping within Xen's private address space
> > that
> > > > will remain resident for the lifetime of the ring. In subsequent
> > commits,
> > > > the hypervisor will use this mapping to copy data from a sending
> > domain into
> > > > this registered ring, making it accessible to the domain that
> > registered the
> > > > ring to receive data.
> > > >
> > > > Wildcard any-sender rings are default disabled and registration will be
> > > > refused with EPERM unless they have been specifically enabled with the
> > > > argo-mac boot option introduced here. The reason why the default for
> > > > wildcard rings is 'deny' is that there is currently no means to
> > protect the
> > > > ring from DoS by a noisy domain spamming the ring, affecting other
> > domains
> > > > ability to send to it. This will be addressed with XSM policy controls
> > in
> > > > subsequent work.
> > > >
> > > > Since denying access to any-sender rings is a significant functional
> > > > constraint, a new bootparam is provided to enable overriding this:
> > > >  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> > > > Even though this is a boolean variable, use these descriptive strings
> > in
> > > > order to make it obvious to an administrator that this has potential
> > > > security impact.
> > > >
> > > > The p2m type of the memory supplied by the guest for the ring must be
> > > > p2m_ram_rw and the memory will be pinned as PGT_writable_page while
> > the ring
> > > > is registered.
> > > >
> > > > xen_argo_page_descr_t type is introduced as a page descriptor, to
> > convey
> > > > both the physical address of the start of the page and its
> > granularity. The
> > > > smallest granularity page is assumed to be 4096 bytes and the lower
> > twelve
> > > > bits of the type are used to indicate the size of page of memory
> > supplied.
> > > > The implementation of the hypercall op currently only supports 4K
> > pages.
> > > >
> > >
> > > What is the resolution for the Arm issues mentioned by Julien? I read
> > > the conversation in previous thread. A solution seemed to have been
> > > agreed upon, but the changelog doesn't say anything about it.
> >
> > I made the interface changes that Julien had asked for. The register
> > op now takes arguments that can describe the granularitity of the
> > pages supplied, though only support for 4K pages is accepted in the
> > current implementation. I believe it meets Julien's requirements.
> 
> 
> I still don't think allowing 4K or 64K is the right solution to go. You are
> adding unnecessary burden in the hypervisor and would prevent optimization
> i the hypervisor and unwanted side effect.
> 
> For instance a 64K hypervisor will always map 64K even when the guest is
> passing 4K. You also can't map everything contiguously in Xen (if you ever
> wanted to).
> 
> We need to stick on a single chunk size. That could be different between
> Arm and x86. For Arm it would need to be 64KB.

Doesn't enforcing 64KB granularity have its own limitations as well?
According to my understanding of Arm (and this could be wrong), you
would need to have the guest allocate (via memory exchange perhaps) 64KB of
machine-contiguous memory even when the hypervisor doesn't need it to be
64KB (when the hypervisor is running on 4KB granularity).

I think having a method to return the granularity to the guest, like Stefano
suggested, is more sensible. The hypervisor will then reject registration
requests which don't conform to the requirement.

Wei.

> 
> Cheers,
> 
> 
> > thanks,
> >
> > Christopher


* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
@ 2019-01-09 18:05   ` Jason Andryuk
  2019-01-10  2:08     ` Christopher Clark
  2019-01-09 18:57   ` Roger Pau Monné
  2019-01-10 21:41   ` Eric Chanudet
  2 siblings, 1 reply; 104+ messages in thread
From: Jason Andryuk @ 2019-01-09 18:05 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:

<snip>

> @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
>      smp_wmb();
>  }
>
> +static int
> +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> +                     uint32_t len)
> +{
> +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> +    void *dst;
> +    int ret;
> +    unsigned int src_offset = 0;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    offset &= ~PAGE_MASK;
> +
> +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> +        return -EFAULT;
> +
> +    while ( (offset + len) > PAGE_SIZE )
> +    {
> +        unsigned int head_len = PAGE_SIZE - offset;

I think this while loop could be re-written as
while (len) {
    head_len = len > PAGE_SIZE ? PAGE_SIZE - offset: len;

and then the extra copying below outside the loop could be dropped.

The first iteration does a partial copy at offset and then sets offset=0.
The next N iterations copy exactly PAGE_SIZE.
The final iteration copies the remaining len bytes.
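
A sketch of that single-loop structure, reusing the names from the quoted
function (illustrative only, not the actual patch):

    while ( len )
    {
        unsigned int head_len = (len > PAGE_SIZE - offset) ? PAGE_SIZE - offset
                                                           : len;

        ret = ring_map_page(ring_info, mfns_index, &dst);
        if ( ret )
            return ret;

        if ( src )
        {
            memcpy(dst + offset, src + src_offset, head_len);
            src_offset += head_len;
        }
        else
        {
            if ( copy_from_guest(dst + offset, src_hnd, head_len) )
                return -EFAULT;

            guest_handle_add_offset(src_hnd, head_len);
        }

        mfns_index++;
        len -= head_len;
        offset = 0;
    }

    return 0;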

> +
> +        ret = ring_map_page(ring_info, mfns_index, &dst);
> +        if ( ret )
> +            return ret;
> +
> +        if ( src )
> +        {
> +            memcpy(dst + offset, src + src_offset, head_len);
> +            src_offset += head_len;
> +        }
> +        else
> +        {
> +            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> +                    -EFAULT : 0;
> +            if ( ret )
> +                return ret;
> +
> +            guest_handle_add_offset(src_hnd, head_len);
> +        }
> +
> +        mfns_index++;
> +        len -= head_len;
> +        offset = 0;
> +    }
> +
> +    ret = ring_map_page(ring_info, mfns_index, &dst);
> +    if ( ret )
> +    {
> +        argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +                     " %d of %d\n", ring_info->id.domain_id, ring_info->id.port,
> +                     ring_info->id.partner_id, ring_info, mfns_index,
> +                     ring_info->nmfns);
> +        return ret;
> +    }
> +
> +    if ( src )
> +        memcpy(dst + offset, src + src_offset, len);
> +    else
> +        ret = copy_from_guest(dst + offset, src_hnd, len) ? -EFAULT : 0;
> +
> +    return ret;
> +}

<snip>

> +
> +/*
> + * iov_count returns its count on success via an out variable to avoid
> + * potential for a negative return value to be used incorrectly
> + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> + */
> +static int
> +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> +{
> +    uint32_t sum_iov_lens = 0;
> +
> +    if ( niov > XEN_ARGO_MAXIOV )
> +        return -EINVAL;
> +
> +    while ( niov-- )
> +    {
> +        /* valid iovs must have the padding field set to zero */
> +        if ( piov->pad )
> +        {
> +            argo_dprintk("invalid iov: padding is not zero\n");
> +            return -EINVAL;
> +        }
> +
> +        /* check each to protect sum against integer overflow */
> +        if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )

Should this be MAX_ARGO_MESSAGE_SIZE?  MAX_ARGO_MESSAGE_SIZE is less
than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just
fail the one below.

> +        {
> +            argo_dprintk("invalid iov_len: too big (%u)>%llu\n",
> +                         piov->iov_len, XEN_ARGO_MAX_RING_SIZE);
> +            return -EINVAL;
> +        }
> +
> +        sum_iov_lens += piov->iov_len;
> +
> +        /*
> +         * Again protect sum from integer overflow
> +         * and ensure total msg size will be within bounds.
> +         */
> +        if ( sum_iov_lens > MAX_ARGO_MESSAGE_SIZE )
> +        {
> +            argo_dprintk("invalid iov series: total message too big\n");
> +            return -EMSGSIZE;
> +        }
> +
> +        piov++;
> +    }
> +
> +    *count = sum_iov_lens;
> +
> +    return 0;
> +}
> +

<snip>

> @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>          break;
>      }
>
> +    case XEN_ARGO_OP_sendv:
> +    {
> +        xen_argo_send_addr_t send_addr;
> +
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> +            guest_handle_cast(arg1, xen_argo_send_addr_t);
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> +            guest_handle_cast(arg2, xen_argo_iov_t);
> +        /* arg3 is niov */
> +        /* arg4 is message_type. Must be a 32-bit value. */
> +
> +        rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> +        if ( rc )
> +            break;
> +
> +        if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> +            send_addr.src.domain_id = currd->domain_id;
> +
> +        /* No domain is currently authorized to send on behalf of another */
> +        if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
> +        {
> +            rc = -EPERM;
> +            break;
> +        }
> +
> +        /* Reject niov or message_type values that are outside 32 bit range. */
> +        if ( unlikely((arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }

This needs to check send_addr.src.pad and send_addr.dst.pad == 0.
sendv() does not check the padding either.
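
For illustration, the missing checks could take roughly this shape
(placement within the op handler is assumed):

    /* Valid send addresses must have their padding fields set to zero. */
    if ( send_addr.src.pad || send_addr.dst.pad )
    {
        rc = -EINVAL;
        break;
    }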

Regards,
Jason

> +
> +        /*
> +         * Check access to the whole array here so we can use the faster __copy
> +         * operations to read each element later.
> +         */
> +        if ( unlikely(!guest_handle_okay(iovs_hnd, arg3)) )
> +            break;
> +
> +        rc = sendv(currd, &send_addr.src, &send_addr.dst, iovs_hnd, arg3, arg4);
> +        break;
> +    }
> +
>      default:
>          rc = -EOPNOTSUPP;
>          break;


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 17:18         ` Stefano Stabellini
@ 2019-01-09 18:13           ` Julien Grall
  2019-01-09 20:33             ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Julien Grall @ 2019-01-09 18:13 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Tim Deegan, Wei Liu, Ross Philipson, Jason Andryuk, Daniel Smith,
	Andrew Cooper, Konrad Rzeszutek Wilk, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet, Roger Pau Monne



Hi,

On Wed, 9 Jan 2019, 12:18 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> On Wed, 9 Jan 2019, Julien Grall wrote:
> > Hi,
> > Sorry for the formatting. Sending it from my phone.
> >
> > On Wed, 9 Jan 2019, 11:03 Christopher Clark, <
> christopher.w.clark@gmail.com> wrote:
> >       On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com>
> wrote:
> >       >
> >       > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark
> wrote:
> >       > > The register op is used by a domain to register a region of
> memory for
> >       > > receiving messages from either a specified other domain, or,
> if specifying a
> >       > > wildcard, any domain.
> >       > >
> >       > > This operation creates a mapping within Xen's private address
> space that
> >       > > will remain resident for the lifetime of the ring. In
> subsequent commits,
> >       > > the hypervisor will use this mapping to copy data from a
> sending domain into
> >       > > this registered ring, making it accessible to the domain that
> registered the
> >       > > ring to receive data.
> >       > >
> >       > > Wildcard any-sender rings are default disabled and
> registration will be
> >       > > refused with EPERM unless they have been specifically enabled
> with the
> >       > > argo-mac boot option introduced here. The reason why the
> default for
> >       > > wildcard rings is 'deny' is that there is currently no means
> to protect the
> >       > > ring from DoS by a noisy domain spamming the ring, affecting
> other domains
> >       > > ability to send to it. This will be addressed with XSM policy
> controls in
> >       > > subsequent work.
> >       > >
> >       > > Since denying access to any-sender rings is a significant
> functional
> >       > > constraint, a new bootparam is provided to enable overriding
> this:
> >       > >  "argo-mac" variable has allowed values: 'permissive' and
> 'enforcing'.
> >       > > Even though this is a boolean variable, use these descriptive
> strings in
> >       > > order to make it obvious to an administrator that this has
> potential
> >       > > security impact.
> >       > >
> >       > > The p2m type of the memory supplied by the guest for the ring
> must be
> >       > > p2m_ram_rw and the memory will be pinned as PGT_writable_page
> while the ring
> >       > > is registered.
> >       > >
> >       > > xen_argo_page_descr_t type is introduced as a page descriptor,
> to convey
> >       > > both the physical address of the start of the page and its
> granularity. The
> >       > > smallest granularity page is assumed to be 4096 bytes and the
> lower twelve
> >       > > bits of the type are used to indicate the size of page of
> memory supplied.
> >       > > The implementation of the hypercall op currently only supports
> 4K pages.
> >       > >
> >       >
> >       > What is the resolution for the Arm issues mentioned by Julien? I
> read
> >       > the conversation in previous thread. A solution seemed to have
> been
> >       > agreed upon, but the changelog doesn't say anything about it.
> >
> >       I made the interface changes that Julien had asked for. The
> register
> >       op now takes arguments that can describe the granularitity of the
> >       pages supplied, though only support for 4K pages is accepted in the
> >       current implementation. I believe it meets Julien's requirements.
> >
> >
> > I still don't think allowing 4K or 64K is the right solution to go. You
> are adding unnecessary burden in the hypervisor and would
> > prevent optimization i the hypervisor and unwanted side effect.
> >
> > For instance a 64K hypervisor will always map 64K even when the guest is
> passing 4K. You also can't map everything contiguously
> > in Xen (if you ever wanted to).
> >
> > We need to stick on a single chunk size. That could be different between
> Arm and x86. For Arm it would need to be 64KB.
>
> Hi Julien!
>
> I don't think we should force 64K as the only granularity on ARM. It
> causes unnecessary overhead and confusion on 4K-only deployments that
> are almost all our use-cases today.


Why would there be confusion? People should read the documentation when
writing a driver...


> One option is to make the granularity configurable at the guest side,
> like Christopher did, letting the guest specifying the granularity. The
> hypervisor could return -ENOSYS if the specified granularity is not
> supported.
>
> The other option is having the hypervisor export the granularity it
> supports for this interface: Xen would say "I support only 4K".
> Tomorrow, it could change and Xen could say "I support only 64K". Then,
> it would be up to the guest passing a page of the right granularity to
> the hypervisor. I think this is probably the best option, but it would
> require the addition of one hypercall to retrieve the supported
> granularity from Xen.


I would recommend reading my initial answers on the first series to
understand why allowing 4K is an issue.

AFAIK virtio and UEFI have restrictions that allow a guest to run
agnostically of the hypervisor page granularity. An example is to mandate
64K chunks or 64KB-aligned addresses.

With your suggestion you are going to break many use-cases if the
hypervisor moves to 64KB. At worst it could introduce security issues.
At best it prevents optimization in the hypervisor or prevents guests from
running (bad for backward compatibility).

Actually, this is not going to help move towards 64K in Argo because you
would still have to modify the kernel. So this does not meet my
requirements.

I don't think requiring 64K chunks is going to be a major issue or a
concern, unless you expect small rings... Christopher, what is the expected
size?

Another solution was to require contiguous guest physical memory. That
would solve quite a few problems on Arm. But Christopher had some convincing
points against implementing this.

As I said before, I know this is not going to be the only place with this
issue. I merely wanted to start tackling the problem. However, IMHO, this
interface is not more suitable than what we have currently. So this raises
the question of whether we should just use the usual Xen interface if 64K
is not an option...

Cheers,


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 17:54         ` Wei Liu
@ 2019-01-09 18:28           ` Julien Grall
  2019-01-09 20:38             ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Julien Grall @ 2019-01-09 18:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: Tim Deegan, Stefano Stabellini, Ross Philipson, Jason Andryuk,
	Daniel Smith, Andrew Cooper, Konrad Rzeszutek Wilk, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet, Roger Pau Monne



On Wed, 9 Jan 2019, 12:54 Wei Liu, <wei.liu2@citrix.com> wrote:

> On Wed, Jan 09, 2019 at 12:02:34PM -0500, Julien Grall wrote:
> > Hi,
> >
> > Sorry for the formatting. Sending it from my phone.
> >
> > On Wed, 9 Jan 2019, 11:03 Christopher Clark, <
> christopher.w.clark@gmail.com>
> > wrote:
> >
> > > On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
> > > >
> > > > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
> > > > > The register op is used by a domain to register a region of memory
> for
> > > > > receiving messages from either a specified other domain, or, if
> > > specifying a
> > > > > wildcard, any domain.
> > > > >
> > > > > This operation creates a mapping within Xen's private address space
> > > that
> > > > > will remain resident for the lifetime of the ring. In subsequent
> > > commits,
> > > > > the hypervisor will use this mapping to copy data from a sending
> > > domain into
> > > > > this registered ring, making it accessible to the domain that
> > > registered the
> > > > > ring to receive data.
> > > > >
> > > > > Wildcard any-sender rings are default disabled and registration
> will be
> > > > > refused with EPERM unless they have been specifically enabled with
> the
> > > > > argo-mac boot option introduced here. The reason why the default
> for
> > > > > wildcard rings is 'deny' is that there is currently no means to
> > > protect the
> > > > > ring from DoS by a noisy domain spamming the ring, affecting other
> > > domains
> > > > > ability to send to it. This will be addressed with XSM policy
> controls
> > > in
> > > > > subsequent work.
> > > > >
> > > > > Since denying access to any-sender rings is a significant
> functional
> > > > > constraint, a new bootparam is provided to enable overriding this:
> > > > >  "argo-mac" variable has allowed values: 'permissive' and
> 'enforcing'.
> > > > > Even though this is a boolean variable, use these descriptive
> strings
> > > in
> > > > > order to make it obvious to an administrator that this has
> potential
> > > > > security impact.
> > > > >
> > > > > The p2m type of the memory supplied by the guest for the ring must
> be
> > > > > p2m_ram_rw and the memory will be pinned as PGT_writable_page while
> > > the ring
> > > > > is registered.
> > > > >
> > > > > xen_argo_page_descr_t type is introduced as a page descriptor, to
> > > convey
> > > > > both the physical address of the start of the page and its
> > > granularity. The
> > > > > smallest granularity page is assumed to be 4096 bytes and the lower
> > > twelve
> > > > > bits of the type are used to indicate the size of page of memory
> > > supplied.
> > > > > The implementation of the hypercall op currently only supports 4K
> > > pages.
> > > > >
> > > >
> > > > What is the resolution for the Arm issues mentioned by Julien? I read
> > > > the conversation in previous thread. A solution seemed to have been
> > > > agreed upon, but the changelog doesn't say anything about it.
> > >
> > > I made the interface changes that Julien had asked for. The register
> > > op now takes arguments that can describe the granularitity of the
> > > pages supplied, though only support for 4K pages is accepted in the
> > > current implementation. I believe it meets Julien's requirements.
> >
> >
> > I still don't think allowing 4K or 64K is the right solution to go. You
> are
> > adding unnecessary burden in the hypervisor and would prevent
> optimization
> > i the hypervisor and unwanted side effect.
> >
> > For instance a 64K hypervisor will always map 64K even when the guest is
> > passing 4K. You also can't map everything contiguously in Xen (if you
> ever
> > wanted to).
> >
> > We need to stick on a single chunk size. That could be different between
> > Arm and x86. For Arm it would need to be 64KB.
>
> Doesn't enforcing 64KB granularity has its own limitation as well?
> According to my understanding of arm (and this could be wrong), you
> would need to have the guest allocate (via memory exchange perhaps) 64KB
> machine contiguous memory even when the hypervisor doesn't need it to be
> 64KB (when hypervisor is running on 4KB granularity).


The 64K is just about the interface with the guest.
The hypervisor could just split the 64K into 16 4K chunks. No need for memory
exchange here.


> I think having a method to return granularity to guest, like Stefano
> suggested, is more sensible. Hypervisor will then reject registration
> request which doesn't conform to the requirement.
>

The problem is not that simple... For instance, 64K is required to support
52-bit PAs, yet you may still want to run your current Debian on that
platform.

You can do that nicely on KVM, but on Xen it is a pain due to the current
interface. If you use 4K you may end up exposing too much to the other
side.

The only viable solution here is a full redesign of the ABI for Arm. We
can do that step by step or in one go.

The discussion here was to start solving it in Argo so that's one less step
to do. Christopher kindly tried to tackle it. Sadly, I don't think the
interface suggested is going to work.

But I don't want Argo to miss 4.12 for that. So maybe the solution is to
stick with the usual Xen interface.

Best regards,


> Wei.
>
> >
> > Cheers,
> >
> >
> > > thanks,
> > >
> > > Christopher
> > >


* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
  2019-01-09 18:05   ` Jason Andryuk
@ 2019-01-09 18:57   ` Roger Pau Monné
  2019-01-10  3:09     ` Christopher Clark
  2019-01-10 21:41   ` Eric Chanudet
  2 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-09 18:57 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> sendv operation is invoked to perform a synchronous send of buffers
> contained in iovs to a remote domain's registered ring.
>
> It takes:
>  * A destination address (domid, port) for the ring to send to.
>    It performs a most-specific match lookup, to allow for wildcard.
>  * A source address, used to inform the destination of where to reply.
>  * The address of an array of iovs containing the data to send
>  * .. and the length of that array of iovs
>  * and a 32-bit message type, available to communicate message context
>    data (eg. kernel-to-kernel, separate from the application data).
>
> If insufficient space exists in the destination ring, it will return
> -EAGAIN and Xen will notify the caller when sufficient space becomes
> available.
>
> Accesses to the ring indices are appropriately atomic. The rings are
> mapped into Xen's private address space to write as needed and the
> mappings are retained for later use.
>
> Fixed-size types are used in some areas within this code where caution
> around avoiding integer overflow is important.
>
> Notifications are sent to guests via VIRQ and send_guest_global_virq is
> exposed in the change to enable argo to call it. VIRQ_ARGO_MESSAGE is
> claimed from the VIRQ previously reserved for this purpose (#11).
>
> The VIRQ notification method is used rather than sending events using
> evtchn functions directly because:
>
> * no current event channel type is an exact fit for the intended
>   behaviour. ECS_IPI is closest, but it disallows migration to
>   other VCPUs which is not necessarily a requirement for Argo.
>
> * at the point of argo_init, allocation of an event channel is
>   complicated by none of the guest VCPUs being initialized yet
>   and the event channel logic expects that a valid event channel
>   has a present VCPU.
>
> * at the point of signalling a notification, the VIRQ logic is already
>   defensive: if d->vcpu[0] is NULL, the notification is just silently
>   dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
>   must not be NULL, otherwise a null pointer dereference occurs.
>
> Using a VIRQ removes the need for the guest to query to determine which
> event channel notifications will be delivered on. This is also likely to
> simplify establishing future L0/L1 nested hypervisor argo communication.
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
> The previous double-read of iovs from guest memory has been removed.
>
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: pending_queue: reap stale ents when in need of space
> v2 self: pending_requeue: reclaim ents for stale domains
> v2.feedback Jan: only override sender domid if DOMID_ANY
> v2 feedback Jan: drop message from argo_message_op
> v2 self: check npending vs maximum limit
> v2 self: get_sanitized_ring instead of get_rx_ptr
> v2 feedback v1#13 Jan: remove double read from ringbuf insert, lower MAX_IOV
> v2 self: make iov_count const
> v2 self: iov_count : return EMSGSIZE for message too big
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1 feedback #13 Jan: drop guest_handle_okay when using copy_from_guest
>     - reorder do_argo_op logic
> v2 self: add _hnd suffix to iovs variable name to indicate guest handle type
> v2 self: replace use of XEN_GUEST_HANDLE_NULL with two existing macros
>
> v1 #15 feedback, Jan: sendv op : s/ECONNREFUSED/ESRCH/
> v1 #5 (#15) feedback Paul: sendv: use currd in do_argo_message_op
> v1 #13 (#15) feedback Paul: sendv op: do/while reindent only
> v1 #13 (#15) feedback Paul: sendv op: do/while: argo_ringbuf_insert to goto style
> v1 #13 (#15) feedback Paul: sendv op: do/while: reindent only again
> v1 #13 (#15) feedback Paul: sendv op: do/while : goto
> v1 #15 feedback Paul: sendv op: make page var: unsigned
> v1 #15 feedback Paul: sendv op: new local var for PAGE_SIZE - offset
> v1 #8 feedback Jan: XEN_GUEST_HANDLE : C89 compliance
> v1 rebase after switching register op from pfns to page descriptors
> v1 self: move iov DEFINE_XEN_GUEST_HANDLE out of public header into argo.c
> v1 #13 (#15) feedback Paul: fix loglevel for guest-triggered messages
> v1 : add compat xlat.lst entries
> v1 self: switched notification to send_guest_global_virq instead of event
> v1: fix gprintk use for ARM as its defn dislikes split format strings
> v1: init len variable to satisfy ARM compiler initialized checking
> v1 #13 feedback Jan: rename page var
> v1:#14 feedback Jan: uint8_t* -> void*
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: #13 feedback Jan: blank line after case op in do_argo_message_op
> v1: #15 feedback Jan: add comments explaining why the writes don't overrun
> v1: self: add ASSERT to support comment that overrun cannot happen
> v1: self: fail on short writes where guest manipulated the iov_lens
> v1: self: rename ent id to domain_id
> v1: self: add moan for iov rewrite
> v1. feedback #15 Jan: require the pad bits are zero
> v1. feedback #15 Jan: drop NULL check in argo_signal_domain as now using VIRQ
> v1. self: store domain_cookie in pending ent
> v1. feedback #15 Jan: use unsigned where possible
> v1. feedback Jan: use handle type for iov_base in public iov interface
> v1. self: log whenever visible error occurs
> v1 feedback #15, Jan: drop unnecessary mb
> v1 self: only update internal tx_ptr if able to return success
>          and update the visible tx_ptr
> v1 self: log on failure to map ring to update visible tx_ptr
> v1 feedback #15 Jan: add comment re: notification size policy
> v1 self/Roger? remove errant space after sizeof
> v1. feedback #15 Jan: require iov pad be zero
> v1. self: rename iov_base to iov_hnd for handle in public iov interface
> v1: feedback #15 Jan: handle upper-halves of hypercall args; changes some
>     types in function signatures to match.
> v1: self: add dprintk to sendv
> v1: self: add debug output to argo_iov_count
> v1. feedback #14 Jan: blank line before return in argo_iov_count
> v1 feedback #15 Jan: verify src id, not override
>
>  xen/common/argo.c          | 653 +++++++++++++++++++++++++++++++++++++++++++++
>  xen/common/event_channel.c |   2 +-
>  xen/include/public/argo.h  |  60 +++++
>  xen/include/public/xen.h   |   2 +-
>  xen/include/xen/event.h    |   7 +
>  xen/include/xlat.lst       |   2 +
>  6 files changed, 724 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 59ce8c4..4548435 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -29,14 +29,21 @@
>  #include <public/argo.h>
>
>  #define MAX_RINGS_PER_DOMAIN            128U
> +#define MAX_PENDING_PER_RING             32U
>
>  /* All messages on the ring are padded to a multiple of the slot size. */
>  #define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
>
> +/* The maximum size of a message that may be sent on the largest Argo ring. */
> +#define MAX_ARGO_MESSAGE_SIZE ((XEN_ARGO_MAX_RING_SIZE) - \
> +        (sizeof(struct xen_argo_ring_message_header)) - ROUNDUP_MESSAGE(1))
> +
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_iov_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_send_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>
>  /* Xen command line option to enable argo */
> @@ -250,6 +257,14 @@ hash_index(const struct argo_ring_id *id)
>  }
>
>  static void
> +signal_domain(struct domain *d)
> +{
> +    argo_dprintk("signalling domid:%d\n", d->domain_id);
> +
> +    send_guest_global_virq(d, VIRQ_ARGO_MESSAGE);
> +}
> +
> +static void
>  ring_unmap(struct argo_ring_info *ring_info)
>  {
>      unsigned int i;
> @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
>      smp_wmb();
>  }
>
> +static int
> +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> +                     uint32_t len)
> +{
> +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> +    void *dst;
> +    int ret;
> +    unsigned int src_offset = 0;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    offset &= ~PAGE_MASK;
> +
> +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> +        return -EFAULT;
> +
> +    while ( (offset + len) > PAGE_SIZE )

I think you could map the whole ring in contiguous virtual address
space, and then writing to it would be much easier: you wouldn't
need to iterate with memcpy or copy_from_guest. Take a look at __vmap.
You could likely map this when the ring gets set up and keep it mapped
for the lifetime of the ring.
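
(As a concrete sketch of that suggestion: assuming ring_info->mfns is the
mfn_t array already populated at registration time, and adding a
hypothetical ring_vaddr field to struct argo_ring_info, registration could
do something like:

    ring_info->ring_vaddr = vmap(ring_info->mfns, ring_info->nmfns);
    if ( !ring_info->ring_vaddr )
        return -ENOMEM;

Writes would then be plain memcpy/copy_from_guest into
ring_info->ring_vaddr + offset, and teardown would call
vunmap(ring_info->ring_vaddr). Sketch only; error handling and locking are
elided.)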

> +    {
> +        unsigned int head_len = PAGE_SIZE - offset;
> +
> +        ret = ring_map_page(ring_info, mfns_index, &dst);
> +        if ( ret )
> +            return ret;
> +
> +        if ( src )
> +        {
> +            memcpy(dst + offset, src + src_offset, head_len);
> +            src_offset += head_len;
> +        }
> +        else
> +        {
> +            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> +                    -EFAULT : 0;
> +            if ( ret )
> +                return ret;

You can simplify this to:

if ( copy_from_guest(...) )
    return -EFAULT;

> +
> +            guest_handle_add_offset(src_hnd, head_len);
> +        }
> +
> +        mfns_index++;
> +        len -= head_len;
> +        offset = 0;
> +    }
> +
> +    ret = ring_map_page(ring_info, mfns_index, &dst);
> +    if ( ret )
> +    {
> +        argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
> +                     " %d of %d\n", ring_info->id.domain_id, ring_info->id.port,
> +                     ring_info->id.partner_id, ring_info, mfns_index,
> +                     ring_info->nmfns);
> +        return ret;
> +    }
> +
> +    if ( src )
> +        memcpy(dst + offset, src + src_offset, len);
> +    else
> +        ret = copy_from_guest(dst + offset, src_hnd, len) ? -EFAULT : 0;
> +
> +    return ret;
> +}
> +
> +/*
> + * Use this with caution: rx_ptr is under guest control and may be bogus.
> + * See get_sanitized_ring for a safer alternative.
> + */
> +static int
> +get_rx_ptr(struct argo_ring_info *ring_info, uint32_t *rx_ptr)
> +{
> +    void *src;
> +    xen_argo_ring_t *ringp;
> +    int ret;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    if ( !ring_info->nmfns || ring_info->nmfns < ring_info->npage )
> +        return -EINVAL;
> +
> +    ret = ring_map_page(ring_info, 0, &src);
> +    if ( ret )
> +        return ret;
> +
> +    ringp = (xen_argo_ring_t *)src;
> +
> +    *rx_ptr = read_atomic(&ringp->rx_ptr);
> +
> +    return 0;
> +}
> +
> +/*
> + * get_sanitized_ring creates a modified copy of the ring pointers where
> + * the rx_ptr is rounded up to ensure it is aligned, and then ring
> + * wrap is handled. Simplifies safe use of the rx_ptr for available
> + * space calculation.
> + */
> +static int
> +get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> +{
> +    uint32_t rx_ptr;
> +    int ret;
> +
> +    ret = get_rx_ptr(ring_info, &rx_ptr);
> +    if ( ret )
> +        return ret;
> +
> +    ring->tx_ptr = ring_info->tx_ptr;
> +
> +    rx_ptr = ROUNDUP_MESSAGE(rx_ptr);
> +    if ( rx_ptr >= ring_info->len )
> +        rx_ptr = 0;
> +
> +    ring->rx_ptr = rx_ptr;

Newline.

> +    return 0;
> +}
> +
> +/*
> + * iov_count returns its count on success via an out variable to avoid
> + * potential for a negative return value to be used incorrectly
> + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> + */
> +static int
> +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> +{
> +    uint32_t sum_iov_lens = 0;
> +
> +    if ( niov > XEN_ARGO_MAXIOV )
> +        return -EINVAL;
> +
> +    while ( niov-- )

I would use a for loop here, that would remove the need to piov++, if
you want to keep it quite similar:

for ( ; niov--; piov++ )
{
    ...

> +    {
> +        /* valid iovs must have the padding field set to zero */
> +        if ( piov->pad )
> +        {
> +            argo_dprintk("invalid iov: padding is not zero\n");
> +            return -EINVAL;
> +        }
> +
> +        /* check each to protect sum against integer overflow */
> +        if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )
> +        {
> +            argo_dprintk("invalid iov_len: too big (%u)>%llu\n",
> +                         piov->iov_len, XEN_ARGO_MAX_RING_SIZE);
> +            return -EINVAL;
> +        }
> +
> +        sum_iov_lens += piov->iov_len;
> +
> +        /*
> +         * Again protect sum from integer overflow
> +         * and ensure total msg size will be within bounds.
> +         */
> +        if ( sum_iov_lens > MAX_ARGO_MESSAGE_SIZE )
> +        {
> +            argo_dprintk("invalid iov series: total message too big\n");
> +            return -EMSGSIZE;
> +        }
> +
> +        piov++;
> +    }
> +
> +    *count = sum_iov_lens;
> +
> +    return 0;
> +}
> +
> +static int
> +ringbuf_insert(struct domain *d, struct argo_ring_info *ring_info,
> +               const struct argo_ring_id *src_id,
> +               XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd,
> +               unsigned long niov, uint32_t message_type,
> +               unsigned long *out_len)
> +{
> +    xen_argo_ring_t ring;
> +    struct xen_argo_ring_message_header mh = { 0 };

No need for the 0, { } will achieve exactly the same.

> +    int32_t sp;
> +    int32_t ret;
> +    uint32_t len = 0;
> +    xen_argo_iov_t iovs[XEN_ARGO_MAXIOV];

This seems slightly dangerous: a change of the maximum could cause a
stack overflow, depending on the size of xen_argo_iov_t. I think you
need a comment next to the definition of XEN_ARGO_MAXIOV to note that
increasing this could cause issues.
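
(One way to make that explicit at build time is a compile-time assertion
next to the array; sketch only, and the 128-byte bound below is purely
illustrative, not a value from this series.)

    /* iovs[] lives on the hypervisor stack in ringbuf_insert(): keep the
     * worst-case footprint bounded if XEN_ARGO_MAXIOV is ever increased. */
    BUILD_BUG_ON(XEN_ARGO_MAXIOV * sizeof(xen_argo_iov_t) > 128);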

> +    xen_argo_iov_t *piov;
> +    XEN_GUEST_HANDLE(uint8_t) NULL_hnd =
> +       guest_handle_from_param(guest_handle_from_ptr(NULL, uint8_t), uint8_t);
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    ret = __copy_from_guest(iovs, iovs_hnd, niov) ? -EFAULT : 0;
> +    if ( ret )
> +        goto out;
> +
> +    /*
> +     * Obtain the total size of data to transmit -- sets the 'len' variable
> +     * -- and sanity check that the iovs conform to size and number limits.
> +     * Enforced below: no more than 'len' bytes of guest data
> +     * (plus the message header) will be sent in this operation.
> +     */
> +    ret = iov_count(iovs, niov, &len);
> +    if ( ret )
> +        goto out;
> +
> +    /*
> +     * Size bounds check against ring size and static maximum message limit.
> +     * The message must not fill the ring; there must be at least one slot
> +     * remaining so we can distinguish a full ring from an empty one.
> +     */
> +    if ( ((ROUNDUP_MESSAGE(len) +
> +            sizeof(struct xen_argo_ring_message_header)) >= ring_info->len) ||
> +         (len > MAX_ARGO_MESSAGE_SIZE) )
> +    {
> +        ret = -EMSGSIZE;
> +        goto out;
> +    }
> +
> +    ret = get_sanitized_ring(&ring, ring_info);
> +    if ( ret )
> +        goto out;
> +
> +    argo_dprintk("ring.tx_ptr=%d ring.rx_ptr=%d ring len=%d"
> +                 " ring_info->tx_ptr=%d\n",
> +                 ring.tx_ptr, ring.rx_ptr, ring_info->len, ring_info->tx_ptr);
> +
> +    if ( ring.rx_ptr == ring.tx_ptr )
> +        sp = ring_info->len;
> +    else
> +    {
> +        sp = ring.rx_ptr - ring.tx_ptr;
> +        if ( sp < 0 )
> +            sp += ring_info->len;
> +    }
> +
> +    /*
> +     * Size bounds check against currently available space in the ring.
> +     * Again: the message must not fill the ring leaving no space remaining.
> +     */
> +    if ( (ROUNDUP_MESSAGE(len) +
> +            sizeof(struct xen_argo_ring_message_header)) >= sp )
> +    {
> +        argo_dprintk("EAGAIN\n");
> +        ret = -EAGAIN;
> +        goto out;
> +    }
> +
> +    mh.len = len + sizeof(struct xen_argo_ring_message_header);
> +    mh.source.port = src_id->port;
> +    mh.source.domain_id = src_id->domain_id;
> +    mh.message_type = message_type;
> +
> +    /*
> +     * For this copy to the guest ring, tx_ptr is always 16-byte aligned
> +     * and the message header is 16 bytes long.
> +     */
> +    BUILD_BUG_ON(
> +        sizeof(struct xen_argo_ring_message_header) != ROUNDUP_MESSAGE(1));
> +
> +    /*
> +     * First data write into the destination ring: fixed size, message header.
> +     * This cannot overrun because the available free space (value in 'sp')
> +     * is checked above and must be at least this size.
> +     */
> +    ret = memcpy_to_guest_ring(ring_info, ring.tx_ptr + sizeof(xen_argo_ring_t),
> +                               &mh, NULL_hnd, sizeof(mh));
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: failed to write message header to ring (vm%u:%x vm%d)\n",
> +                ring_info->id.domain_id, ring_info->id.port,
> +                ring_info->id.partner_id);
> +
> +        goto out;
> +    }
> +
> +    ring.tx_ptr += sizeof(mh);
> +    if ( ring.tx_ptr == ring_info->len )
> +        ring.tx_ptr = 0;
> +
> +    piov = iovs;
> +
> +    while ( niov-- )

AFAICT using a for loop would remove the need to also do a piov++ at
each iteration.

> +    {
> +        XEN_GUEST_HANDLE_64(uint8_t) buf_hnd = piov->iov_hnd;
> +        uint32_t iov_len = piov->iov_len;
> +
> +        /* If no data is provided in this iov, moan and skip on to the next */
> +        if ( !iov_len )
> +        {
> +            gprintk(XENLOG_ERR,
> +                    "argo: no data iov_len=0 iov_hnd=%p ring (vm%u:%x vm%d)\n",
> +                    buf_hnd.p, ring_info->id.domain_id, ring_info->id.port,
> +                    ring_info->id.partner_id);
> +
> +            piov++;
> +            continue;
> +        }
> +
> +        if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
> +        {
> +            gprintk(XENLOG_ERR,
> +                    "argo: bad iov handle [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
> +                    buf_hnd.p, iov_len,
> +                    ring_info->id.domain_id, ring_info->id.port,
> +                    ring_info->id.partner_id);
> +
> +            ret = -EFAULT;
> +            goto out;
> +        }
> +
> +        sp = ring_info->len - ring.tx_ptr;
> +
> +        /* Check: iov data size versus free space at the tail of the ring */
> +        if ( iov_len > sp )
> +        {
> +            /*
> +             * Second possible data write: ring-tail-wrap-write.
> +             * Populate the ring tail and update the internal tx_ptr to handle
> +             * wrapping at the end of ring.
> +             * Size of data written here: sp
> +             * which is the exact full amount of free space available at the
> +             * tail of the ring, so this cannot overrun.
> +             */
> +            ret = memcpy_to_guest_ring(ring_info,
> +                                       ring.tx_ptr + sizeof(xen_argo_ring_t),
> +                                       NULL, buf_hnd, sp);
> +            if ( ret )
> +            {
> +                gprintk(XENLOG_ERR,
> +                        "argo: failed to copy {%p, %"PRIx32"} (vm%u:%x vm%d)\n",
> +                        buf_hnd.p, sp,
> +                        ring_info->id.domain_id, ring_info->id.port,
> +                        ring_info->id.partner_id);
> +
> +                goto out;
> +            }
> +
> +            ring.tx_ptr = 0;
> +            iov_len -= sp;
> +            guest_handle_add_offset(buf_hnd, sp);
> +
> +            ASSERT(iov_len <= ring_info->len);
> +        }
> +
> +        /*
> +         * Third possible data write: all data remaining for this iov.
> +         * Size of data written here: iov_len
> +         *
> +         * Case 1: if the ring-tail-wrap-write above was performed, then
> +         *         iov_len has been decreased by 'sp' and ring.tx_ptr is zero.
> +         *
> +         *    We know from checking the result of iov_count:
> +         *      len + sizeof(message_header) <= ring_info->len
> +         *    We also know that len is the total of summing all iov_lens, so:
> +         *       iov_len <= len
> +         *    so by transitivity:
> +         *       iov_len <= len <= (ring_info->len - sizeof(msgheader))
> +         *    and therefore:
> +         *       (iov_len + sizeof(msgheader) <= ring_info->len) &&
> +         *       (ring.tx_ptr == 0)
> +         *    so this write cannot overrun here.
> +         *
> +         * Case 2: ring-tail-wrap-write above was not performed
> +         *    -> so iov_len is the guest-supplied value and: (iov_len <= sp)
> +         *    ie. less than available space at the tail of the ring:
> +         *        so this write cannot overrun.
> +         */
> +        ret = memcpy_to_guest_ring(ring_info,
> +                                   ring.tx_ptr + sizeof(xen_argo_ring_t),
> +                                   NULL, buf_hnd, iov_len);
> +        if ( ret )
> +        {
> +            gprintk(XENLOG_ERR,
> +                    "argo: failed to copy [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
> +                    buf_hnd.p, iov_len, ring_info->id.domain_id,
> +                    ring_info->id.port, ring_info->id.partner_id);
> +
> +            goto out;
> +        }
> +
> +        ring.tx_ptr += iov_len;
> +
> +        if ( ring.tx_ptr == ring_info->len )
> +            ring.tx_ptr = 0;
> +
> +        piov++;
> +    }
> +
> +    ring.tx_ptr = ROUNDUP_MESSAGE(ring.tx_ptr);
> +
> +    if ( ring.tx_ptr >= ring_info->len )
> +        ring.tx_ptr -= ring_info->len;
> +
> +    update_tx_ptr(ring_info, ring.tx_ptr);
> +
> + out:

Do you really need the out label? *out_len is only set in the success
case, so all the error cases that use a 'goto out' could be replaced
by 'return ret;'.

> +    /*
> +     * At this point it is possible to unmap the ring_info, ie:
> +     *   ring_unmap(ring_info);
> +     * but performance should be improved by not doing so, and retaining
> +     * the mapping.
> +     * An XSM policy control over level of confidentiality required
> +     * versus performance cost could be added to decide that here.
> +     * See the similar comment in ring_map_page re: write-only mappings.
> +     */
> +
> +    if ( !ret )
> +        *out_len = len;
> +
> +    return ret;
> +}
> +
>  static void
>  wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
>  {
> @@ -359,6 +781,22 @@ wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
>  }
>
>  static void
> +wildcard_pending_list_insert(domid_t domain_id, struct pending_ent *ent)
> +{
> +    struct domain *d = get_domain_by_id(domain_id);
> +    if ( !d )
> +        return;
> +
> +    if ( d->argo )
> +    {
> +        spin_lock(&d->argo->wildcard_lock);
> +        hlist_add_head(&ent->wildcard_node, &d->argo->wildcard_pend_list);
> +        spin_unlock(&d->argo->wildcard_lock);
> +    }
> +    put_domain(d);
> +}
> +
> +static void
>  pending_remove_all(struct argo_ring_info *ring_info)
>  {
>      struct hlist_node *node, *next;
> @@ -374,6 +812,67 @@ pending_remove_all(struct argo_ring_info *ring_info)
>      ring_info->npending = 0;
>  }
>
> +static int
> +pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
> +              unsigned int len)
> +{
> +    struct pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    if ( ring_info->npending >= MAX_PENDING_PER_RING )
> +        return -ENOSPC;
> +
> +    ent = xmalloc(struct pending_ent);
> +

Extra newline.

> +    if ( !ent )
> +        return -ENOMEM;
> +
> +    ent->len = len;
> +    ent->domain_id = src_id;
> +    ent->ring_info = ring_info;
> +
> +    if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +        wildcard_pending_list_insert(src_id, ent);
> +    hlist_add_head(&ent->node, &ring_info->pending);
> +    ring_info->npending++;
> +
> +    return 0;
> +}
> +
> +static int
> +pending_requeue(struct argo_ring_info *ring_info, domid_t src_id,
> +                unsigned int len)
> +{
> +    struct hlist_node *node;
> +    struct pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    hlist_for_each_entry(ent, node, &ring_info->pending, node)
> +    {
> +        if ( ent->domain_id == src_id )
> +        {
> +            /*
> +             * Reuse an existing queue entry for a notification rather than add
> +             * another. If the existing entry is waiting for a smaller size than
> +             * the current message then adjust the record to wait for the
> +             * current (larger) size to be available before triggering a
> +             * notification.
> +             * This assists the waiting sender by ensuring that whenever a
> +             * notification is triggered, there is sufficient space available
> +             * for (at least) any one of the messages awaiting transmission.
> +             */
> +            if ( ent->len < len )
> +                ent->len = len;
> +
> +            return 0;
> +        }
> +    }
> +
> +    return pending_queue(ring_info, src_id, len);
> +}
> +
>  static void
>  wildcard_rings_pending_remove(struct domain *d)
>  {
> @@ -667,6 +1166,28 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>      return NULL;
>  }
>
> +static struct argo_ring_info *
> +ring_find_info_by_match(const struct domain *d, uint32_t port,
> +                        domid_t partner_id)
> +{
> +    struct argo_ring_id id;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    id.port = port;
> +    id.domain_id = d->domain_id;
> +    id.partner_id = partner_id;
> +
> +    ring_info = ring_find_info(d, &id);
> +    if ( ring_info )
> +        return ring_info;
> +
> +    id.partner_id = XEN_ARGO_DOMID_ANY;
> +
> +    return ring_find_info(d, &id);
> +}
> +
>  static struct argo_send_info *
>  send_find_info(const struct domain *d, const struct argo_ring_id *id)
>  {
> @@ -1005,6 +1526,95 @@ register_ring(struct domain *currd,
>      return ret;
>  }
>
> +static long
> +sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
> +      const xen_argo_addr_t *dst_addr,
> +      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
> +      uint32_t message_type)
> +{
> +    struct domain *dst_d = NULL;
> +    struct argo_ring_id src_id;
> +    struct argo_ring_info *ring_info;
> +    int ret = 0;
> +    unsigned long len = 0;
> +
> +    ASSERT(src_d->domain_id == src_addr->domain_id);
> +
> +    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
> +                 src_addr->domain_id, src_addr->port,
> +                 dst_addr->domain_id, dst_addr->port,
> +                 niov, iovs_hnd.p, message_type);
> +
> +    read_lock(&argo_lock);
> +
> +    if ( !src_d->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    src_id.port = src_addr->port;
> +    src_id.domain_id = src_d->domain_id;
> +    src_id.partner_id = dst_addr->domain_id;
> +
> +    dst_d = get_domain_by_id(dst_addr->domain_id);
> +    if ( !dst_d )
> +    {
> +        argo_dprintk("!dst_d, ESRCH\n");
> +        ret = -ESRCH;
> +        goto out_unlock;
> +    }
> +
> +    if ( !dst_d->argo )
> +    {
> +        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +        ret = -ECONNREFUSED;
> +        goto out_unlock;

The usage of out_unlock here and in the condition above is wrong,
since it will unconditionally call read_unlock(&argo_lock); which is
wrong here because the lock has not yet been acquired.

> +    }
> +
> +    read_lock(&dst_d->argo->lock);
> +
> +    ring_info = ring_find_info_by_match(dst_d, dst_addr->port,
> +                                        src_addr->domain_id);
> +    if ( !ring_info )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u connection refused, src (vm%u:%x) dst (vm%u:%x)\n",
> +                current->domain->domain_id, src_id.domain_id, src_id.port,
> +                dst_addr->domain_id, dst_addr->port);
> +
> +        ret = -ECONNREFUSED;
> +        goto out_unlock2;
> +    }
> +
> +    spin_lock(&ring_info->lock);
> +
> +    ret = ringbuf_insert(dst_d, ring_info, &src_id, iovs_hnd, niov,
> +                         message_type, &len);
> +    if ( ret == -EAGAIN )
> +    {
> +        argo_dprintk("argo_ringbuf_sendv failed, EAGAIN\n");
> +        /* requeue to issue a notification when space is there */
> +        ret = pending_requeue(ring_info, src_addr->domain_id, len);
> +    }
> +
> +    spin_unlock(&ring_info->lock);
> +
> +    if ( ret >= 0 )
> +        signal_domain(dst_d);
> +
> + out_unlock2:

There's only a single user of the out_unlock2 label, so it might read
more easily if you just put the read_unlock there and use the existing
out_unlock label.

Thanks, Roger.


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 18:13           ` Julien Grall
@ 2019-01-09 20:33             ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-09 20:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, Jan Beulich,
	xen-devel, Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 10:14 AM Julien Grall <julien.grall@gmail.com> wrote:
> On Wed, 9 Jan 2019, 12:18 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>> On Wed, 9 Jan 2019, Julien Grall wrote:
>> > Hi,
>> > Sorry for the formatting. Sending it from my phone.
>> >
>> > On Wed, 9 Jan 2019, 11:03 Christopher Clark, <christopher.w.clark@gmail.com> wrote:
>> >       On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
>> >       >
>> >       > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
>> >       > > The register op is used by a domain to register a region of memory for
>> >       > > receiving messages from either a specified other domain, or, if specifying a
>> >       > > wildcard, any domain.
>> >       > >
>> >       > > xen_argo_page_descr_t type is introduced as a page descriptor, to convey
>> >       > > both the physical address of the start of the page and its granularity. The
>> >       > > smallest granularity page is assumed to be 4096 bytes and the lower twelve
>> >       > > bits of the type are used to indicate the size of page of memory supplied.
>> >       > > The implementation of the hypercall op currently only supports 4K pages.
>> >       >
>> >       > What is the resolution for the Arm issues mentioned by Julien? I read
>> >       > the conversation in previous thread. A solution seemed to have been
>> >       > agreed upon, but the changelog doesn't say anything about it.
>> >
>> >       I made the interface changes that Julien had asked for. The register
>> >       op now takes arguments that can describe the granularity of the
>> >       pages supplied, though only support for 4K pages is accepted in the
>> >       current implementation. I believe it meets Julien's requirements.
>> >
>> > I still don't think allowing 4K or 64K is the right solution. You are adding an unnecessary burden in the hypervisor and would
>> > prevent optimization in the hypervisor and cause unwanted side effects.
>> >
>> > For instance a 64K hypervisor will always map 64K even when the guest is passing 4K. You also can't map everything contiguously
>> > in Xen (if you ever wanted to).
>> >
>> > We need to stick on a single chunk size. That could be different between Arm and x86. For Arm it would need to be 64KB.
>>
>> I don't think we should force 64K as the only granularity on ARM. It
>> causes unnecessary overhead and confusion on 4K-only deployments that
>> are almost all our use-cases today.
>
> Why a confusion? People should read the documentation when writing a driver...
>
>> One option is to make the granularity configurable at the guest side,
>> like Christopher did, letting the guest specify the granularity. The
>> hypervisor could return -ENOSYS if the specified granularity is not
>> supported.
>>
>> The other option is having the hypervisor export the granularity it
>> supports for this interface: Xen would say "I support only 4K".
>> Tomorrow, it could change and Xen could say "I support only 64K". Then,
>> it would be up to the guest passing a page of the right granularity to
>> the hypervisor. I think this is probably the best option, but it would
>> require the addition of one hypercall to retrieve the supported
>> granularity from Xen.
>
> I would recommend reading my initial answers on the first series to understand why allowing 4K is an issue.
>
> AFAIK virtio and UEFI have restrictions to allow a guest to run agnostically of the hypervisor page granularity. An example is to mandate 64K chunks or 64KB-aligned addresses.
>
> With your suggestion you are going to break many use-cases if the hypervisor moves to 64KB. At worst it could introduce security issues. At best it prevents optimization in the hypervisor or prevents guests from running (bad for backward compatibility).
>
> Actually, this is not going to help moving towards 64K in Argo because you would still have to modify the kernel. So this does not meet my requirements.
>
> I don't think requiring a 64K chunk is going to be a major issue or a concern, unless you expect small rings... Christoffer, what is the expected size?

The current implementation of the Linux driver has a default ring size
of 128K and I would expect that to be a common case. I'm not familiar
with any current use cases where smaller than 64K would be likely, but
I can imagine it could potentially be useful to have the option to do
so in a memory-constrained embedded environment running a service that
handles large numbers of short-lived connections, so running frequent
setup and teardown of many rings, with only small amounts of data
exchanged on each. That's not my current use case though, so it is
speculating a bit.

> Another solution was to require contiguous guest physical memory. That would solve quite a few problems on Arm. But Christoffer had some convincing points against implementing this.
>
> As I said before, I know this is not going to be the only place with that issue. I merely wanted to start tackling the problem. However, IMHO, this interface is not more suitable than what we have currently. So this raises the question of whether we should just use the usual Xen interface if 64K is not an option...


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-09 18:28           ` Julien Grall
@ 2019-01-09 20:38             ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-09 20:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, Jan Beulich,
	xen-devel, Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 10:28 AM Julien Grall <julien.grall@gmail.com> wrote:
>
>
>
> On Wed, 9 Jan 2019, 12:54 Wei Liu, <wei.liu2@citrix.com> wrote:
>>
>> On Wed, Jan 09, 2019 at 12:02:34PM -0500, Julien Grall wrote:
>> > Hi,
>> >
>> > Sorry for the formatting. Sending it from my phone.
>> >
>> > On Wed, 9 Jan 2019, 11:03 Christopher Clark, <christopher.w.clark@gmail.com>
>> > wrote:
>> >
>> > > On Wed, Jan 9, 2019 at 7:56 AM Wei Liu <wei.liu2@citrix.com> wrote:
>> > > >
>> > > > On Sun, Jan 06, 2019 at 11:42:40PM -0800, Christopher Clark wrote:
>> > > > > The register op is used by a domain to register a region of memory for
>> > > > > receiving messages from either a specified other domain, or, if
>> > > specifying a
>> > > > > wildcard, any domain.
>> > > > >
>> > > > > This operation creates a mapping within Xen's private address space
>> > > that
>> > > > > will remain resident for the lifetime of the ring. In subsequent
>> > > commits,
>> > > > > the hypervisor will use this mapping to copy data from a sending
>> > > domain into
>> > > > > this registered ring, making it accessible to the domain that
>> > > registered the
>> > > > > ring to receive data.
>> > > > >
>> > > > > Wildcard any-sender rings are default disabled and registration will be
>> > > > > refused with EPERM unless they have been specifically enabled with the
>> > > > > argo-mac boot option introduced here. The reason why the default for
>> > > > > wildcard rings is 'deny' is that there is currently no means to
>> > > protect the
>> > > > > ring from DoS by a noisy domain spamming the ring, affecting other
>> > > domains
>> > > > > ability to send to it. This will be addressed with XSM policy controls
>> > > in
>> > > > > subsequent work.
>> > > > >
>> > > > > Since denying access to any-sender rings is a significant functional
>> > > > > constraint, a new bootparam is provided to enable overriding this:
>> > > > >  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
>> > > > > Even though this is a boolean variable, use these descriptive strings
>> > > in
>> > > > > order to make it obvious to an administrator that this has potential
>> > > > > security impact.
>> > > > >
>> > > > > The p2m type of the memory supplied by the guest for the ring must be
>> > > > > p2m_ram_rw and the memory will be pinned as PGT_writable_page while
>> > > the ring
>> > > > > is registered.
>> > > > >
>> > > > > xen_argo_page_descr_t type is introduced as a page descriptor, to
>> > > convey
>> > > > > both the physical address of the start of the page and its
>> > > granularity. The
>> > > > > smallest granularity page is assumed to be 4096 bytes and the lower
>> > > twelve
>> > > > > bits of the type are used to indicate the size of page of memory
>> > > supplied.
>> > > > > The implementation of the hypercall op currently only supports 4K
>> > > pages.
>> > > > >
>> > > >
>> > > > What is the resolution for the Arm issues mentioned by Julien? I read
>> > > > the conversation in previous thread. A solution seemed to have been
>> > > > agreed upon, but the changelog doesn't say anything about it.
>> > >
>> > > I made the interface changes that Julien had asked for. The register
>> > > op now takes arguments that can describe the granularity of the
>> > > pages supplied, though only support for 4K pages is accepted in the
>> > > current implementation. I believe it meets Julien's requirements.
>> >
>> >
>> > I still don't think allowing 4K or 64K is the right solution. You are
>> > adding an unnecessary burden in the hypervisor and would prevent
>> > optimization in the hypervisor and cause unwanted side effects.
>> >
>> > For instance a 64K hypervisor will always map 64K even when the guest is
>> > passing 4K. You also can't map everything contiguously in Xen (if you ever
>> > wanted to).
>> >
>> > We need to stick on a single chunk size. That could be different between
>> > Arm and x86. For Arm it would need to be 64KB.
>>
>> Doesn't enforcing 64KB granularity have its own limitations as well?
>> According to my understanding of Arm (and this could be wrong), you
>> would need to have the guest allocate (via memory exchange perhaps) 64KB
>> of machine-contiguous memory even when the hypervisor doesn't need it to
>> be 64KB (when the hypervisor is running on 4KB granularity).
>
>
> The 64K is just about the interface with the guest.
> The hypervisor could just split the 64K into 16 4K chunks. No need for a memory exchange here.
>
>>
>> I think having a method to return the granularity to the guest, like
>> Stefano suggested, is more sensible. The hypervisor will then reject
>> registration requests which don't conform to the requirement.
>
>
> The problem is not that simple... For instance, 64K is required to support 52-bit PAs, yet you may still want to run your current Debian on that platform.
>
> You can do that nicely on KVM, but on Xen it is a pain due to the current interface. If you use 4K you may end up exposing too much to the other side.
>
> The only viable solution here is a full re-design of the ABI for Arm. We can do that step by step or at one go.
>
> The discussion here was to start solving it on Argo so that's one less step to do. Christoffer kindly tried to tackle it. Sadly, I don't think the interface suggested is going to work.
>
> But I don't want Argo to miss 4.12 for that. So maybe the solution is to stick with the usual Xen interface.

Thanks for the consideration. With that understanding, I'll put the
frame-number-based interface back into place for a new revision of
the series, aiming for 4.12.

thanks,

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-09 14:15       ` Jason Andryuk
@ 2019-01-09 23:24         ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-09 23:24 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 6:16 AM Jason Andryuk <jandryuk@gmail.com> wrote:
> On Wed, Jan 9, 2019 at 1:48 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> > On Tue, Jan 8, 2019 at 2:54 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> > > On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>
> > > > +
> > > > +/* A space-available notification that is awaiting sufficient space */
> > > > +struct pending_ent
> > > > +{
> > > > +    /*
> > > > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > > > +     * ring_info->npending is decremented when ents for wildcard rings are
> > > > +     * cancelled for domain destroy.
> > > > +     * Caution: Must hold the correct locks before accessing ring_info via this.
> > >
> > > It would be clearer if this stated the correct locks.
> >
> > ok - it would mean duplicating the statement about which locks are
> > needed though, since it is explained elsewhere in the file, which means
> > it will need updating in two places if the locking requirements change.
> > That was why I worded it that way, as an indicator to go and find where
> > it is already described, to avoid that.
>
> "Caution" made me think *ring_info points from domain A's pending_ent
> to domain B's ring_info.  Reading patch 10 (notify op) I see that it
> really just points back to domain A's ring_info.  So the "Caution" is
> just that you still have to lock ring_info (L3) even though you can
> get to the pointer via L2.  Is that correct?

Yes, exactly.
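
(Illustratively, the pattern under discussion: reaching ring_info through
the pending_ent backpointer while holding the outer locks still requires
taking the ring_info lock before touching its contents. A schematic
sketch, using the L1/L2/L3 shorthand from this exchange and the lock
fields visible in the quoted sendv() code; 'd' and 'ent' are placeholders:

    read_lock(&argo_lock);            /* L1 */
    read_lock(&d->argo->lock);        /* L2: walk the list and find ent */
    spin_lock(&ent->ring_info->lock); /* L3: still needed to use ring_info */

Same-domain only; no cross-domain lock ordering is implied.)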

> I agree a single location for the locking documentation is better than
> splitting or duplicating.  As long as no cross-domain locking is
> required, this is fine.

OK - thanks.

Christopher


* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-09 18:05   ` Jason Andryuk
@ 2019-01-10  2:08     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-10  2:08 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 10:05 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
>
> <snip>
>
> > @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> >      smp_wmb();
> >  }
> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > +                     uint32_t len)
> > +{
> > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +    void *dst;
> > +    int ret;
> > +    unsigned int src_offset = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    offset &= ~PAGE_MASK;
> > +
> > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +        return -EFAULT;
> > +
> > +    while ( (offset + len) > PAGE_SIZE )
> > +    {
> > +        unsigned int head_len = PAGE_SIZE - offset;
>
> I think this while loop could be re-written as
> while (len) {
>     head_len = len > PAGE_SIZE ? PAGE_SIZE - offset: len;
>
> and then the extra copying below outside the loop could be dropped.
>
> The first iteration does a partial copy at offset and then sets offset=0.
> The next N iterations copy exactly PAGE_SIZE.
> The final iteration copies the remaining len bytes.

That looks right to me and makes a nice simplification to that
function -- thanks.
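
(For illustration, the single-loop shape being discussed could look like
the sketch below. It keeps the existing ring_map_page() helper and uses
(offset + len), rather than bare len, so that the first, partial page is
bounded correctly. Sketch only; not the code that eventually landed.)

    while ( len )
    {
        unsigned int head_len = ((offset + len) > PAGE_SIZE)
                                    ? PAGE_SIZE - offset : len;

        ret = ring_map_page(ring_info, mfns_index, &dst);
        if ( ret )
            return ret;

        if ( src )
        {
            memcpy(dst + offset, src + src_offset, head_len);
            src_offset += head_len;
        }
        else
        {
            if ( copy_from_guest(dst + offset, src_hnd, head_len) )
                return -EFAULT;

            guest_handle_add_offset(src_hnd, head_len);
        }

        mfns_index++;
        len -= head_len;
        offset = 0;
    }

    return 0;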

> <snip>
>
> > +
> > +/*
> > + * iov_count returns its count on success via an out variable to avoid
> > + * potential for a negative return value to be used incorrectly
> > + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> > + */
> > +static int
> > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> > +{
> > +    uint32_t sum_iov_lens = 0;
> > +
> > +    if ( niov > XEN_ARGO_MAXIOV )
> > +        return -EINVAL;
> > +
> > +    while ( niov-- )
> > +    {
> > +        /* valid iovs must have the padding field set to zero */
> > +        if ( piov->pad )
> > +        {
> > +            argo_dprintk("invalid iov: padding is not zero\n");
> > +            return -EINVAL;
> > +        }
> > +
> > +        /* check each to protect sum against integer overflow */
> > +        if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )
>
> Should this be MAX_ARGO_MESSAGE_SIZE?  MAX_ARGO_MESSAGE_SIZE is less
> than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just
> fail the one below.

ack - I'll switch it to MAX_ARGO_MESSAGE_SIZE.

> <snip>
>
> > @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >          break;
> >      }
> >
> > +    case XEN_ARGO_OP_sendv:
> > +    {
> > +        xen_argo_send_addr_t send_addr;
> > +
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> > +            guest_handle_cast(arg1, xen_argo_send_addr_t);
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> > +            guest_handle_cast(arg2, xen_argo_iov_t);
> > +        /* arg3 is niov */
> > +        /* arg4 is message_type. Must be a 32-bit value. */
> > +
> > +        rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> > +            send_addr.src.domain_id = currd->domain_id;
> > +
> > +        /* No domain is currently authorized to send on behalf of another */
> > +        if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
> > +        {
> > +            rc = -EPERM;
> > +            break;
> > +        }
> > +
> > +        /* Reject niov or message_type values that are outside 32 bit range. */
> > +        if ( unlikely((arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
>
> This needs to check send_addr.src.pad and send_addr.dst.pad == 0.
> sendv() does not check the padding either.

ack - will fix.
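
(For illustration, a minimal form of that check in the XEN_ARGO_OP_sendv
dispatch case could be the following; the field names come from the quoted
code above, and this is a sketch rather than the fix that was committed.)

        /* Reject non-zero padding in either address before using them. */
        if ( unlikely(send_addr.src.pad || send_addr.dst.pad) )
        {
            rc = -EINVAL;
            break;
        }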

thanks

Christopher


* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-09 18:57   ` Roger Pau Monné
@ 2019-01-10  3:09     ` Christopher Clark
  2019-01-10 12:01       ` Roger Pau Monné
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-10  3:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Thanks for the review, Roger. Replies inline below.

On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné <royger@freebsd.org> wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > sendv operation is invoked to perform a synchronous send of buffers
> > contained in iovs to a remote domain's registered ring.
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 59ce8c4..4548435 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c

> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > +                     uint32_t len)
> > +{
> > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +    void *dst;
> > +    int ret;
> > +    unsigned int src_offset = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    offset &= ~PAGE_MASK;
> > +
> > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +        return -EFAULT;
> > +
> > +    while ( (offset + len) > PAGE_SIZE )
>
> I think you could map the whole ring in contiguous virtual address
> space, and then writing to it would be much easier: you wouldn't
> need to iterate with memcpy or copy_from_guest. Take a look at __vmap.
> You could likely map this when the ring gets set up and keep it mapped
> for the lifetime of the ring.

You're right about that, because map_domain_page_global, which the
current code uses, uses vmap itself. I think there are a couple of
reasons why the code has been implemented the iterative way, though.

The first is that I think ring resize has been a consideration: it's
useful to be able to increase the size of a live and active ring that
is under load without having to tear down the mappings, find a new
virtual address region of the right size and then remap it: you can
just supply some more memory and map those pages onto the end of the
ring, and ensure both sides know about the new ring size. Similarly,
shrinking a quiet ring can be useful.
However, the "gfn race" that you (correctly) pointed out in an earlier
review, and Jan's related request to drop the "revalidate an existing
mapping on ring reregister" motivated removal of a section of the code
involved, and then in v3 of the series, I've actually just blocked
ring resize because it's missing a walk through the pending
notifications to find any that have become untriggerable with the new
ring size when a ring is shrunk and I'd like to defer implementing
that for now. So the ring resize reason is more of a consideration for
a possible later version of Argo than the current one.

The second reason is about avoiding exposing the Xen virtual memory
allocator directly to frequent guest-supplied size requests for
contiguous regions (of up to 16GB). With single-page allocations to
build a ring, fragmentation is not a problem, and mischief by a guest
seems difficult. If it were changed to issue requests for contiguous
regions, with variable ring sizes up to the maximum of 16GB, it seems
like significant fragmentation may be achievable. I don't know the
practical impact of that but it seems worth avoiding. Are the other
users of __vmap (or vmap) for multi-gigabyte regions only either
boot-time, infrequent operations (livepatch), or for actions by
privileged (ie. somewhat trusted) domains (ioremap), or is it already
a frequent operation somewhere else?

Given the context above, and Jason's simplification to the
memcpy_to_guest_ring function, plus the imminent merge freeze
deadline, and the understanding that this loop and the related data
structures supporting it have been tested and are working, would it be
acceptable to omit making this contiguous mapping change from this
current series?

>
> > +    {
> > +        unsigned int head_len = PAGE_SIZE - offset;
> > +
> > +        ret = ring_map_page(ring_info, mfns_index, &dst);
> > +        if ( ret )
> > +            return ret;
> > +
> > +        if ( src )
> > +        {
> > +            memcpy(dst + offset, src + src_offset, head_len);
> > +            src_offset += head_len;
> > +        }
> > +        else
> > +        {
> > +            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> > +                    -EFAULT : 0;
> > +            if ( ret )
> > +                return ret;
>
> You can simplify this to:
>
> if ( copy_from_guest(...) )
>     return -EFAULT;

yes! ack - thanks

<snip>
> > +/*
> > + * get_sanitized_ring creates a modified copy of the ring pointers where
> > + * the rx_ptr is rounded up to ensure it is aligned, and then ring
> > + * wrap is handled. Simplifies safe use of the rx_ptr for available
> > + * space calculation.
> > + */
> > +static int
> > +get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> > +{
> > +    uint32_t rx_ptr;
> > +    int ret;
> > +
> > +    ret = get_rx_ptr(ring_info, &rx_ptr);
> > +    if ( ret )
> > +        return ret;
> > +
> > +    ring->tx_ptr = ring_info->tx_ptr;
> > +
> > +    rx_ptr = ROUNDUP_MESSAGE(rx_ptr);
> > +    if ( rx_ptr >= ring_info->len )
> > +        rx_ptr = 0;
> > +
> > +    ring->rx_ptr = rx_ptr;
>
> Newline.

ack, thanks

<snip>
> > +/*
> > + * iov_count returns its count on success via an out variable to avoid
> > + * potential for a negative return value to be used incorrectly
> > + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> > + */
> > +static int
> > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> > +{
> > +    uint32_t sum_iov_lens = 0;
> > +
> > +    if ( niov > XEN_ARGO_MAXIOV )
> > +        return -EINVAL;
> > +
> > +    while ( niov-- )
>
> I would use a for loop here, that would remove the need to piov++, if
> you want to keep it quite similar:
>
> for ( ; niov--; piov++ )
> {

Yes, that is better - thanks, applied.

<snip>
> > +
> > +static int
> > +ringbuf_insert(struct domain *d, struct argo_ring_info *ring_info,
> > +               const struct argo_ring_id *src_id,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd,
> > +               unsigned long niov, uint32_t message_type,
> > +               unsigned long *out_len)
> > +{
> > +    xen_argo_ring_t ring;
> > +    struct xen_argo_ring_message_header mh = { 0 };
>
> No need for the 0, { } will achieve exactly the same.

ack, applied

>
> > +    int32_t sp;
> > +    int32_t ret;
> > +    uint32_t len = 0;
> > +    xen_argo_iov_t iovs[XEN_ARGO_MAXIOV];
>
> This seems slightly dangerous: a change of the maximum could cause a
> stack overflow, depending on the size of xen_argo_iov_t. I think you
> need a comment next to the definition of XEN_ARGO_MAXIOV to note that
> increasing this could cause issues.

That makes sense, will do.

<snip>
> > +    /*
> > +     * First data write into the destination ring: fixed size, message header.
> > +     * This cannot overrun because the available free space (value in 'sp')
> > +     * is checked above and must be at least this size.
> > +     */
> > +    ret = memcpy_to_guest_ring(ring_info, ring.tx_ptr + sizeof(xen_argo_ring_t),
> > +                               &mh, NULL_hnd, sizeof(mh));
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: failed to write message header to ring (vm%u:%x vm%d)\n",
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id);
> > +
> > +        goto out;
> > +    }
> > +
> > +    ring.tx_ptr += sizeof(mh);
> > +    if ( ring.tx_ptr == ring_info->len )
> > +        ring.tx_ptr = 0;
> > +
> > +    piov = iovs;
> > +
> > +    while ( niov-- )
>
> AFAICT using a for loop would remove the need to also do a piov++ at
> each iteration.

ack, applied.

<snip>
> > +         * Case 2: ring-tail-wrap-write above was not performed
> > +         *    -> so iov_len is the guest-supplied value and: (iov_len <= sp)
> > +         *    ie. less than available space at the tail of the ring:
> > +         *        so this write cannot overrun.
> > +         */
> > +        ret = memcpy_to_guest_ring(ring_info,
> > +                                   ring.tx_ptr + sizeof(xen_argo_ring_t),
> > +                                   NULL, buf_hnd, iov_len);
> > +        if ( ret )
> > +        {
> > +            gprintk(XENLOG_ERR,
> > +                    "argo: failed to copy [%p, %"PRIx32"] (vm%u:%x vm%d)\n",
> > +                    buf_hnd.p, iov_len, ring_info->id.domain_id,
> > +                    ring_info->id.port, ring_info->id.partner_id);
> > +
> > +            goto out;
> > +        }
> > +
> > +        ring.tx_ptr += iov_len;
> > +
> > +        if ( ring.tx_ptr == ring_info->len )
> > +            ring.tx_ptr = 0;
> > +
> > +        piov++;
> > +    }
> > +
> > +    ring.tx_ptr = ROUNDUP_MESSAGE(ring.tx_ptr);
> > +
> > +    if ( ring.tx_ptr >= ring_info->len )
> > +        ring.tx_ptr -= ring_info->len;
> > +
> > +    update_tx_ptr(ring_info, ring.tx_ptr);
> > +
> > + out:
>
> Do you really need the out label? *out_len is only set in the success
> case, so all the error cases that use a 'goto out' could be replaced
> by 'return ret;'.

ack, thanks -- done.

<snip>
> > +static int
> > +pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
> > +              unsigned int len)
> > +{
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    if ( ring_info->npending >= MAX_PENDING_PER_RING )
> > +        return -ENOSPC;
> > +
> > +    ent = xmalloc(struct pending_ent);
> > +
>
> Extra newline.

ack

<snip>
> >
> > +static long
> > +sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
> > +      const xen_argo_addr_t *dst_addr,
> > +      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
> > +      uint32_t message_type)
> > +{
> > +    struct domain *dst_d = NULL;
> > +    struct argo_ring_id src_id;
> > +    struct argo_ring_info *ring_info;
> > +    int ret = 0;
> > +    unsigned long len = 0;
> > +
> > +    ASSERT(src_d->domain_id == src_addr->domain_id);
> > +
> > +    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
> > +                 src_addr->domain_id, src_addr->port,
> > +                 dst_addr->domain_id, dst_addr->port,
> > +                 niov, iovs_hnd.p, message_type);
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    if ( !src_d->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    src_id.port = src_addr->port;
> > +    src_id.domain_id = src_d->domain_id;
> > +    src_id.partner_id = dst_addr->domain_id;
> > +
> > +    dst_d = get_domain_by_id(dst_addr->domain_id);
> > +    if ( !dst_d )
> > +    {
> > +        argo_dprintk("!dst_d, ESRCH\n");
> > +        ret = -ESRCH;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( !dst_d->argo )
> > +    {
> > +        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +        ret = -ECONNREFUSED;
> > +        goto out_unlock;
>
> The usage of out_unlock here and in the condition above is wrong,
> since it will unconditionally call read_unlock(&argo_lock); even
> though the lock has not yet been acquired.

Sorry, I don't think that's quite right -- if you scroll up a bit
here, you can see where argo_lock is taken unconditionally, just after
the dprintk and before checking whether src_d is argo enabled. The
second lock hasn't been taken yet - but that's not the one being
unlocked on that out_unlock path.

>
> > +    }
> > +
> > +    read_lock(&dst_d->argo->lock);
> > +
> > +    ring_info = ring_find_info_by_match(dst_d, dst_addr->port,
> > +                                        src_addr->domain_id);
> > +    if ( !ring_info )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u connection refused, src (vm%u:%x) dst (vm%u:%x)\n",
> > +                current->domain->domain_id, src_id.domain_id, src_id.port,
> > +                dst_addr->domain_id, dst_addr->port);
> > +
> > +        ret = -ECONNREFUSED;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    spin_lock(&ring_info->lock);
> > +
> > +    ret = ringbuf_insert(dst_d, ring_info, &src_id, iovs_hnd, niov,
> > +                         message_type, &len);
> > +    if ( ret == -EAGAIN )
> > +    {
> > +        argo_dprintk("argo_ringbuf_sendv failed, EAGAIN\n");
> > +        /* requeue to issue a notification when space is there */
> > +        ret = pending_requeue(ring_info, src_addr->domain_id, len);
> > +    }
> > +
> > +    spin_unlock(&ring_info->lock);
> > +
> > +    if ( ret >= 0 )
> > +        signal_domain(dst_d);
> > +
> > + out_unlock2:
>
> There's only a single user of the out_unlock2 label, at which point it
> might be easier to read to just put the read_unlock there and just use
> the existing out_unlock label.

ack, will change that.

Thanks again,

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging
  2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
  2019-01-08 15:50   ` Jan Beulich
@ 2019-01-10  9:28   ` Roger Pau Monné
  1 sibling, 0 replies; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10  9:28 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 8:43 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> A convenience for working on development of the argo subsystem:
> setting a #define variable enables additional debug messages.
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v2 #03 feedback, Jan: fix ifdef/define confusion error
> v1 #04 feedback, Jan: fix dprintk implementation
>
>  xen/common/argo.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index d69ad7c..6f782f7 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -19,6 +19,15 @@
>  #include <xen/errno.h>
>  #include <xen/guest_access.h>
>
> +/* Change this to #define ARGO_DEBUG here to enable more debug messages */
> +#undef ARGO_DEBUG
> +
> +#ifdef ARGO_DEBUG
> +#define argo_dprintk(format, args...) printk("argo: " format, ## args )

I would maybe consider prefixing this with XENLOG_DEBUG, but since
it's a hidden compile-time debug option I'm not sure setting the log
level matters here.
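
I.e., if you did want to set it, roughly:

#define argo_dprintk(format, args...) \
    printk(XENLOG_DEBUG "argo: " format, ## args )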

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
  2019-01-08 22:08   ` Ross Philipson
  2019-01-08 22:54   ` Jason Andryuk
@ 2019-01-10 10:19   ` Roger Pau Monné
  2019-01-10 11:52     ` Jan Beulich
  2019-01-11  6:03     ` Christopher Clark
  2019-01-10 16:16   ` Eric Chanudet
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 10:19 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
>
> Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
>
> Introduces a new Xen command line parameter 'argo': bool to enable/disable
> the argo hypercall. Defaults to disabled.
>
> New headers:
> >   public/argo.h: with definitions of addresses and ring structure, including
>   indexes for atomic update for communication between domain and hypervisor.
>
>   xen/argo.h: to expose the hooks for integration into domain lifecycle:
>     argo_init: per-domain init of argo data structures for domain_create.
>     argo_destroy: teardown for domain_destroy and the error exit
>                   path of domain_create.
>     argo_soft_reset: reset of domain state for domain_soft_reset.
>
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
>
> In accordance with recent work on _domain_destroy, argo_destroy is
> idempotent. It will tear down: all rings registered by this domain, all
> rings where this domain is the single sender (ie. specified partner,
> non-wildcard rings), and all pending notifications where this domain is
> awaiting signal about available space in the rings of other domains.
>
> A count will be maintained of the number of rings that a domain has
> registered in order to limit it below the fixed maximum limit defined here.
>
> The software license on the public header is the BSD license, standard
> procedure for the public Xen headers. The public header was originally
> posted under a GPL license at: [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
>
> The following ACK by Lars Kurth is to confirm that only people being
> employees of Citrix contributed to the header files in the series posted at
> [1] and that thus the copyright of the files in question is fully owned by
> Citrix. The ACK also confirms that Citrix is happy for the header files to
> be published under a BSD license in this series (which is based on [1]).
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> Acked-by: Lars Kurth <lars.kurth@citrix.com>
> ---
> v2 rewrite locking explanation comment
> v2 header copyright line now includes 2019
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 self: rename all_rings_remove_info to domain_rings_remove_all
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: add npending to track number of pending entries per ring
> v2 self: amend comment on locking; drop section comments
> v2 cookie_eq: test low bits first and use likely on high bits
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 self: drop pending_remove_ent, inline at single call site
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v2 #4 Lars: add Acked-by and details to commit message.
> v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> v2 bugfix: xsm use in soft-reset prior to introduction
> v2 feedback #9 Jan: drop 'message' from do_argo_message_op
> v1 #5 feedback Paul: init/destroy unsigned, brackets and whitespace fixes
> v1 #5 feedback Paul: Use mfn_eq for comparing mfns.
> v1 #5 feedback Paul: init/destroy : use currd
> v1 #6 (#5) feedback Jan: init/destroy: s/ENOSYS/EOPNOTSUPP/
> v1 #6 feedback Paul: Folded patch 6 into patch 5.
> v1 #6 feedback Jan: drop opt_argo_enabled initializer
> v1 $6 feedback Jan: s/ENOSYS/EOPNOTSUPP/g and drop useless dprintk
> v1. #5 feedback Paul: change the license on public header to BSD
> - ack from Lars at Citrix.
> v1. self, Jan: drop unnecessary xen include from sched.h
> v1. self, Jan: drop inclusion of public argo.h in private one
> v1. self, Jan: add include of public argo.h to argo.c
> v1. self, Jan: drop fwd decl of argo_domain in priv header
> v1. Paul/self/Jan: add data structures to xlat.lst and compat/argo.h to Makefile
> v1. self: removed allocation of event channel since switching to VIRQ
> v1. self: drop types.h include from private argo.h
> v1: reorder public argo include position
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: self: rename pending ent "id" to "domain_id"
> v1: self: add domain_cookie to ent struct
> v1. #15 feedback Jan: make cmd unsigned
> v1. #15 feedback Jan: make i loop variable unsigned
> v1: self: adjust dprintks in init, destroy
> v1: #18 feedback Jan: meld max ring count limit
> v1: self: use type not struct in public defn, affects compat gen header
> v1: feedback #15 Jan: handle upper-halves of hypercall args
> v1: add comment explaining the 'magic' field
> v1: self + Jan feedback: implement soft reset
> v1: feedback #13 Roger: use ASSERT_UNREACHABLE
>
>  docs/misc/xen-command-line.pandoc |  11 +
>  xen/common/argo.c                 | 461 +++++++++++++++++++++++++++++++++++++-
>  xen/common/domain.c               |  20 ++
>  xen/include/Makefile              |   1 +
>  xen/include/public/argo.h         |  59 +++++
>  xen/include/xen/argo.h            |  23 ++
>  xen/include/xen/sched.h           |   6 +
>  xen/include/xlat.lst              |   2 +
>  8 files changed, 582 insertions(+), 1 deletion(-)
>  create mode 100644 xen/include/public/argo.h
>  create mode 100644 xen/include/xen/argo.h
>
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a755a67..aea13eb 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>  in combination with cpuidle.  This option is only expected to be useful for
>  developers wishing Xen to fall back to older timing methods on newer hardware.
>
> +### argo
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> +
> +This allows domains access to the Argo hypercall, which supports registration
> +of memory rings with the hypervisor to receive messages, sending messages to
> +other domains by hypercall and querying the ring status of other domains.
> +
>  ### asid (x86)
>  > `= <boolean>`
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6f782f7..86195d3 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -17,7 +17,177 @@
>   */
>
>  #include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/argo.h>
> +#include <xen/event.h>
> +#include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/time.h>
> +#include <public/argo.h>

We usually try to sort header includes alphabetically, and I would add
a newline between the xen/* and the public/* header includes.
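
I.e. something like:

#include <xen/argo.h>
#include <xen/domain.h>
#include <xen/domain_page.h>
#include <xen/errno.h>
#include <xen/event.h>
#include <xen/guest_access.h>
#include <xen/sched.h>
#include <xen/time.h>

#include <public/argo.h>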

> +
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled;
> +boolean_param("argo", opt_argo_enabled);

I would drop the opt_* prefix; recently added options no longer
include it.

> +
> +typedef struct argo_ring_id
> +{
> +    uint32_t port;
> +    domid_t partner_id;
> +    domid_t domain_id;
> +} argo_ring_id;
> +
> +/* Data about a domain's own ring that it has registered */
> +struct argo_ring_info
> +{
> +    /* next node in the hash, protected by L2 */
> +    struct hlist_node node;
> +    /* this ring's id, protected by L2 */
> +    struct argo_ring_id id;
> +    /* L3 */
> +    spinlock_t lock;
> +    /* length of the ring, protected by L3 */
> +    uint32_t len;
> +    /* number of pages in the ring, protected by L3 */
> +    uint32_t npage;

Can you infer the number of pages from the length of the ring, or the
other way around?

I'm not sure why both need to be stored here.

> +    /* number of pages translated into mfns, protected by L3 */
> +    uint32_t nmfns;
> +    /* cached tx pointer location, protected by L3 */
> +    uint32_t tx_ptr;

None of these fields are part of any public structure, so I wonder if
it would be better to simply use unsigned int for them, or size_t.

> +    /* mapped ring pages protected by L3 */
> +    uint8_t **mfn_mapping;

Why 'uint8_t *', wouldn't it be better to just use 'void *' if it's a mapping?

> +    /* list of mfns of guest ring, protected by L3 */
> +    mfn_t *mfns;
> +    /* list of struct pending_ent for this ring, protected by L3 */
> +    struct hlist_head pending;
> +    /* number of pending entries queued for this ring, protected by L3 */
> +    uint32_t npending;
> +};
> +
> +/* Data about a single-sender ring, held by the sender (partner) domain */
> +struct argo_send_info
> +{
> +    /* next node in the hash, protected by Lsend */
> +    struct hlist_node node;
> +    /* this ring's id, protected by Lsend */
> +    struct argo_ring_id id;
> +};
> +
> +/* A space-available notification that is awaiting sufficient space */
> +struct pending_ent
> +{
> +    /* List node within argo_ring_info's pending list */
> +    struct hlist_node node;
> +    /*
> +     * List node within argo_domain's wildcard_pend_list. Only used if the
> +     * ring is one with a wildcard partner (ie. that any domain may send to)
> +     * to enable cancelling signals on wildcard rings on domain destroy.
> +     */
> +    struct hlist_node wildcard_node;
> +    /*
> +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> +     * ring_info->npending is decremented when ents for wildcard rings are
> +     * cancelled for domain destroy.
> +     * Caution: Must hold the correct locks before accessing ring_info via this.
> +     */
> +    struct argo_ring_info *ring_info;
> +    /* domain to be notified when space is available */
> +    domid_t domain_id;
> +    uint16_t pad;

No need for the pad in internal structures.

> +    /* minimum ring space available that this signal is waiting upon */
> +    uint32_t len;
> +};
> +
> +/*
> + * The value of the argo element in a struct domain is
> + * protected by the global lock argo_lock: L1
> + */
> +#define ARGO_HTABLE_SIZE 32
> +struct argo_domain
> +{
> +    /* L2 */
> +    rwlock_t lock;
> +    /*
> +     * Hash table of argo_ring_info about rings this domain has registered.
> +     * Protected by L2.
> +     */
> +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> +    /* Counter of rings registered by this domain. Protected by L2. */
> +    uint32_t ring_count;
> +
> +    /* Lsend */
> +    spinlock_t send_lock;
> +    /*
> +     * Hash table of argo_send_info about rings other domains have registered
> +     * for this domain to send to. Single partner, non-wildcard rings.
> +     * Protected by Lsend.
> +     */
> +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> +
> +    /* Lwildcard */
> +    spinlock_t wildcard_lock;
> +    /*
> +     * List of pending space-available signals for this domain about wildcard
> +     * rings registered by other domains. Protected by Lwildcard.
> +     */
> +    struct hlist_head wildcard_pend_list;
> +};
> +
> +/*
> + * Locking is organized as follows:
> + *
> + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> + *              W(<lock>) means taking a write lock on it.
> + *
> + * L1 : The global lock: argo_lock
> + * Protects the argo elements of all struct domain *d in the system.
> + * It does not protect any of the elements of d->argo, only their
> + * addresses.
> + *
> + * By extension since the destruction of a domain with a non-NULL
> + * d->argo will need to free the d->argo pointer, holding W(L1)
> + * guarantees that no domains pointers that argo is interested in
> + * become invalid whilst this lock is held.
> + */
> +
> +static DEFINE_RWLOCK(argo_lock); /* L1 */

You also add an argo_lock to each domain struct, which doesn't seem to
be mentioned here at all. Shouldn't that lock be the one that protects
d->argo, instead of this global lock?

> +
> +/*
> + * L2 : The per-domain ring hash lock: d->argo->lock
> + * Holding a read lock on L2 protects the ring hash table and
> + * the elements in the hash_table d->argo->ring_hash, and
> + * the node and id fields in struct argo_ring_info in the
> + * hash table.
> + * Holding a write lock on L2 protects all of the elements of
> + * struct argo_ring_info.
> + *
> + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> + *
> + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> + * Protects all the fields within the argo_ring_info, aside from the ones that
> + * L2 already protects: node, id, lock.
> + *
> > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> + *
> + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> + * Protects the per-domain send hash table : d->argo->send_hash
> + * and the elements in the hash table, and the node and id fields
> + * in struct argo_send_info in the hash table.
> + *
> + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> + * Do not attempt to acquire a L2 on any domain after taking and while
> + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> + *
> + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> + * Protects the per-domain list of outstanding signals for space availability
> + * on wildcard rings.
> + *
> + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> + * No other locks are acquired after obtaining Lwildcard.
> + */

IMO the locking is overly complicated, and there's no reasoning given
for why so many locks are needed. Wouldn't it be enough to start with a
single lock that protects the existence and contents of the whole
d->argo?

I would start with a very simple (as simple as possible) locking
structure and improve from there if performance bottlenecks show up.
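
As a starting point, the per-domain argo_lock rwlock you already add to
struct domain could be the only lock; an untested sketch, using the
names from this patch:

    read_lock(&d->argo_lock);
    if ( d->argo )
    {
        /* ... read d->argo and its contents ... */
    }
    read_unlock(&d->argo_lock);

with write_lock(&d->argo_lock) taken when allocating or freeing d->argo
or when modifying its contents.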

>  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
>  #undef ARGO_DEBUG
> @@ -28,10 +198,299 @@
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>
> +static void
> +ring_unmap(struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    if ( !ring_info->mfn_mapping )
> +        return;
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +    {
> +        if ( !ring_info->mfn_mapping[i] )
> +            continue;
> +        if ( ring_info->mfns )
> +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> +                         mfn_x(ring_info->mfns[i]),
> +                         ring_info->mfn_mapping[i]);
> +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> +        ring_info->mfn_mapping[i] = NULL;
> +    }

As noted in another patch, I would consider mapping this in contiguous
virtual address space using vmap, but I'm not going to insist.
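
I.e. (untested, and the 'ring' field name here is just for
illustration) something like:

    ring_info->ring = vmap(ring_info->mfns, ring_info->nmfns);

at mapping time, with a single vunmap() on teardown, instead of
tracking one mapping per page.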

> +}
> +
> +static void
> +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> +{
> +    struct domain *d = get_domain_by_id(domain_id);

Newline.

> +    if ( !d )
> +        return;
> +
> +    if ( d->argo )

Don't you need to take d->argo_lock here to prevent d->argo from being
removed under your feet?

> +    {
> +        spin_lock(&d->argo->wildcard_lock);
> +        hlist_del(&ent->wildcard_node);
> +        spin_unlock(&d->argo->wildcard_lock);
> +    }
> +    put_domain(d);
> +}
> +
> +static void
> +pending_remove_all(struct argo_ring_info *ring_info)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)

As a side note, it might be interesting to introduce a helper like
hlist_first_entry_or_null, which would remove the need for the extra
*next element and would be a more natural way to drain an hlist
(seeing that you have the same pattern in
wildcard_rings_pending_remove).
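
Something along these lines (untested sketch):

#define hlist_first_entry_or_null(head, type, member) \
    (hlist_empty(head) ? NULL : hlist_entry((head)->first, type, member))

    while ( (ent = hlist_first_entry_or_null(&ring_info->pending,
                                             struct pending_ent,
                                             node)) != NULL )
    {
        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
            wildcard_pending_list_remove(ent->domain_id, ent);
        hlist_del(&ent->node);
        xfree(ent);
    }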

> +    {
> +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +            wildcard_pending_list_remove(ent->domain_id, ent);
> +        hlist_del(&ent->node);
> +        xfree(ent);
> +    }
> +    ring_info->npending = 0;
> +}
> +
> +static void
> +wildcard_rings_pending_remove(struct domain *d)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> +                              node)
> +    {
> +        hlist_del(&ent->node);
> +        ent->ring_info->npending--;
> +        hlist_del(&ent->wildcard_node);
> +        xfree(ent);
> +    }
> +}
> +
> +static void
> +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));

I think the above requires a comment explaining why two different
locks are used to protect the ring mfns, and why holding just one of
them is enough.

> +
> +    if ( !ring_info->mfns )
> +        return;
> +
> +    if ( !ring_info->mfn_mapping )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    ring_unmap(ring_info);
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> +
> +    xfree(ring_info->mfns);
> +    ring_info->mfns = NULL;

Xen has a handy macro for this, XFREE. That would free the memory and
set the pointer to NULL.
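
I.e.:

    XFREE(ring_info->mfns);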

> +    ring_info->npage = 0;
> +    xfree(ring_info->mfn_mapping);
> +    ring_info->mfn_mapping = NULL;
> +    ring_info->nmfns = 0;
> +}
> +
> +static void
> +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)

I think the domain parameter can be constified here, since it's only
used by ring_remove_mfns, and that function already expects a const
domain struct.

> +{
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    pending_remove_all(ring_info);
> +    hlist_del(&ring_info->node);
> +    ring_remove_mfns(d, ring_info);
> +    xfree(ring_info);
> +}
> +
> +static void
> +domain_rings_remove_all(struct domain *d)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_ring_info *ring_info;
> +
> +        hlist_for_each_entry_safe(ring_info, node, next,
> +                                  &d->argo->ring_hash[i], node)
> +            ring_remove_info(d, ring_info);
> +    }
> +    d->argo->ring_count = 0;
> +}
> +
> +/*
> + * Tear down all rings of other domains where src_d domain is the partner.
> + * (ie. it is the single domain that can send to those rings.)
> + * This will also cancel any pending notifications about those rings.
> + */
> +static void
> +partner_rings_remove(struct domain *src_d)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&argo_lock));
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_send_info *send_info;
> +
> +        hlist_for_each_entry_safe(send_info, node, next,
> +                                  &src_d->argo->send_hash[i], node)
> +        {
> +            struct argo_ring_info *ring_info;
> +            struct domain *dst_d;
> +
> +            dst_d = get_domain_by_id(send_info->id.domain_id);
> +            if ( dst_d )
> +            {
> +                ring_info = ring_find_info(dst_d, &send_info->id);
> +                if ( ring_info )
> +                {
> +                    ring_remove_info(dst_d, ring_info);
> +                    dst_d->argo->ring_count--;
> +                }
> +
> +                put_domain(dst_d);
> +            }
> +
> +            hlist_del(&send_info->node);
> +            xfree(send_info);
> +        }
> +    }
> +}
> +
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *currd = current->domain;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> +
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        return rc;

Why not just 'return -EOPNOTSUPP;'?

> +    }
> +
> +    domain_lock(currd);
> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    domain_unlock(currd);
> +
> +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> +
> +    return rc;
> +}
> +
> +static void
> +argo_domain_init(struct argo_domain *argo)
> +{
> +    unsigned int i;
> +
> +    rwlock_init(&argo->lock);
> +    spin_lock_init(&argo->send_lock);
> +    spin_lock_init(&argo->wildcard_lock);
> +    argo->ring_count = 0;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> +    }
> +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> +}
> +
> +int
> +argo_init(struct domain *d)
> +{
> +    struct argo_domain *argo;
> +
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("init: domid: %d\n", d->domain_id);
> +
> +    argo = xmalloc(struct argo_domain);
> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    write_lock(&argo_lock);
> +
> +    argo_domain_init(argo);
> +
> +    d->argo = argo;

Where's the d->argo_lock initialization?

> +
> +    write_unlock(&argo_lock);
> +
> +    return 0;
> +}
> +
> +void
> +argo_destroy(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +        xfree(d->argo);
> +        d->argo = NULL;
> +    }
> +    write_unlock(&argo_lock);
> +}
> +
> +void
> +argo_soft_reset(struct domain *d)
> +{
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +
> +        if ( !opt_argo_enabled )
> +        {
> +            xfree(d->argo);
> +            d->argo = NULL;

Can opt_argo_enabled change during runtime?

> +        }
> +        else
> +            argo_domain_init(d->argo);
> +    }
> +
> +    write_unlock(&argo_lock);
>  }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index c623dae..9596840 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -32,6 +32,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenoprof.h>
>  #include <xen/irq.h>
> +#include <xen/argo.h>
>  #include <asm/debugger.h>
>  #include <asm/p2m.h>
>  #include <asm/processor.h>
> @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
>
>      xfree(d->pbuf);
>
> +#ifdef CONFIG_ARGO
> +    argo_destroy(d);
> +#endif

Instead of adding such ifdefs you could provide a dummy inline
argo_destroy in argo.h when CONFIG_ARGO is not set.
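
I.e. in xen/argo.h, roughly (sketch, applying the same to the other
hooks):

#ifdef CONFIG_ARGO
int argo_init(struct domain *d);
void argo_destroy(struct domain *d);
void argo_soft_reset(struct domain *d);
#else
static inline int argo_init(struct domain *d) { return 0; }
static inline void argo_destroy(struct domain *d) {}
static inline void argo_soft_reset(struct domain *d) {}
#endif

so the callers in common/domain.c don't need any #ifdef.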

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
  2019-01-09 15:55   ` Wei Liu
@ 2019-01-10 11:24   ` Roger Pau Monné
  2019-01-10 11:57     ` Jan Beulich
  2019-01-11  6:29     ` Christopher Clark
  2019-01-10 20:11   ` Eric Chanudet
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 11:24 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

 On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> The register op is used by a domain to register a region of memory for
> receiving messages from either a specified other domain, or, if specifying a
> wildcard, any domain.
>
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits,
> the hypervisor will use this mapping to copy data from a sending domain into
> this registered ring, making it accessible to the domain that registered the
> ring to receive data.
>
> Wildcard any-sender rings are default disabled and registration will be
> refused with EPERM unless they have been specifically enabled with the
> argo-mac boot option introduced here. The reason why the default for
> wildcard rings is 'deny' is that there is currently no means to protect the
> ring from DoS by a noisy domain spamming the ring, affecting other domains
> ability to send to it. This will be addressed with XSM policy controls in
> subsequent work.
>
> Since denying access to any-sender rings is a significant functional
> constraint, a new bootparam is provided to enable overriding this:
>  "argo-mac" variable has allowed values: 'permissive' and 'enforcing'.
> Even though this is a boolean variable, use these descriptive strings in
> order to make it obvious to an administrator that this has potential
> security impact.
>
> The p2m type of the memory supplied by the guest for the ring must be
> p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
> is registered.
>
> xen_argo_page_descr_t type is introduced as a page descriptor, to convey
> both the physical address of the start of the page and its granularity. The
> smallest granularity page is assumed to be 4096 bytes and the lower twelve
> bits of the type are used to indicate the size of page of memory supplied.
> The implementation of the hypercall op currently only supports 4K pages.
>
> array_index_nospec is used to guard the result of the ring id hash function.
> This is out of an abundance of caution, since this is a very basic hash
> function and it operates upon values supplied by the guest just before
> being used as an array index.
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
> v2 self: disallow ring resize via reregister
> v2 feedback Jan: drop cookie, implement teardown
> v2 feedback Jan: drop message from argo_message_op
> v2 self: move hash_index function below locking comment
> v2 self: OVERHAUL
> v2 self/Jan: remove use of magic verification field and tidy up
> v2 self: merge max and min ring size check clauses
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v2 feedback #9, Jan: use the argo-mac bootparam at point of introduction
> v2 feedback #9, Jan: rename boot opt variable to comply with convention
> v2 feedback #9, Jan: rename the argo_mac bootparam to argo-mac
> v2 feedback #9 Jan: document argo boot opt in xen-command-line.markdown
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1 feedback Roger: s/pfn/gfn/ and retire always-64-bit type
> v2. feedback Jan: document the argo-mac boot opt
> v2. feedback Jan: simplify re-register, drop mappings
> v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops
>
> v1 #13 feedback, Jan: register op : s/ECONNREFUSED/ESRCH/
> v1 #5 (#13) feedback Paul: register op: use currd in do_message_op
> v1 #13 feedback, Paul: register op: use mfn_eq comparator
> v1 #5 (#13) feedback Paul: register op: use currd in argo_register_ring
> v1 #13 feedback Paul: register op: whitespace, unsigned, bounds check
> v1 #13 feedback Paul: use of hex in limit constant definition
> v1 #13 feedback Paul, register op: set nmfns on loop termination
> v1 #13 feedback Paul: register op: do/while -> gotos, reindent
> v1 argo_ring_map_page: drop uint32_t for unsigned int
> v1. #13 feedback Julien: use page descriptors instead of gpfns.
>    - adds ABI support for pages with different granularity.
> v1 feedback #13, Paul: adjust log level of message
> v1 feedback #13, Paul: use gprintk for guest-triggered warning
> v1 feedback #13, Paul: gprintk and XENLOG_DEBUG for ring registration
> v1 feedback #13, Paul: use gprintk for errs in argo_ring_map_page
> v1 feedback #13, Paul: use ENOMEM if global mapping fails
> v1 feedback Paul: overflow check before shift
> v1: add define for copy_field_to_guest_errno
> v1: fix gprintk use for ARM as its defn dislikes split format strings
> v1: use copy_field_to_guest_errno
> v1 feedback #13, Jan: argo_hash_fn: no inline, rename, change type
> v1 feedback #13, Paul, Jan: EFAULT -> ENOMEM in argo_ring_map_page
> v1 feedback #13, Jan: rename page var in argo_ring_map_page
> v1 feedback #13, Jan: switch uint8_t* to void* and drop cast
> v1 feedback #13, Jan: switch memory barrier to smp_wmb
> v1 feedback #13, Jan: make 'ring' comment comply with single-line style
> v1 feedback #13, Jan: use xzalloc_array, drop loop NULL init
> v1 feedback #13, Jan: init bool with false rather than 0
> v1 feedback #13 Jan: use __copy; define and use __copy_field_to_guest_errno
> v1 feedback #13, Jan: use xzalloc, drop individual init zeroes
> v1 feedback #13, Jan: prefix public namespace with xen
> v1 feedback #13, Jan: blank line after op case in do_argo_message_op
> v1 self: reflow comment in argo_ring_map_page to within 80 char len
> v1 feedback #13, Roger: use true not 1 in assign to update_tx_ptr bool
> v1 feedback #21, Jan: fold in the array_index_nospec hash function guards
> v1 feedback #18, Jan: fold the max ring count limit into the series
> v1 self: use unsigned long type for XEN_ARGO_REGISTER_FLAG_MASK
> v1: feedback #15 Jan: handle upper-halves of hypercall args
> v1. feedback #13 Jan: add comment re: page alignment
> v1. self: confirm ring magic presence in supplied page array
> v1. feedback #13 Jan: add comment re: minimum ring size
> v1. feedback #13 Roger: use ASSERT_UNREACHABLE
> v1. feedback Roger: add comment to hash function
>
>  docs/misc/xen-command-line.pandoc  |  15 +
>  xen/common/argo.c                  | 566 +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h          |  72 +++++
>  xen/include/xlat.lst               |   1 +
>  6 files changed, 658 insertions(+)
>
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index aea13eb..68d4415 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
>  of memory rings with the hypervisor to receive messages, sending messages to
>  other domains by hypercall and querying the ring status of other domains.
>
> +### argo-mac
> +> `= permissive | enforcing`

Why not call this argo-mac-permissive and make it a boolean? Default
would be 'false' and that would imply enforcing. This would get rid of
parse_opt_argo_mac since you could use the default boolean parser.
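
I.e.:

static bool __read_mostly opt_argo_mac_permissive;
boolean_param("argo-mac-permissive", opt_argo_mac_permissive);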

> +
> +> Default: `enforcing`
> +
> +Constrain the access control applied to the Argo communication mechanism.
> +
> +When `enforcing`, domains may not register rings that have wildcard specified
> +for the sender which would allow messages to be sent to the ring by any domain.
> +This is to protect rings and the services that utilize them against DoS by a
> +malicious or buggy domain spamming the ring.
> +
> +When the boot option is set to `permissive`, this constraint is relaxed and
> +wildcard any-sender rings are allowed to be registered.
> +
>  ### asid (x86)
>  > `= <boolean>`
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 86195d3..11988e7 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -23,16 +23,41 @@
>  #include <xen/event.h>
>  #include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/lib.h>
> +#include <xen/nospec.h>
>  #include <xen/time.h>
>  #include <public/argo.h>
>
> +#define MAX_RINGS_PER_DOMAIN            128U
> +
> +/* All messages on the ring are padded to a multiple of the slot size. */
> +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
> +
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
>
>  /* Xen command line option to enable argo */
>  static bool __read_mostly opt_argo_enabled;
>  boolean_param("argo", opt_argo_enabled);
>
> +/* Xen command line option for conservative or relaxed access control */
> +bool __read_mostly opt_argo_mac_enforcing = true;
> +
> +static int __init parse_opt_argo_mac(const char *s)
> +{
> +    if ( !strcmp(s, "enforcing") )
> +        opt_argo_mac_enforcing = true;
> +    else if ( !strcmp(s, "permissive") )
> +        opt_argo_mac_enforcing = false;
> +    else
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +custom_param("argo-mac", parse_opt_argo_mac);
> +
>  typedef struct argo_ring_id
>  {
>      uint32_t port;
> @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>
> +/*
> + * This hash function is used to distribute rings within the per-domain
> + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> + * will provide a struct if a match is found with a 'argo_ring_id' key:
> + * ie. the key is a (domain id, port, partner domain id) tuple.
> + * Since port number varies the most in expected use, and the Linux driver
> + * allocates at both the high and low ends, incorporate high and low bits to
> + * help with distribution.
> + * Apply array_index_nospec as a defensive measure since this operates
> + * on user-supplied input and the array size that it indexes into is known.
> + */
> +static unsigned int
> +hash_index(const struct argo_ring_id *id)
> +{
> +    unsigned int hash;
> +
> +    hash = (uint16_t)(id->port >> 16);
> +    hash ^= (uint16_t)id->port;
> +    hash ^= id->domain_id;
> +    hash ^= id->partner_id;
> +    hash &= (ARGO_HTABLE_SIZE - 1);
> +
> +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
> +}
> +
>  static void
>  ring_unmap(struct argo_ring_info *ring_info)
>  {
> @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
>
> +static int
> +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        gprintk(XENLOG_ERR,
> +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",
> +                ring_info->id.domain_id, ring_info->id.port,
> +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +        return -ENOMEM;
> +    }
> +
> +    if ( !ring_info->mfns || !ring_info->mfn_mapping)
> +    {
> +        ASSERT_UNREACHABLE();
> +        ring_info->len = 0;
> +        return -ENOMEM;
> +    }
> +
> +    if ( !ring_info->mfn_mapping[i] )
> +    {
> +        /*
> +         * TODO:
> +         * The first page of the ring contains the ring indices, so both read
> +         * and write access to the page is required by the hypervisor, but
> +         * read-access is not needed for this mapping for the remainder of the
> +         * ring.
> +         * Since this mapping will remain resident in Xen's address space for
> +         * the lifetime of the ring, and following the principle of least
> +         * privilege, it could be preferable to:
> +         *  # add a XSM check to determine what policy is wanted here
> +         *  # depending on the XSM query, optionally create this mapping as
> +         *    _write-only_ on platforms that can support it.
> +         *    (eg. Intel EPT/AMD NPT).

Why do Intel EPT or AMD NPT matter here?

You are mapping the page to Xen address space, which doesn't use
either EPT or NPT. Writable or read-only mappings would be created by
setting the right bit in the Xen page tables.

> +         */
> +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> +

No need for the newline.

> +        if ( !ring_info->mfn_mapping[i] )
> +        {
> +            gprintk(XENLOG_ERR,
> +                "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
> +                    ring_info->id.domain_id, ring_info->id.port,
> +                    ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +            return -ENOMEM;
> +        }
> +        argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +                     mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +    }
> +
> +    if ( out_ptr )
> +        *out_ptr = ring_info->mfn_mapping[i];
> +
> +    return 0;
> +}
> +
> +static void
> +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +    void *dst;
> +    uint32_t *p;
> +
> +    ASSERT(ring_info->mfn_mapping[0]);
> +
> +    ring_info->tx_ptr = tx_ptr;
> +
> +    dst = ring_info->mfn_mapping[0];
> +    p = dst + offsetof(xen_argo_ring_t, tx_ptr);

Hm, wouldn't it be easier to cast page 0 to the layout of the ring so
that you don't need to use pointer arithmetic to get the fields? Ie:
make dst be of type xen_argo_ring_t.
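
I.e. roughly:

    xen_argo_ring_t *ringp = (void *)ring_info->mfn_mapping[0];

    ring_info->tx_ptr = tx_ptr;
    write_atomic(&ringp->tx_ptr, tx_ptr);
    smp_wmb();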

> +
> +    write_atomic(p, tx_ptr);
> +    smp_wmb();
> +}
> +
>  static void
>  wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
>  {
> @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
>      }
>  }
>
> +static int
> +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> +#else
> +    *mfn = p2m_lookup(d, gfn, &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, gfn_x(gfn));
> +#endif
> +
> +    return ret;
> +}
> +
> +static int
> +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +               uint32_t npage,
> +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +               uint32_t len)
> +{
> +    unsigned int i;
> +    int ret = 0;
> +    mfn_t *mfns;
> +    uint8_t **mfn_mapping;
> +
> +    /*
> +     * first bounds check on npage here also serves as an overflow check
> +     * before left shifting it
> +     */
> +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
> +         ((npage << PAGE_SHIFT) < len) )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /* Ring already existed: drop the previous mapping. */
> +        gprintk(XENLOG_INFO,
> +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id);
> +
> +        ring_remove_mfns(d, ring_info);
> +        ASSERT(!ring_info->mfns);
> +    }
> +
> +    mfns = xmalloc_array(mfn_t, npage);
> +    if ( !mfns )
> +        return -ENOMEM;
> +
> +    for ( i = 0; i < npage; i++ )
> +        mfns[i] = INVALID_MFN;
> +
> +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> +    if ( !mfn_mapping )
> +    {
> +        xfree(mfns);
> +        return -ENOMEM;
> +    }
> +
> +    ring_info->npage = npage;
> +    ring_info->mfns = mfns;
> +    ring_info->mfn_mapping = mfn_mapping;
> +
> +    ASSERT(ring_info->npage == npage);
> +
> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;
> +
> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )

This loop seems to assume that there can be pages already added to the
ring, but IIRC you said that redimensioning of the ring was removed in
this version?

I think for an initial version it would be easier to not allow
redimensioning of active rings, and just allow teardown and
re-initialization as the way to redimension a ring.

> +    {
> +        xen_argo_page_descr_t pg_descr;
> +        gfn_t gfn;
> +        mfn_t mfn;
> +
> +        ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
> +                -EFAULT : 0;
> +        if ( ret )
> +            break;
> +
> +        /* Implementation currently only supports handling 4K pages */
> +        if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
> +                XEN_ARGO_PAGE_DESCR_SIZE_4K )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +        gfn = _gfn(pg_descr >> PAGE_SHIFT);
> +
> +        ret = find_ring_mfn(d, gfn, &mfn);
> +        if ( ret )
> +        {
> +            gprintk(XENLOG_ERR,
> +               "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
> +                    d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
> +                    ring_info->id.port, ring_info->id.partner_id,
> +                    ring_info, i, ring_info->npage);
> +            break;
> +        }
> +
> +        ring_info->mfns[i] = mfn;
> +
> +        argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
> +                     i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
> +    }
> +
> +    ring_info->nmfns = i;
> +
> +    if ( ret )
> +        ring_remove_mfns(d, ring_info);
> +    else
> +    {
> +        ASSERT(ring_info->nmfns == ring_info->npage);
> +
> +        gprintk(XENLOG_DEBUG,
> +        "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id, ring_info,
> +                ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
> +    }
> +
> +    return ret;
> +}
> +
> +static struct argo_ring_info *
> +ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    unsigned int ring_hash_index;
> +    struct hlist_node *node;
> +    struct argo_ring_info *ring_info;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    ring_hash_index = hash_index(id);
> +
> +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> +                 d->argo, ring_hash_index,
> +                 d->argo->ring_hash[ring_hash_index].first, id);
> +    argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
> +                 id->port, id->domain_id, id->partner_id);
> +
> +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> +                         node)
> +    {
> +        struct argo_ring_id *cmpid = &ring_info->id;

const?

> +
> +        if ( cmpid->port == id->port &&
> +             cmpid->domain_id == id->domain_id &&
> +             cmpid->partner_id == id->partner_id )
> +        {
> +            argo_dprintk("ring_info=%p\n", ring_info);
> +            return ring_info;
> +        }
> +    }
> +    argo_dprintk("no ring_info found\n");
> +
> +    return NULL;
> +}
> +
> +static long
> +register_ring(struct domain *currd,

If this is indeed the current domain (as the name suggests), why do
you need to pass it around? Or else just name the parameter d.

> +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +              uint32_t npage, bool fail_exist)
> +{
> +    xen_argo_register_ring_t reg;
> +    struct argo_ring_id ring_id;
> +    void *map_ringp;
> +    xen_argo_ring_t *ringp;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info = NULL;
> +    struct domain *dst_d = NULL;
> +    int ret = 0;
> +    uint32_t private_tx_ptr;
> +
> +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> +    {
> +        ret = -EFAULT;
> +        goto out;

I don't see the point of using an out label; why not just use 'return
-EFAULT;' (here and below)? This avoids the braces and also removes
the need for the ret assignment.

> +    }
> +
> +    /*
> +     * A ring must be large enough to transmit messages, so requires space for:
> +     * * 1 message header, plus
> +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> +     *   for the message payload to be written into, plus
> +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> +     *   ensures that there can be space remaining when a message is present.
> +     * The above determines the minimum acceptable ring size.
> +     */
> +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> +         (reg.pad != 0) )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ring_id.partner_id = reg.partner_id;
> +    ring_id.port = reg.port;
> +    ring_id.domain_id = currd->domain_id;
> +
> +    read_lock(&argo_lock);
> +
> +    if ( !currd->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> +    {
> +        if ( opt_argo_mac_enforcing )
> +        {
> +            ret = -EPERM;
> +            goto out_unlock;
> +        }
> +    }
> +    else
> +    {
> +        dst_d = get_domain_by_id(reg.partner_id);
> +        if ( !dst_d )
> +        {
> +            argo_dprintk("!dst_d, ESRCH\n");
> +            ret = -ESRCH;
> +            goto out_unlock;
> +        }
> +
> +        if ( !dst_d->argo )
> +        {
> +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +            ret = -ECONNREFUSED;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +
> +        send_info = xzalloc(struct argo_send_info);
> +        if ( !send_info )
> +        {
> +            ret = -ENOMEM;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +        send_info->id = ring_id;
> +    }
> +
> +    write_lock(&currd->argo->lock);
> +
> +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> +    {
> +        ret = -ENOSPC;
> +        goto out_unlock2;
> +    }
> +
> +    ring_info = ring_find_info(currd, &ring_id);
> +    if ( !ring_info )
> +    {
> +        ring_info = xzalloc(struct argo_ring_info);
> +        if ( !ring_info )
> +        {
> +            ret = -ENOMEM;
> +            goto out_unlock2;
> +        }
> +
> +        spin_lock_init(&ring_info->lock);
> +
> +        ring_info->id = ring_id;
> +        INIT_HLIST_HEAD(&ring_info->pending);
> +
> +        hlist_add_head(&ring_info->node,
> +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> +
> +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +    }
> +    else
> +    {
> +        if ( ring_info->len )
> +        {
> +            /*
> +             * If the caller specified that the ring must not already exist,
> +             * fail at attempt to add a completed ring which already exists.
> +             */
> +            if ( fail_exist )
> +            {
> +                argo_dprintk("disallowed reregistration of existing ring\n");
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            if ( ring_info->len != reg.len )
> +            {
> +                /*
> +                 * Change of ring size could result in entries on the pending
> +                 * notifications list that will never trigger.
> +                 * Simple blunt solution: disallow ring resize for now.
> +                 * TODO: investigate enabling ring resize.
> +                 */

I think ring resizing was removed in this version?

> +                gprintk(XENLOG_ERR,
> +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> +                        ring_id.partner_id);
> +                /*
> +                 * Could return EINVAL here, but if the ring didn't already
> +                 * exist then the arguments would have been valid, so: EEXIST.
> +                 */
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            gprintk(XENLOG_DEBUG,
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> +                    ring_id.partner_id);
> +        }
> +    }
> +
> +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +
> +    /*
> +     * The first page of the memory supplied for the ring has the xen_argo_ring
> +     * structure at its head, which is where the ring indexes reside.
> +     */
> +    ret = ring_map_page(ring_info, 0, &map_ringp);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +    ringp = map_ringp;
> +
> +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> +
> +    if ( (private_tx_ptr >= reg.len) ||
> +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> +    {
> +        /*
> +         * Since the ring is a mess, attempt to flush the contents of it
> +         * here by setting the tx_ptr to the next aligned message slot past
> +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +         */
> +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> +
> +        if ( private_tx_ptr >= reg.len )
> +            private_tx_ptr = 0;
> +
> +        update_tx_ptr(ring_info, private_tx_ptr);
> +    }
> +
> +    ring_info->tx_ptr = private_tx_ptr;
> +    ring_info->len = reg.len;
> +    currd->argo->ring_count++;
> +
> +    if ( send_info )
> +    {
> +        spin_lock(&dst_d->argo->send_lock);
> +
> +        hlist_add_head(&send_info->node,
> +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> +
> +        spin_unlock(&dst_d->argo->send_lock);
> +    }
> +
> + out_unlock2:
> +    if ( !ret && send_info )
> +        xfree(send_info);

There's no need to check whether send_info is set; xfree(NULL) is safe.

> +
> +    if ( dst_d )
> +        put_domain(dst_d);
> +
> +    write_unlock(&currd->argo->lock);
> +
> + out_unlock:
> +    read_unlock(&argo_lock);
> +
> + out:
> +    return ret;
> +}
> +
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>
>      switch (cmd)
>      {
> +    case XEN_ARGO_OP_register_ring:
> +    {
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
> +            guest_handle_cast(arg1, xen_argo_register_ring_t);
> +        XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
> +            guest_handle_cast(arg2, xen_argo_page_descr_t);
> +        /* arg3 is npage */
> +        /* arg4 is flags */
> +        bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
> +
> +        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +        /*
> +         * Check access to the whole array here so we can use the faster __copy
> +         * operations to read each element later.
> +         */
> +        if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
> +            break;
> +        /* arg4: reserve currently-undefined bits, require zero.  */
> +        if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
> +        break;
> +    }
> +
>      default:
>          rc = -EOPNOTSUPP;
>          break;
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 8997a1c..70e9a78 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> index ca700c9..8dde5d5 100644
> --- a/xen/include/asm-x86/guest_access.h
> +++ b/xen/include/asm-x86/guest_access.h
> @@ -41,6 +41,8 @@
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> +
>  /* Offset the given guest handle into the array it refers to. */
>  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
>  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 4818684..8947230 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -31,6 +31,26 @@
>
>  #include "xen.h"
>
> +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16GB

Is such a big size really required as the default maximum? The size of
the internal structures required to support a 16GB ring would be quite
big; has this been taken into account?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 08/15] argo: implement the unregister op
  2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
@ 2019-01-10 11:40   ` Roger Pau Monné
  2019-01-15  8:05     ` Christopher Clark
  2019-01-14 15:06   ` Jan Beulich
  1 sibling, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 11:40 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> Takes a single argument: a handle to the ring unregistration struct,
> which specifies the port and partner domain id or wildcard.
>
> The ring's entry is removed from the hashtable of registered rings;
> any entries for pending notifications are removed; and the ring is
> unmapped from Xen's address space.
>
> If the ring had been registered to communicate with a single specified
> domain (ie. a non-wildcard ring) then the partner domain state is removed
> from the partner domain's argo send_info hash table.
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
> v2 feedback Jan: drop cookie, implement teardown
> v2 feedback Jan: drop message from argo_message_op
> v2 self: OVERHAUL
> v2 self: reorder logic to shorten critical section
> v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 #5 (#14) feedback Paul: use currd in do_argo_message_op
> v1 #5 (#14) feedback Paul: full use currd in argo_unregister_ring
> v1 #13 (#14) feedback Paul: replace do/while with goto; reindent
> v1 self: add blank lines in unregister case in do_argo_message_op
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: #13 feedback Jan: blank line after op case in do_argo_message_op
> v1: #14 feedback Jan: replace domain id override with validation
> v1: #18 feedback Jan: meld the ring count limit into the series
> v1: feedback #15 Jan: verify zero in unused hypercall args
>
>  xen/common/argo.c         | 115 ++++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/argo.h |  19 ++++++++
>  xen/include/xlat.lst      |   1 +
>  3 files changed, 135 insertions(+)
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 11988e7..59ce8c4 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -37,6 +37,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>
>  /* Xen command line option to enable argo */
>  static bool __read_mostly opt_argo_enabled;
> @@ -666,6 +667,105 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>      return NULL;
>  }
>
> +static struct argo_send_info *
> +send_find_info(const struct domain *d, const struct argo_ring_id *id)
> +{
> +    struct hlist_node *node;
> +    struct argo_send_info *send_info;
> +
> +    hlist_for_each_entry(send_info, node, &d->argo->send_hash[hash_index(id)],
> +                         node)
> +    {
> +        struct argo_ring_id *cmpid = &send_info->id;

Const.

> +
> +        if ( cmpid->port == id->port &&
> +             cmpid->domain_id == id->domain_id &&
> +             cmpid->partner_id == id->partner_id )
> +        {
> +            argo_dprintk("send_info=%p\n", send_info);
> +            return send_info;
> +        }
> +    }
> +    argo_dprintk("no send_info found\n");

Is this message actually helpful without printing any of the
parameters provided to the function?
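
Something along these lines (just a sketch, mirroring the id format
used by the other prints in the series) would make it more useful:

    argo_dprintk("no send_info found (vm%u:%x vm%d)\n",
                 id->domain_id, id->port, id->partner_id);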

> +
> +    return NULL;
> +}
> +
> +static long
> +unregister_ring(struct domain *currd,

Same as the comment made on the other patch: if this parameter is the
current domain there's no need to pass it around; otherwise it should
be named d instead of currd.

> +                XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd)
> +{
> +    xen_argo_unregister_ring_t unreg;
> +    struct argo_ring_id ring_id;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info;
> +    struct domain *dst_d = NULL;
> +    int ret;
> +
> +    ret = copy_from_guest(&unreg, unreg_hnd, 1) ? -EFAULT : 0;
> +    if ( ret )
> +        goto out;
> +
> +    ret = unreg.pad ? -EINVAL : 0;
> +    if ( ret )
> +        goto out;

I don't see the point in the out label when you could just use 'return
-EINVAL' or -EFAULT directly here and above.
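
I.e. something like (untested sketch):

    if ( copy_from_guest(&unreg, unreg_hnd, 1) )
        return -EFAULT;

    if ( unreg.pad )
        return -EINVAL;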

Thanks, Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-10 10:19   ` Roger Pau Monné
@ 2019-01-10 11:52     ` Jan Beulich
  2019-01-10 12:26       ` Roger Pau Monné
  2019-01-11  6:03     ` Christopher Clark
  1 sibling, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-10 11:52 UTC (permalink / raw)
  To: royger, Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

 >>> On 10.01.19 at 11:19, <royger@freebsd.org> wrote:
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>>
>> +/* Xen command line option to enable argo */
>> +static bool __read_mostly opt_argo_enabled;
>> +boolean_param("argo", opt_argo_enabled);
> 
> I would drop the opt_* prefix, new options added recently don't
> include the prefix already.

Would you mind pointing out examples? Especially for boolean ones I
think we've tried to consistently name them opt_*. But in the case
here (it being static) I'm not overly fussed.

>> +
>> +typedef struct argo_ring_id
>> +{
>> +    uint32_t port;
>> +    domid_t partner_id;
>> +    domid_t domain_id;
>> +} argo_ring_id;
>> +
>> +/* Data about a domain's own ring that it has registered */
>> +struct argo_ring_info
>> +{
>> +    /* next node in the hash, protected by L2 */
>> +    struct hlist_node node;
>> +    /* this ring's id, protected by L2 */
>> +    struct argo_ring_id id;
>> +    /* L3 */
>> +    spinlock_t lock;
>> +    /* length of the ring, protected by L3 */
>> +    uint32_t len;
>> +    /* number of pages in the ring, protected by L3 */
>> +    uint32_t npage;
> 
> Can you infer the number of pages from the length of the ring, or the
> other way around?
> 
> I'm not sure why both need to be stored here.
> 
>> +    /* number of pages translated into mfns, protected by L3 */
>> +    uint32_t nmfns;
>> +    /* cached tx pointer location, protected by L3 */
>> +    uint32_t tx_ptr;
> 
> All these fields are not part of any public structure, so I wonder if
> it would be better to simply use unsigned int for those, or size_t.

Yes indeed - there's way too much use of fixed width types here.

Jan




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-10 11:24   ` Roger Pau Monné
@ 2019-01-10 11:57     ` Jan Beulich
  2019-01-11  6:30       ` Christopher Clark
  2019-01-11  6:29     ` Christopher Clark
  1 sibling, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-10 11:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

 >>> On 10.01.19 at 12:24, <royger@gmail.com> wrote:
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>> +static long
>> +register_ring(struct domain *currd,
> 
> If this is indeed the current domain (as the name suggests), why do
> you need to pass it around? Or else just name the parameter d.

When all (or at least most) callers already latch the pointer into a
local variable, handing it through is often cheaper than re-obtaining
it as current->domain. ASSERT(currd == current->domain) might be
worthwhile in such cases, though.

Jan




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10  3:09     ` Christopher Clark
@ 2019-01-10 12:01       ` Roger Pau Monné
  2019-01-10 12:13         ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 12:01 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> Thanks for the review, Roger. Replies inline below.
>
> On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné <royger@freebsd.org> wrote:
> >
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> > >
> > > sendv operation is invoked to perform a synchronous send of buffers
> > > contained in iovs to a remote domain's registered ring.
> > >
> > > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > > index 59ce8c4..4548435 100644
> > > --- a/xen/common/argo.c
> > > +++ b/xen/common/argo.c
>
> > >
> > > +static int
> > > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > > +                     uint32_t len)
> > > +{
> > > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > > +    void *dst;
> > > +    int ret;
> > > +    unsigned int src_offset = 0;
> > > +
> > > +    ASSERT(spin_is_locked(&ring_info->lock));
> > > +
> > > +    offset &= ~PAGE_MASK;
> > > +
> > > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > > +        return -EFAULT;
> > > +
> > > +    while ( (offset + len) > PAGE_SIZE )
> >
> > I think you could map the whole ring in contiguous virtual address
> > space; then writing to it would be much easier and you wouldn't
> > need to iterate with memcpy or copy_from_guest -- take a look at __vmap.
> > You could likely map this when the ring gets set up and keep it mapped
> > for the lifetime of the ring.
>
> You're right about that, because map_domain_page_global, which the
> current code uses, uses vmap itself. I think there's a couple of
> reasons why the code has been implemented the iterative way though.
>
> The first is that I think ring resize has been a consideration: it's
> useful to be able to increase the size of a live and active ring that
> is under load without having to tear down the mappings, find a new
> virtual address region of the right size and then remap it: you can
> just supply some more memory and map those pages onto the end of the
> ring, and ensure both sides know about the new ring size. Similarly,
> shrinking a quiet ring can be useful.

Is such on-the-fly expansion common with argo?

I'm not saying it's something that shouldn't be supported, but the
burden of allowing such resizing doesn't seem trivial. You will have
to resize a lot of the arrays used to store the pages, and at that
point I wonder whether remapping the virtual address space is really
the biggest issue you are going to have if you allow such run-time
resizing.

> However, the "gfn race" that you (correctly) pointed out in an earlier
> review, and Jan's related request to drop the "revalidate an existing
> mapping on ring reregister" motivated removal of a section of the code
> involved, and then in v3 of the series, I've actually just blocked
> ring resize because it's missing a walk through the pending
> notifications to find any that have become untriggerable with the new
> ring size when a ring is shrunk and I'd like to defer implementing
> that for now. So the ring resize reason is more of a consideration for
> a possible later version of Argo than the current one.
>
> The second reason is about avoiding exposing the Xen virtual memory
> allocator directly to frequent guest-supplied size requests for
> contiguous regions (of up to 16GB).

As said in another reply, I'm not sure allowing 16GB rings is safe.
The amount of internal memory required to track such rings is not
trivial given the arrays to store the mfns, the pages, and the virtual
mappings.

> With single-page allocations to
> build a ring, fragmentation is not a problem, and mischief by a guest
> seems difficult.

Hm, there's still a lot of big dynamic memory allocations in order to
support a 16GB ring, which makes me think that virtual address space
is not the only problem if you allow 16GB rings.

> Changing it to issue requests for contiguous regions,
> with variable ring sizes up to the maximum of 16GB, it seems like
> significant fragmentation may be achievable. I don't know the
> practical impact of that but it seems worth avoiding. Are the other
> users of __vmap (or vmap) for multi-gigabyte regions only either
> boot-time, infrequent operations (livepatch), or for actions by
> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> a frequent operation somewhere else?

I haven't checked, but I would be quite surprised to find any vmap
usage with such size (16GB). Maybe someone more familiar with the mm
subsystem can provide some insight here.

> Given the context above, and Jason's simplification to the
> memcpy_to_guest_ring function, plus the imminent merge freeze
> deadline, and the understanding that this loop and the related data
> structures supporting it have been tested and are working, would it be
> acceptable to omit making this contiguous mapping change from this
> current series?

My opinion would be to just use vmap if it works, because that IMO
greatly simplifies the code by being able to have the whole ring
mapped all the time. It would remove the iteration to copy
requests, and remove the usage of ring_map_page everywhere. That would
be my recommendation code-wise, but as said above someone more
familiar with the mm subsystem might have other opinions about how to
deal with accesses to 16GB of guest memory, and indeed your iterative
solution might be the best approach.
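
Roughly, at registration time that could look something like this
(only a sketch: it assumes a new 'ring' mapping pointer field added to
struct argo_ring_info, alongside the mfns/nmfns fields the series
already tracks):

    ring_info->ring = vmap(ring_info->mfns, ring_info->nmfns);
    if ( !ring_info->ring )
        return -ENOMEM;

with a corresponding vunmap(ring_info->ring) at teardown, instead of
the per-page unmap loop.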

> > > +static long
> > > +sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
> > > +      const xen_argo_addr_t *dst_addr,
> > > +      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
> > > +      uint32_t message_type)
> > > +{
> > > +    struct domain *dst_d = NULL;
> > > +    struct argo_ring_id src_id;
> > > +    struct argo_ring_info *ring_info;
> > > +    int ret = 0;
> > > +    unsigned long len = 0;
> > > +
> > > +    ASSERT(src_d->domain_id == src_addr->domain_id);
> > > +
> > > +    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
> > > +                 src_addr->domain_id, src_addr->port,
> > > +                 dst_addr->domain_id, dst_addr->port,
> > > +                 niov, iovs_hnd.p, message_type);
> > > +
> > > +    read_lock(&argo_lock);
> > > +
> > > +    if ( !src_d->argo )
> > > +    {
> > > +        ret = -ENODEV;
> > > +        goto out_unlock;
> > > +    }
> > > +
> > > +    src_id.port = src_addr->port;
> > > +    src_id.domain_id = src_d->domain_id;
> > > +    src_id.partner_id = dst_addr->domain_id;
> > > +
> > > +    dst_d = get_domain_by_id(dst_addr->domain_id);
> > > +    if ( !dst_d )
> > > +    {
> > > +        argo_dprintk("!dst_d, ESRCH\n");
> > > +        ret = -ESRCH;
> > > +        goto out_unlock;
> > > +    }
> > > +
> > > +    if ( !dst_d->argo )
> > > +    {
> > > +        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > > +        ret = -ECONNREFUSED;
> > > +        goto out_unlock;
> >
> > The usage of out_unlock here and in the condition above is wrong,
> > since it will unconditionally call read_unlock(&argo_lock); which is
> > wrong here because the lock has not yet been acquired.
>
> Sorry, I don't think that's quite right -- if you scroll up a bit
> here, you can see where argo_lock is taken unconditionally, just after
> the dprintk and before checking whether src_d is argo enabled. The
> second lock hasn't been taken yet - but that's not the one being
> unlocked on that out_unlock path.

Oops, yes, sorry. I got mixed up with so many locks.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10 12:01       ` Roger Pau Monné
@ 2019-01-10 12:13         ` Jan Beulich
  2019-01-10 12:40           ` Roger Pau Monné
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-10 12:13 UTC (permalink / raw)
  To: royger, Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 10.01.19 at 13:01, <royger@freebsd.org> wrote:
> On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>>
>> The second reason is about avoiding exposing the Xen virtual memory
>> allocator directly to frequent guest-supplied size requests for
>> contiguous regions (of up to 16GB).
> 
> As said in another reply, I'm not sure allowing 16GB rings is safe.
> The amount of internal memory required to track such rings is not
> trivial given the arrays to store the mfns, the pages, and the virtual
> mappings.
> 
>> With single-page allocations to
>> build a ring, fragmentation is not a problem, and mischief by a guest
>> seems difficult.
> 
> Hm, there's still a lot of big dynamic memory allocations in order to
> support a 16GB ring, which makes me think that virtual address space
> is not the only problem if you allow 16GB rings.
> 
>> Changing it to issue requests for contiguous regions,
>> with variable ring sizes up to the maximum of 16GB, it seems like
>> significant fragmentation may be achievable. I don't know the
>> practical impact of that but it seems worth avoiding. Are the other
>> users of __vmap (or vmap) for multi-gigabyte regions only either
>> boot-time, infrequent operations (livepatch), or for actions by
>> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> a frequent operation somewhere else?
> 
> I haven't checked, but I would be quite surprised to find any vmap
> usage with such size (16GB). Maybe someone more familiar with the mm
> subsystem can provide some insight here.

And indeed the vmap range reserved in VA space is just 64GB (on
x86) at present.

>> Given the context above, and Jason's simplification to the
>> memcpy_to_guest_ring function, plus the imminent merge freeze
>> deadline, and the understanding that this loop and the related data
>> structures supporting it have been tested and are working, would it be
>> acceptable to omit making this contiguous mapping change from this
>> current series?
> 
> My opinion would be to just use vmap if it works, because that IMO
> greatly simplifies the code by being able to have the whole ring
> mapped at all the time. It would remove the iteration to copy
> requests, and remove the usage of ring_map_page everywhere. That would
> be my recommendation code-wise, but as said above someone more
> familiar with the mm subsystem might have other opinion's about how to
> deal with accesses to 16GB of guest memory, and indeed your iterative
> solution might be the best approach.

No-one can allocate 16GB physically contiguous memory.

Jan




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 10/15] argo: implement the notify op
  2019-01-07  7:42 ` [PATCH v3 10/15] argo: implement the notify op Christopher Clark
@ 2019-01-10 12:21   ` Roger Pau Monné
  2019-01-15  6:53     ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 12:21 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

 On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> Queries for data about space availability in registered rings and
> causes notification to be sent when space has become available.
>
> The hypercall op populates a supplied data structure with information about
> ring state, and if insufficent space is currently available in a given ring,

insufficient

> the hypervisor will record the domain's expressed interest and notify it
> when it observes that space has become available.
>
> Checks for free space occur when this notify op is invoked, so it may be
> intentionally invoked with no data structure to populate
> (ie. a NULL argument) to trigger such a check and consequent notifications.
>
> Limit the maximum number of notify requests in a single operation to a
> simple fixed limit of 256.
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> ---
> v2 feedback Jan: drop cookie, implement teardown
> v2 notify: add flag to indicate ring is shared
> v2 argument name for fill_ring_data arg is now currd
> v2 self: check ring size vs request and flag error rather than queue signal
> v2 feedback Jan: drop 'message' from 'argo_message_op'
> v2 self: simplify signal_domid, drop unnecessary label + goto
> v2 self: skip the cookie check in pending_cancel
> v2 self: implement npending limit on number of pending entries
> v1 feedback #16 Jan: sanitize_ring in ringbuf_payload_space
> v2 self: inline fill_ring_data_array
> v2 self: avoid retesting dst_d for put_domain
> v2 self/Jan: remove use of magic verification field and tidy up
> v1 feedback #16 Jan: remove testing of magic in guest-supplied structure
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v2 self: reduce indentation via goto out if arg NULL
> v1 feedback #13 Jan: resolve checking of array handle and use of __copy
>
> v1 #5 (#16) feedback Paul: notify op: use currd in do_argo_message_op
> v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify
> v1 #5 (#16) feedback Paul: notify op: use currd in argo_notify_check_pending
> v1 #5 (#16) feedback Paul: notify op: use currd in argo_fill_ring_data_array
> v1 #13 (#16) feedback Paul: notify op: do/while: reindent only
> v1 #13 (#16) feedback Paul: notify op: do/while: goto
> v1 : add compat xlat.lst entries
> v1: add definition for copy_field_from_guest_errno
> v1 #13 feedback Jan: make 'ring data' comment comply with single-line style
> v1 feedback #13 Jan: use __copy; so define and use __copy_field_to_guest_errno
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: #13 feedback Jan: add blank line after case in do_argo_message_op
> v1: self: rename ent id to domain_id
> v1: self: ent id-> domain_id
> v1: self: drop signal if domain_cookie mismatches
> v1. feedback #15 Jan: make loop i unsigned
> v1. self: drop unnecessary mb() in argo_notify_check_pending
> v1. self: add blank line
> v1 #16 feedback Jan: const domain arg to +argo_fill_ring_data
> v1. feedback #15 Jan: check unusued hypercall args are zero
> v1 feedback #16 Jan: add comment on space available signal policy
> v1. feedback #16 Jan: move declr, drop braces, lower indent
> v1. feedback #18 Jan: meld the resource limits into the main commit
> v1. feedback #16 Jan: clarify use of magic field
> v1. self: use single copy to read notify ring data struct
> v1: argo_fill_ring_data: fix dprintk types for port field
> v1: self: use %x for printing port as per other print sites
> v1. feedback Jan: add comments explaining ring full vs empty
> v1. following Jan: fix argo_ringbuf_payload_space calculation for empty ring
>
>  xen/common/argo.c         | 359 ++++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/argo.h |  67 +++++++++
>  xen/include/xlat.lst      |   2 +
>  3 files changed, 428 insertions(+)
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 4548435..37eb291 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -29,6 +29,7 @@
>  #include <public/argo.h>
>
>  #define MAX_RINGS_PER_DOMAIN            128U
> +#define MAX_NOTIFY_COUNT                256U
>  #define MAX_PENDING_PER_RING             32U
>
>  /* All messages on the ring are padded to a multiple of the slot size. */
> @@ -43,6 +44,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_iov_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_send_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>
> @@ -231,6 +234,13 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>
> +static struct argo_ring_info *
> +ring_find_info(const struct domain *d, const struct argo_ring_id *id);
> +
> +static struct argo_ring_info *
> +ring_find_info_by_match(const struct domain *d, uint32_t port,
> +                        domid_t partner_id);

Can you place the static functions such that you don't need prototypes for them?

> +
>  /*
>   * This hash function is used to distribute rings within the per-domain
>   * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> @@ -265,6 +275,17 @@ signal_domain(struct domain *d)
>  }
>
>  static void
> +signal_domid(domid_t domain_id)
> +{
> +    struct domain *d = get_domain_by_id(domain_id);

Newline.

> +    if ( !d )
> +        return;
> +
> +    signal_domain(d);
> +    put_domain(d);
> +}
> +
> +static void
>  ring_unmap(struct argo_ring_info *ring_info)
>  {
>      unsigned int i;
> @@ -473,6 +494,62 @@ get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
>      return 0;
>  }
>
> +static uint32_t
> +ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    xen_argo_ring_t ring;
> +    uint32_t len;
> +    int32_t ret;

You use a signed type to internally store the return value, but the
return type from the function itself is unsigned. Is it guaranteed
that ret < INT32_MAX always?

> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    len = ring_info->len;
> +    if ( !len )
> +        return 0;
> +
> +    ret = get_sanitized_ring(&ring, ring_info);
> +    if ( ret )
> +        return 0;
> +
> +    argo_dprintk("sanitized ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
> +                 ring.tx_ptr, ring.rx_ptr);
> +
> +    /*
> +     * rx_ptr == tx_ptr means that the ring has been emptied, so return
> +     * the maximum payload size that can be accepted -- see message size
> +     * checking logic in the entry to ringbuf_insert which ensures that
> +     * there is always one message slot (of size ROUNDUP_MESSAGE(1)) left
> +     * available, preventing a ring from being entirely filled. This ensures
> +     * that matching ring indexes always indicate an empty ring and not a
> +     * full one.
> +     * The subtraction here will not underflow due to minimum size constraints
> +     * enforced on ring size elsewhere.
> +     */
> +    if ( ring.rx_ptr == ring.tx_ptr )
> +        return len - sizeof(struct xen_argo_ring_message_header)
> +                   - ROUNDUP_MESSAGE(1);

Why not do something like:

ret = ring.rx_ptr - ring.tx_ptr;
if ( ret <= 0 )
    ret += len;

instead of this early return?

The only difference when the ring is full is that len should be used
instead of the ptr difference.
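
I.e. the whole calculation could collapse to something like (untested
sketch, keeping the existing header/slot adjustments):

    ret = ring.rx_ptr - ring.tx_ptr;
    if ( ret <= 0 )
        ret += len;

    ret -= sizeof(struct xen_argo_ring_message_header);
    ret -= ROUNDUP_MESSAGE(1);

    return (ret < 0) ? 0 : ret;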

> +
> +    ret = ring.rx_ptr - ring.tx_ptr;
> +    if ( ret < 0 )
> +        ret += len;
> +
> +    /*
> +     * The maximum size payload for a message that will be accepted is:
> +     * (the available space between the ring indexes)
> +     *    minus (space for a message header)
> +     *    minus (space for one message slot)
> +     * since ringbuf_insert requires that one message slot be left
> +     * unfilled, to avoid filling the ring to capacity and confusing a full
> +     * ring with an empty one.
> +     * Since the ring indexes are sanitized, the value in ret is aligned, so
> +     * the simple subtraction here works to return the aligned value needed:
> +     */
> +    ret -= sizeof(struct xen_argo_ring_message_header);
> +    ret -= ROUNDUP_MESSAGE(1);
> +
> +    return (ret < 0) ? 0 : ret;
> +}
> +
>  /*
>   * iov_count returns its count on success via an out variable to avoid
>   * potential for a negative return value to be used incorrectly
> @@ -812,6 +889,61 @@ pending_remove_all(struct argo_ring_info *ring_info)
>      ring_info->npending = 0;
>  }
>
> +static void
> +pending_notify(struct hlist_head *to_notify)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(rw_is_locked(&argo_lock));
> +
> +    hlist_for_each_entry_safe(ent, node, next, to_notify, node)
> +    {
> +        hlist_del(&ent->node);
> +        signal_domid(ent->domain_id);
> +        xfree(ent);
> +    }
> +}
> +
> +static void
> +pending_find(const struct domain *d, struct argo_ring_info *ring_info,
> +             uint32_t payload_space, struct hlist_head *to_notify)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    /*
> +     * TODO: Current policy here is to signal _all_ of the waiting domains
> +     *       interested in sending a message of size less than payload_space.
> +     *
> +     * This is likely to be suboptimal, since once one of them has added
> +     * their message to the ring, there may well be insufficient room
> +     * available for any of the others to transmit, meaning that they were
> +     * woken in vain, which created extra work just to requeue their wait.
> +     *
> +     * Retain this simple policy for now since it at least avoids starving a
> +     * domain of available space notifications because of a policy that only
> +     * notified other domains instead. Improvement may be possible;
> +     * investigation required.
> +     */
> +
> +    spin_lock(&ring_info->lock);
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> +    {
> +        if ( payload_space >= ent->len )
> +        {
> +            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +                wildcard_pending_list_remove(ent->domain_id, ent);
> +            hlist_del(&ent->node);
> +            ring_info->npending--;
> +            hlist_add_head(&ent->node, to_notify);
> +        }
> +    }
> +    spin_unlock(&ring_info->lock);
> +}
> +
>  static int
>  pending_queue(struct argo_ring_info *ring_info, domid_t src_id,
>                unsigned int len)
> @@ -874,6 +1006,27 @@ pending_requeue(struct argo_ring_info *ring_info, domid_t src_id,
>  }
>
>  static void
> +pending_cancel(struct argo_ring_info *ring_info, domid_t src_id)
> +{
> +    struct hlist_node *node, *next;
> +    struct pending_ent *ent;
> +
> +    ASSERT(spin_is_locked(&ring_info->lock));
> +
> +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
> +    {
> +        if ( ent->domain_id == src_id )
> +        {
> +            if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +                wildcard_pending_list_remove(ent->domain_id, ent);
> +            hlist_del(&ent->node);
> +            xfree(ent);
> +            ring_info->npending--;
> +        }
> +    }
> +}
> +
> +static void
>  wildcard_rings_pending_remove(struct domain *d)
>  {
>      struct hlist_node *node, *next;
> @@ -994,6 +1147,92 @@ partner_rings_remove(struct domain *src_d)
>  }
>
>  static int
> +fill_ring_data(const struct domain *currd,
> +               XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) data_ent_hnd)
> +{
> +    xen_argo_ring_data_ent_t ent;
> +    struct domain *dst_d;
> +    struct argo_ring_info *ring_info;
> +    int ret;
> +
> +    ASSERT(rw_is_locked(&argo_lock));
> +
> +    ret = __copy_from_guest(&ent, data_ent_hnd, 1) ? -EFAULT : 0;
> +    if ( ret )
> +        goto out;

if ( __copy_from_guest(&ent, data_ent_hnd, 1) )
    return -EFAULT;

And you can get rid of the out label.

> +
> +    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
> +                 ent.ring.domain_id, ent.ring.port);
> +
> +    ent.flags = 0;

Please memset ent to 0 or initialize it to { }, or else you are
leaking hypervisor stack data to the guest in the padding field.
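
E.g. simply:

    xen_argo_ring_data_ent_t ent = {};

at the declaration (or a memset(&ent, 0, sizeof(ent))).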

> +
> +    dst_d = get_domain_by_id(ent.ring.domain_id);
> +    if ( dst_d )
> +    {
> +        if ( dst_d->argo )
> +        {
> +            read_lock(&dst_d->argo->lock);
> +
> +            ring_info = ring_find_info_by_match(dst_d, ent.ring.port,
> +                                                currd->domain_id);
> +            if ( ring_info )
> +            {
> +                uint32_t space_avail;
> +
> +                ent.flags |= XEN_ARGO_RING_DATA_F_EXISTS;
> +                ent.max_message_size = ring_info->len -
> +                                   sizeof(struct xen_argo_ring_message_header) -
> +                                   ROUNDUP_MESSAGE(1);
> +
> +                if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> +                    ent.flags |= XEN_ARGO_RING_DATA_F_SHARED;
> +
> +                spin_lock(&ring_info->lock);
> +
> +                space_avail = ringbuf_payload_space(dst_d, ring_info);
> +
> +                argo_dprintk("fill_ring_data: port=%x space_avail=%u"
> +                             " space_wanted=%u\n",
> +                             ring_info->id.port, space_avail,
> +                             ent.space_required);
> +
> +                /* Do not queue a notification for an unachievable size */
> +                if ( ent.space_required > ent.max_message_size )
> +                    ent.flags |= XEN_ARGO_RING_DATA_F_EMSGSIZE;
> +                else if ( space_avail >= ent.space_required )
> +                {
> +                    pending_cancel(ring_info, currd->domain_id);
> +                    ent.flags |= XEN_ARGO_RING_DATA_F_SUFFICIENT;
> +                }
> +                else
> +                {
> +                    pending_requeue(ring_info, currd->domain_id,
> +                                    ent.space_required);
> +                    ent.flags |= XEN_ARGO_RING_DATA_F_PENDING;
> +                }
> +
> +                spin_unlock(&ring_info->lock);
> +
> +                if ( space_avail == ent.max_message_size )
> +                    ent.flags |= XEN_ARGO_RING_DATA_F_EMPTY;
> +
> +            }
> +            read_unlock(&dst_d->argo->lock);
> +        }
> +        put_domain(dst_d);
> +    }
> +
> +    ret = __copy_field_to_guest(data_ent_hnd, &ent, flags) ? -EFAULT : 0;
> +    if ( ret )
> +        goto out;
> +
> +    ret = __copy_field_to_guest(data_ent_hnd, &ent, max_message_size) ?
> +                -EFAULT : 0;
> + out:
> +    return ret;
> +}
> +
> +static int
>  find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
>  {
>      p2m_type_t p2mt;
> @@ -1526,6 +1765,111 @@ register_ring(struct domain *currd,
>      return ret;
>  }
>
> +static void
> +notify_ring(struct domain *d, struct argo_ring_info *ring_info,
> +            struct hlist_head *to_notify)
> +{
> +    uint32_t space;
> +
> +    ASSERT(rw_is_locked(&argo_lock));
> +    ASSERT(rw_is_locked(&d->argo->lock));
> +
> +    spin_lock(&ring_info->lock);
> +
> +    if ( ring_info->len )
> +        space = ringbuf_payload_space(d, ring_info);
> +    else
> +        space = 0;
> +
> +    spin_unlock(&ring_info->lock);
> +
> +    if ( space )
> +        pending_find(d, ring_info, space, to_notify);
> +}
> +
> +static void
> +notify_check_pending(struct domain *currd)
> +{
> +    unsigned int i;
> +    HLIST_HEAD(to_notify);
> +
> +    ASSERT(rw_is_locked(&argo_lock));
> +
> +    read_lock(&currd->argo->lock);
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
> +    {
> +        struct hlist_node *node, *next;
> +        struct argo_ring_info *ring_info;
> +
> +        hlist_for_each_entry_safe(ring_info, node, next,
> +                                  &currd->argo->ring_hash[i], node)
> +        {
> +            notify_ring(currd, ring_info, &to_notify);
> +        }

No need for the braces.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-10 11:52     ` Jan Beulich
@ 2019-01-10 12:26       ` Roger Pau Monné
  2019-01-10 12:46         ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 12:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 12:52 PM Jan Beulich <JBeulich@suse.com> wrote:
>
>  >>> On 10.01.19 at 11:19, <royger@freebsd.org> wrote:
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> >>
> >> +/* Xen command line option to enable argo */
> >> +static bool __read_mostly opt_argo_enabled;
> >> +boolean_param("argo", opt_argo_enabled);
> >
> > I would drop the opt_* prefix, new options added recently don't
> > include the prefix already.
>
> Would you mind pointing out examples? Especially for boolean ones I
> think we've tried to consistently name them opt_*. But in the case
> here (it being static) I'm not overly fussed.

I was mostly thinking about the dom0_pvh boolean option that I added.
I'm not overly fussed; I just think it's not adding any value to the
variable name, but if you prefer to keep the opt_ prefix that's fine.

Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10 12:13         ` Jan Beulich
@ 2019-01-10 12:40           ` Roger Pau Monné
  2019-01-10 12:53             ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-10 12:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 10.01.19 at 13:01, <royger@freebsd.org> wrote:
> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> >>
> >> The second reason is about avoiding exposing the Xen virtual memory
> >> allocator directly to frequent guest-supplied size requests for
> >> contiguous regions (of up to 16GB).
> >
> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> > The amount of internal memory required to track such rings is not
> > trivial given the arrays to store the mfns, the pages, and the virtual
> > mappings.
> >
> >> With single-page allocations to
> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> seems difficult.
> >
> > Hm, there's still a lot of big dynamic memory allocations in order to
> > support a 16GB ring, which makes me think that virtual address space
> > is not the only problem if you allow 16GB rings.
> >
> >> Changing it to issue requests for contiguous regions,
> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> significant fragmentation may be achievable. I don't know the
> >> practical impact of that but it seems worth avoiding. Are the other
> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> boot-time, infrequent operations (livepatch), or for actions by
> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> a frequent operation somewhere else?
> >
> > I haven't checked, but I would be quite surprised to find any vmap
> > usage with such size (16GB). Maybe someone more familiar with the mm
> > subsystem can provide some insight here.
>
> And indeed the vmap range reserved in VA space is just 64GB (on
> x86) at present.
>
> >> Given the context above, and Jason's simplification to the
> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> deadline, and the understanding that this loop and the related data
> >> structures supporting it have been tested and are working, would it be
> >> acceptable to omit making this contiguous mapping change from this
> >> current series?
> >
> > My opinion would be to just use vmap if it works, because that IMO
> > greatly simplifies the code by being able to have the whole ring
> > mapped at all the time. It would remove the iteration to copy
> > requests, and remove the usage of ring_map_page everywhere. That would
> > be my recommendation code-wise, but as said above someone more
> > familiar with the mm subsystem might have other opinion's about how to
> > deal with accesses to 16GB of guest memory, and indeed your iterative
> > solution might be the best approach.
>
> No-one can allocate 16GB physically contiguous memory.

Right, my question/comment was whether it would make sense to limit
the size of the argo ring to something smaller and then use vmap to
map the whole ring in contiguous virtual space in order to ease
accesses.

TBH, I'm not sure virtual address space is the only issue if argo
allows 16GB rings to be used. 16GB rings will consume a non-trivial
amount of memory for the internal argo state tracking structures
AFAICT.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-10 12:26       ` Roger Pau Monné
@ 2019-01-10 12:46         ` Jan Beulich
  0 siblings, 0 replies; 104+ messages in thread
From: Jan Beulich @ 2019-01-10 12:46 UTC (permalink / raw)
  To: royger
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

>>> On 10.01.19 at 13:26, <royger@freebsd.org> wrote:
> On Thu, Jan 10, 2019 at 12:52 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>>  >>> On 10.01.19 at 11:19, <royger@freebsd.org> wrote:
>> > aOn Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
>> >>
>> >> +/* Xen command line option to enable argo */
>> >> +static bool __read_mostly opt_argo_enabled;
>> >> +boolean_param("argo", opt_argo_enabled);
>> >
>> > I would drop the opt_* prefix, new options added recently don't
>> > include the prefix already.
>>
>> Would you mind pointing out examples? Especially for boolean ones I
>> think we've tried to consistently name them opt_*. But in the case
>> here (it being static) I'm not overly fussed.
> 
> I was mostly thinking about the dom0_pvh boolean option that I added.
> I'm not overly fuzzed, I just think it's not adding any value to the
> variable name, but if you prefer to keep the opt_ prefix that's fine.

Well, one value is that at use sites of such variables you immediately
notice that the value is (potentially) admin controlled. At least in
some cases it is helpful for this to be obvious.

Jan




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10 12:40           ` Roger Pau Monné
@ 2019-01-10 12:53             ` Jan Beulich
  2019-01-11  6:37               ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-10 12:53 UTC (permalink / raw)
  To: royger
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Christopher Clark,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

>>> On 10.01.19 at 13:40, <royger@freebsd.org> wrote:
> On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 10.01.19 at 13:01, <royger@freebsd.org> wrote:
>> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark 
> <christopher.w.clark@gmail.com> wrote:
>> >>
>> >> The second reason is about avoiding exposing the Xen virtual memory
>> >> allocator directly to frequent guest-supplied size requests for
>> >> contiguous regions (of up to 16GB).
>> >
>> > As said in another reply, I'm not sure allowing 16GB rings is safe.
>> > The amount of internal memory required to track such rings is not
>> > trivial given the arrays to store the mfns, the pages, and the virtual
>> > mappings.
>> >
>> >> With single-page allocations to
>> >> build a ring, fragmentation is not a problem, and mischief by a guest
>> >> seems difficult.
>> >
>> > Hm, there's still a lot of big dynamic memory allocations in order to
>> > support a 16GB ring, which makes me think that virtual address space
>> > is not the only problem if you allow 16GB rings.
>> >
>> >> Changing it to issue requests for contiguous regions,
>> >> with variable ring sizes up to the maximum of 16GB, it seems like
>> >> significant fragmentation may be achievable. I don't know the
>> >> practical impact of that but it seems worth avoiding. Are the other
>> >> users of __vmap (or vmap) for multi-gigabyte regions only either
>> >> boot-time, infrequent operations (livepatch), or for actions by
>> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> >> a frequent operation somewhere else?
>> >
>> > I haven't checked, but I would be quite surprised to find any vmap
>> > usage with such size (16GB). Maybe someone more familiar with the mm
>> > subsystem can provide some insight here.
>>
>> And indeed the vmap range reserved in VA space is just 64GB (on
>> x86) at present.
>>
>> >> Given the context above, and Jason's simplification to the
>> >> memcpy_to_guest_ring function, plus the imminent merge freeze
>> >> deadline, and the understanding that this loop and the related data
>> >> structures supporting it have been tested and are working, would it be
>> >> acceptable to omit making this contiguous mapping change from this
>> >> current series?
>> >
>> > My opinion would be to just use vmap if it works, because that IMO
>> > greatly simplifies the code by being able to have the whole ring
>> > mapped at all the time. It would remove the iteration to copy
>> > requests, and remove the usage of ring_map_page everywhere. That would
>> > be my recommendation code-wise, but as said above someone more
>> > familiar with the mm subsystem might have other opinion's about how to
>> > deal with accesses to 16GB of guest memory, and indeed your iterative
>> > solution might be the best approach.
>>
>> No-one can allocate 16GB physically contiguous memory.
> 
> Right, my question/comment was whether it would make sense to limit
> the size of the argos ring to something smaller and then use vmap to
> map the whole ring in contiguous virtual space in order to ease
> accesses.

Whether you vmap() the ring in (page sized) pieces or in one blob is,
for the purpose of the order of magnitude of VA space consumption,
not overly relevant: You can't map more than at most three such
gigantic rings anyway with the current VA layout. (In practice
mapping individual pages would halve the effectively usable VA
space, due to the guard pages inserted between regions.) IOW -
the ring size should be bounded at a lower value anyway imo.
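
(With the 64GB vmap area, 16GB rings leave room for at most three or
four such mappings even before any other vmap users; and mapping a
16GB ring page by page, with a guard page per mapping, would consume
roughly 32GB of VA on its own.)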

> TBH, I'm not sure virtual address space is the only issue if argos
> allows 16GB rings to be used. 16GB rings will consume a non-trivial
> amount of memory for the internal argos state tracking structures
> AFAICT.

Fully agree. It has taken us ages to eliminate all runtime
allocations of order > 0, and it looks like we'd be gaining some
back here. I consider this tolerable as long as the feature is
experimental, but it would need fixing for it to become fully
supported. Christopher - annotating such code with fixme
comments right away helps with spotting (and addressing) them later.
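
For example, something like this at each such allocation site:

    /*
     * FIXME: runtime allocation of order > 0, sized from guest-supplied
     * ring dimensions; needs bounding or restructuring before the
     * feature can move beyond experimental status.
     */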

Jan



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
                     ` (2 preceding siblings ...)
  2019-01-10 10:19   ` Roger Pau Monné
@ 2019-01-10 16:16   ` Eric Chanudet
  2019-01-11  6:05     ` Christopher Clark
  2019-01-11 11:54   ` Jan Beulich
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 104+ messages in thread
From: Eric Chanudet @ 2019-01-10 16:16 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Roger Pau Monne

On 06/01/19 at 11:42pm, Christopher Clark wrote:
>+partner_rings_remove(struct domain *src_d)
<snip>
>+                ring_info = ring_find_info(dst_d, &send_info->id);
ring_find_info is defined later (in PATCH 07/15); should it be moved to
this patch since it is now used here?

-- 
Eric Chanudet


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
  2019-01-09 15:55   ` Wei Liu
  2019-01-10 11:24   ` Roger Pau Monné
@ 2019-01-10 20:11   ` Eric Chanudet
  2019-01-11  6:09     ` Christopher Clark
  2019-01-14 14:19   ` Jan Beulich
  2019-01-14 15:31   ` Andrew Cooper
  4 siblings, 1 reply; 104+ messages in thread
From: Eric Chanudet @ 2019-01-10 20:11 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Roger Pau Monne

On 06/01/19 at 11:42pm, Christopher Clark wrote:
>+/*
>+ * The maximum size of an Argo ring is defined to be: 16GB
>+ *  -- which is 0x1000000 bytes.
>+ * A byte index into the ring is at most 24 bits.
>+ */
>+#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
It looks like 16MB. Did I miss a <<10 somewhere or is it a typo in the
comment?

-- 
Eric Chanudet


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
  2019-01-09 18:05   ` Jason Andryuk
  2019-01-09 18:57   ` Roger Pau Monné
@ 2019-01-10 21:41   ` Eric Chanudet
  2019-01-11  7:12     ` Christopher Clark
  2 siblings, 1 reply; 104+ messages in thread
From: Eric Chanudet @ 2019-01-10 21:41 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Roger Pau Monne

On 06/01/19 at 11:42pm, Christopher Clark wrote:
>+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
>+                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
>+                     uint32_t len)
>+{
>+    unsigned int mfns_index = offset >> PAGE_SHIFT;
>+    void *dst;
>+    int ret;
>+    unsigned int src_offset = 0;
>+
>+    ASSERT(spin_is_locked(&ring_info->lock));
>+
>+    offset &= ~PAGE_MASK;
>+
>+    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
>+        return -EFAULT;
With offset < PAGE_SIZE after the previous mask, shouldn't the sanity
check be:
    if (len + offset > XEN_ARGO_MAX_RING_SIZE)
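
As a sketch only, doing the sum in a wider type would also rule out any
wrap of the 32-bit addition:

    if ( (uint64_t)len + offset > XEN_ARGO_MAX_RING_SIZE )
        return -EFAULT;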

-- 
Eric Chanudet


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-09 14:38         ` Jan Beulich
@ 2019-01-10 23:29           ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-10 23:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Wed, Jan 9, 2019 at 6:38 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 09.01.19 at 15:26, <jandryuk@gmail.com> wrote:
> > On Wed, Jan 9, 2019 at 4:35 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 08.01.19 at 23:54, <jandryuk@gmail.com> wrote:
> >> > On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> >> >> +     */
> >> >> +    struct argo_ring_info *ring_info;
> >> >> +    /* domain to be notified when space is available */
> >> >> +    domid_t domain_id;
> >> >> +    uint16_t pad;
> >> >
> >> > Can we order domain_id after len and drop the pad?
> >>
> >> That would still call for a pad field - we prefer to have explicit padding,
> >> and also to check it's zero, the latter to allow for assigning meaning to
> >> the field down the road.
> >
> > This struct is internal to Xen and argo, so do we still need explicit
> > padding?
>
> Oh, internal structures don't need any explicit padding. Where the
> domain_id field gets placed still doesn't matter then, though.

ok, I've switched the len field here from uint32_t to unsigned int
(part of moving away from fixed-width types where not required, as
requested later in this thread) and so while at it have dropped the
unneeded pad field too.
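
For illustration only (exact ordering of the remaining fields still to be
settled), the struct ends up along these lines:

    struct pending_ent
    {
        /* List node within argo_ring_info's pending list */
        struct hlist_node node;
        /* List node within argo_domain's wildcard_pend_list */
        struct hlist_node wildcard_node;
        /* Ring that this entry pertains to; see the locking caution above */
        struct argo_ring_info *ring_info;
        /* Space being waited for on the ring, now unsigned int (was uint32_t) */
        unsigned int len;
        /* Domain to be notified when space is available */
        domid_t domain_id;
    };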

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-10 10:19   ` Roger Pau Monné
  2019-01-10 11:52     ` Jan Beulich
@ 2019-01-11  6:03     ` Christopher Clark
  2019-01-11  9:27       ` Roger Pau Monné
  1 sibling, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 2:19 AM Roger Pau Monné <royger@freebsd.org> wrote:
>
>  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.

> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 6f782f7..86195d3 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -17,7 +17,177 @@
> >   */
> >
> >  #include <xen/errno.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/argo.h>
> > +#include <xen/event.h>
> > +#include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/time.h>
> > +#include <public/argo.h>
>
> We usually try to sort header includes alphabetically, and I would add
> a newline between the xen/* and the public/* header includes.

ack, thanks, done.

>
> > +
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +
> > +/* Xen command line option to enable argo */
> > +static bool __read_mostly opt_argo_enabled;
> > +boolean_param("argo", opt_argo_enabled);
>
> I would drop the opt_* prefix, new options added recently don't
> include the prefix already.

after the later in-thread discussion w/ yourself and Jan, I've
retained this prefix

> > +/* Data about a domain's own ring that it has registered */
> > +struct argo_ring_info
> > +{
> > +    /* next node in the hash, protected by L2 */
> > +    struct hlist_node node;
> > +    /* this ring's id, protected by L2 */
> > +    struct argo_ring_id id;
> > +    /* L3 */
> > +    spinlock_t lock;
> > +    /* length of the ring, protected by L3 */
> > +    uint32_t len;
> > +    /* number of pages in the ring, protected by L3 */
> > +    uint32_t npage;
>
> Can you infer the number of pages from the length of the ring, or the
> other way around?
>
> I'm not sure why both need to be stored here.

Yes, you're right. I've removed the npage struct member and calculated
the value where needed from the len.
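
For example (a sketch, using the existing PFN_UP helper):

    unsigned int nr_pages = PFN_UP(ring_info->len);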

> > +    /* number of pages translated into mfns, protected by L3 */
> > +    uint32_t nmfns;
> > +    /* cached tx pointer location, protected by L3 */
> > +    uint32_t tx_ptr;
>
> All these fields are not part of any public structure, so I wonder if
> it would be better to simply use unsigned int for those, or size_t.

ack, done.

> > +    /* mapped ring pages protected by L3 */
> > +    uint8_t **mfn_mapping;
>
> Why 'uint8_t *', wouldn't it be better to just use 'void *' if it's a mapping?

Yes. Have switched it to be void*.

> > +/* A space-available notification that is awaiting sufficient space */
> > +struct pending_ent
> > +{
> > +    /* List node within argo_ring_info's pending list */
> > +    struct hlist_node node;
> > +    /*
> > +     * List node within argo_domain's wildcard_pend_list. Only used if the
> > +     * ring is one with a wildcard partner (ie. that any domain may send to)
> > +     * to enable cancelling signals on wildcard rings on domain destroy.
> > +     */
> > +    struct hlist_node wildcard_node;
> > +    /*
> > +     * Pointer to the ring_info that this ent pertains to. Used to ensure that
> > +     * ring_info->npending is decremented when ents for wildcard rings are
> > +     * cancelled for domain destroy.
> > +     * Caution: Must hold the correct locks before accessing ring_info via this.
> > +     */
> > +    struct argo_ring_info *ring_info;
> > +    /* domain to be notified when space is available */
> > +    domid_t domain_id;
> > +    uint16_t pad;
>
> No need for the pad in internal structures.

ack - removed pad.

> > +/*
> > + * The value of the argo element in a struct domain is
> > + * protected by the global lock argo_lock: L1
> > + */
> > +#define ARGO_HTABLE_SIZE 32
> > +struct argo_domain
> > +{
> > +    /* L2 */
> > +    rwlock_t lock;
> > +    /*
> > +     * Hash table of argo_ring_info about rings this domain has registered.
> > +     * Protected by L2.
> > +     */
> > +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> > +    /* Counter of rings registered by this domain. Protected by L2. */
> > +    uint32_t ring_count;
> > +
> > +    /* Lsend */
> > +    spinlock_t send_lock;
> > +    /*
> > +     * Hash table of argo_send_info about rings other domains have registered
> > +     * for this domain to send to. Single partner, non-wildcard rings.
> > +     * Protected by Lsend.
> > +     */
> > +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> > +
> > +    /* Lwildcard */
> > +    spinlock_t wildcard_lock;
> > +    /*
> > +     * List of pending space-available signals for this domain about wildcard
> > +     * rings registered by other domains. Protected by Lwildcard.
> > +     */
> > +    struct hlist_head wildcard_pend_list;
> > +};
> > +
> > +/*
> > + * Locking is organized as follows:
> > + *
> > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > + *              W(<lock>) means taking a write lock on it.
> > + *
> > + * L1 : The global lock: argo_lock
> > + * Protects the argo elements of all struct domain *d in the system.
> > + * It does not protect any of the elements of d->argo, only their
> > + * addresses.
> > + *
> > + * By extension since the destruction of a domain with a non-NULL
> > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > + * guarantees that no domains pointers that argo is interested in
> > + * become invalid whilst this lock is held.
> > + */
> > +
> > +static DEFINE_RWLOCK(argo_lock); /* L1 */
>
> You also add an argo_lock to each domain struct which doesn't seem to
> be mentioned here at all.

You're right! Thanks - that's a nice find. That lock is not used at all.
I'd missed it since it's just not referenced anywhere in the argo.c file.
I've removed it.

> Shouldn't that lock be the one that protects d->argo? (instead of this global lock?)

According to the design that is in place at the moment, no, but
I need to study that option a bit before I can comment further on
whether it would make sense to add it in order to do so.
I imagine not though because we're not looking to add any more locks.

> > +/*
> > + * L2 : The per-domain ring hash lock: d->argo->lock
> > + * Holding a read lock on L2 protects the ring hash table and
> > + * the elements in the hash_table d->argo->ring_hash, and
> > + * the node and id fields in struct argo_ring_info in the
> > + * hash table.
> > + * Holding a write lock on L2 protects all of the elements of
> > + * struct argo_ring_info.
> > + *
> > + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> > + *
> > + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> > + * Protects all the fields within the argo_ring_info, aside from the ones that
> > + * L2 already protects: node, id, lock.
> > + *
> > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> > + *
> > + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> > + * Protects the per-domain send hash table : d->argo->send_hash
> > + * and the elements in the hash table, and the node and id fields
> > + * in struct argo_send_info in the hash table.
> > + *
> > + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> > + * Do not attempt to acquire a L2 on any domain after taking and while
> > + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> > + *
> > + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> > + * Protects the per-domain list of outstanding signals for space availability
> > + * on wildcard rings.
> > + *
> > + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> > + * No other locks are acquired after obtaining Lwildcard.
> > + */
>
> IMO I think the locking is overly complicated, and there's no
> reasoning why so many locks are needed. Wouldn't it be enough to start
> with a single lock that protects the whole d->argo existence and
> contents?
>
> I would start with a very simple (as simple as possible) locking
> structure and go improving from there if there are performance
> bottlenecks.

It definitely doesn't help when there's an extra lock lying around
just to be confusing. Sorry.

The locking discipline in this code is challenging and you are right that
there hasn't been an explanation given as to _why_ there are the locks that there
are. I will fix that. I can also review the placement of the ASSERTs that
check (and document) the locks within the code, if that helps.

The current locking comments describe the how, but the why hasn't been
covered so far and it is needed. The unreasonably-short version is: this
code is *hot* when the communication paths are in use -- it operates the
data path -- and there needs to be isolation for paths using rings from the
potentially malicious or disruptive activities of other domains, or even
other vcpus of the same domain operating other rings.

I am confident that the locking (that actually gets operated) is correct and
justified though, and I hope that adding some new clear documentation for it
can address this.
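
To illustrate the style I mean (a sketch; the function name is made up),
each internal function states and checks its locking precondition at the
top:

    static void
    some_ring_operation(struct argo_ring_info *ring_info)
    {
        /* L3: the caller must hold this ring's lock */
        ASSERT(spin_is_locked(&ring_info->lock));

        /* ... body only touches fields documented as protected by L3 ... */
    }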

> >  /* Change this to #define ARGO_DEBUG here to enable more debug messages */
> >  #undef ARGO_DEBUG
> > @@ -28,10 +198,299 @@
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +static void
> > +ring_unmap(struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +        return;
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +    {
> > +        if ( !ring_info->mfn_mapping[i] )
> > +            continue;
> > +        if ( ring_info->mfns )
> > +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> > +                         mfn_x(ring_info->mfns[i]),
> > +                         ring_info->mfn_mapping[i]);
> > +        unmap_domain_page_global(ring_info->mfn_mapping[i]);
> > +        ring_info->mfn_mapping[i] = NULL;
> > +    }
>
> As noted in another patch, I would consider mapping this in contiguous
> virtual address space using vmap, but I'm not going to insist.

Thanks - I will look at what is involved to do it, but I may well have
to leave it as-is for now.

>
> > +}
> > +
> > +static void
> > +wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> > +{
> > +    struct domain *d = get_domain_by_id(domain_id);
>
> Newline.

ack

>
> > +    if ( !d )
> > +        return;
> > +
> > +    if ( d->argo )
>
> Don't you need to pick d->argo_lock here to prevent d->argo from being
> removed under your feet?

No, because wildcard_pending_list_remove is called from:

* pending_find and pending_cancel:
  with both R(L2) and L3 held (of a different domain, but
  that's ok), which means R(L1) is held,
  so d->argo is safe in wildcard_pending_list_remove.

* pending_remove_all:
  which is only called from ring_remove_info,
  which has ASSERTS that either R(L1) or R(L2) is held,
  and R(L2) means R(L1) must already be held (following protocol).
  so d->argo is safe in wildcard_pending_list_remove.

>
> > +    {
> > +        spin_lock(&d->argo->wildcard_lock);
> > +        hlist_del(&ent->wildcard_node);
> > +        spin_unlock(&d->argo->wildcard_lock);
> > +    }
> > +    put_domain(d);
> > +}
> > +
> > +static void
> > +pending_remove_all(struct argo_ring_info *ring_info)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &ring_info->pending, node)
>
> As a side note, it might be interesting to introduce a helper like
> hlist_first_entry_or_null, that would remove the need to have an extra
> *next element, and would be the more natural way to drain a hlist
> (seeing that you have the same pattern in
> wildcard_rings_pending_remove).
>
> > +    {
> > +        if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> > +            wildcard_pending_list_remove(ent->domain_id, ent);
> > +        hlist_del(&ent->node);
> > +        xfree(ent);
> > +    }
> > +    ring_info->npending = 0;
> > +}
> > +
> > +static void
> > +wildcard_rings_pending_remove(struct domain *d)
> > +{
> > +    struct hlist_node *node, *next;
> > +    struct pending_ent *ent;
> > +
> > +    ASSERT(rw_is_write_locked(&argo_lock));
> > +
> > +    hlist_for_each_entry_safe(ent, node, next, &d->argo->wildcard_pend_list,
> > +                              node)
> > +    {
> > +        hlist_del(&ent->node);
> > +        ent->ring_info->npending--;
> > +        hlist_del(&ent->wildcard_node);
> > +        xfree(ent);
> > +    }
> > +}
> > +
> > +static void
> > +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
>
> I think the above requires a comment of why two different locks are
> used to protect the ring mfns, and why just having one of them locked
> is enough.

ack, I need to add further locking docs.

>
> > +
> > +    if ( !ring_info->mfns )
> > +        return;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    ring_unmap(ring_info);
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> > +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> > +
> > +    xfree(ring_info->mfns);
> > +    ring_info->mfns = NULL;
>
> Xen has a handy macro for this, XFREE. That would free the memory and
> assign the pointer to NULL.

thanks, that's great - done
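
ie. those xfree-then-NULL pairs collapse to:

    XFREE(ring_info->mfns);
    XFREE(ring_info->mfn_mapping);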

>
> > +    ring_info->npage = 0;
> > +    xfree(ring_info->mfn_mapping);
> > +    ring_info->mfn_mapping = NULL;
> > +    ring_info->nmfns = 0;
> > +}
> > +
> > +static void
> > +ring_remove_info(struct domain *d, struct argo_ring_info *ring_info)
>
> I think the domain parameter can be constified here, since it's only
> used by ring_remove_mfns, and that function already expects a const
> domain struct.

ack, yes, thanks.

> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
> > +        return rc;
>
> Why not just 'return -EOPNOTSUPP;'?

good point - will do

> > +
> > +int
> > +argo_init(struct domain *d)
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
> > +
> > +    d->argo = argo;
>
> Where's the d->argo_lock initialization?

It was added to domain.c in this patch, but there is now no need, that lock
is gone. Thanks for the catch.

>
> > +
> > +    write_unlock(&argo_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +argo_destroy(struct domain *d)
> > +{
> > +    BUG_ON(!d->is_dying);
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +        xfree(d->argo);
> > +        d->argo = NULL;
> > +    }
> > +    write_unlock(&argo_lock);
> > +}
> > +
> > +void
> > +argo_soft_reset(struct domain *d)
> > +{
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +
> > +        if ( !opt_argo_enabled )
> > +        {
> > +            xfree(d->argo);
> > +            d->argo = NULL;
>
> Can opt_argo_enabled change during runtime?

Not at the moment, no. It should be made changeable
later, but keeping it fixed assists with derisking this for
release consideration.

>
> > +        }
> > +        else
> > +            argo_domain_init(d->argo);
> > +    }
> > +
> > +    write_unlock(&argo_lock);
> >  }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index c623dae..9596840 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -32,6 +32,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenoprof.h>
> >  #include <xen/irq.h>
> > +#include <xen/argo.h>
> >  #include <asm/debugger.h>
> >  #include <asm/p2m.h>
> >  #include <asm/processor.h>
> > @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
> >
> >      xfree(d->pbuf);
> >
> > +#ifdef CONFIG_ARGO
> > +    argo_destroy(d);
> > +#endif
>
> Instead of adding such ifdefs you could provide dummy argo_destroy
> inline functions in argo.h when CONFIG_ARGO is not set.

ack, have done this.
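
A sketch of the shape this takes in the xen/argo.h header (the exact set
of stubs matching whatever the header declares):

    #ifdef CONFIG_ARGO
    int argo_init(struct domain *d);
    void argo_destroy(struct domain *d);
    void argo_soft_reset(struct domain *d);
    #else
    static inline int argo_init(struct domain *d) { return 0; }
    static inline void argo_destroy(struct domain *d) { }
    static inline void argo_soft_reset(struct domain *d) { }
    #endif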

Thanks,

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-10 16:16   ` Eric Chanudet
@ 2019-01-11  6:05     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:05 UTC (permalink / raw)
  To: Christopher Clark, xen-devel, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Konrad Rzeszutek Wilk,
	Paul Durrant, Roger Pau Monne, Stefano Stabellini, Tim Deegan,
	Wei Liu, Jason Andryuk, Rich Persaud, Ross Philipson,
	James McKenzie, Daniel Smith

On Thu, Jan 10, 2019 at 8:16 AM Eric Chanudet <eric.chanudet@gmail.com> wrote:
>
> On 06/01/19 at 11:42pm, Christopher Clark wrote:
> >+partner_rings_remove(struct domain *src_d)
> <snip>
> >+                ring_info = ring_find_info(dst_d, &send_info->id);
> ring_find_info is defined later (PATCH 07/15), should it be moved to
> this patch since it is now used here?

Yes, you're right -- thanks -- and I should've caught that. Fixed and
will be thorough with testing the series individually before posting
again.

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-10 20:11   ` Eric Chanudet
@ 2019-01-11  6:09     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:09 UTC (permalink / raw)
  To: Christopher Clark, xen-devel, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Konrad Rzeszutek Wilk,
	Paul Durrant, Roger Pau Monne, Stefano Stabellini, Tim Deegan,
	Wei Liu, Jason Andryuk, Rich Persaud, Ross Philipson,
	James McKenzie, Daniel Smith

On Thu, Jan 10, 2019 at 12:11 PM Eric Chanudet <eric.chanudet@gmail.com> wrote:
>
> On 06/01/19 at 11:42pm, Christopher Clark wrote:
> >+/*
> >+ * The maximum size of an Argo ring is defined to be: 16GB
> >+ *  -- which is 0x1000000 bytes.
> >+ * A byte index into the ring is at most 24 bits.
> >+ */
> >+#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> It looks like 16MB. Did I miss a <<10 somewhere or is it a typo in the
> comment?

Yeah, it's an error in the comment. Thanks.

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-10 11:24   ` Roger Pau Monné
  2019-01-10 11:57     ` Jan Beulich
@ 2019-01-11  6:29     ` Christopher Clark
  2019-01-11  9:38       ` Roger Pau Monné
  1 sibling, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 3:25 AM Roger Pau Monné <royger@gmail.com> wrote:
>
>  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > The register op is used by a domain to register a region of memory for
> > receiving messages from either a specified other domain, or, if specifying a
> > wildcard, any domain.
> >
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index aea13eb..68d4415 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
> >  of memory rings with the hypervisor to receive messages, sending messages to
> >  other domains by hypercall and querying the ring status of other domains.
> >
> > +### argo-mac
> > +> `= permissive | enforcing`
>
> Why not call this argo-mac-permissive and make it a boolean? Default
> would be 'false' and that would imply enforcing. This would get rid of
> parse_opt_argo_mac since you could use the default boolean parser.

Yes, that makes sense, thanks -- done
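
So something along these lines (a sketch):

    /* Xen command line option to relax the Argo MAC check (default: enforcing) */
    static bool __read_mostly opt_argo_mac_permissive;
    boolean_param("argo-mac-permissive", opt_argo_mac_permissive);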

> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > +    if ( i >= ring_info->nmfns )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    if ( !ring_info->mfns || !ring_info->mfn_mapping)
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        ring_info->len = 0;
> > +        return -ENOMEM;
> > +    }
> > +
> > +    if ( !ring_info->mfn_mapping[i] )
> > +    {
> > +        /*
> > +         * TODO:
> > +         * The first page of the ring contains the ring indices, so both read
> > +         * and write access to the page is required by the hypervisor, but
> > +         * read-access is not needed for this mapping for the remainder of the
> > +         * ring.
> > +         * Since this mapping will remain resident in Xen's address space for
> > +         * the lifetime of the ring, and following the principle of least
> > +         * privilege, it could be preferable to:
> > +         *  # add a XSM check to determine what policy is wanted here
> > +         *  # depending on the XSM query, optionally create this mapping as
> > +         *    _write-only_ on platforms that can support it.
> > +         *    (eg. Intel EPT/AMD NPT).
>
> Why do Intel EPT or AMD NPT matter here?

I think (though could be wrong and am open to correction here) that
EPT and NPT enable the construction of write-only (ie not readable)
memory mappings. Standard page tables can't do that: with those,
if it's writable, it's also readable.

> You are mapping the page to Xen address space, which doesn't use
> either EPT or NPT. Writable or read-only mappings would be created by
> setting the right bit in the Xen page tables.

ok. I've dropped the comment.

>
> > +         */
> > +        ring_info->mfn_mapping[i] = map_domain_page_global(ring_info->mfns[i]);
> > +
>
> No need for the newline.

ack.

> > +static void
> > +update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> > +{
> > +    void *dst;
> > +    uint32_t *p;
> > +
> > +    ASSERT(ring_info->mfn_mapping[0]);
> > +
> > +    ring_info->tx_ptr = tx_ptr;
> > +
> > +    dst = ring_info->mfn_mapping[0];
> > +    p = dst + offsetof(xen_argo_ring_t, tx_ptr);
>
> Hm, wouldn't it be easier to cast page 0 to the layout of the ring so
> that you don't need to use pointer arithmetic to get the fields? Ie:
> make dst be of type xen_argo_ring_t.

Yes, good point - and that's what's already done elsewhere with the
rx_ptr, so that makes the code more consistent. Done.
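
For illustration, a sketch of the typed-pointer form:

    static void
    update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
    {
        xen_argo_ring_t *ringp = ring_info->mfn_mapping[0];

        ASSERT(ringp);

        ring_info->tx_ptr = tx_ptr;
        /* publish the new tx_ptr in the guest-visible ring header */
        write_atomic(&ringp->tx_ptr, tx_ptr);
        smp_wmb();
    }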

> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +               uint32_t npage,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +               uint32_t len)
> > +{
> > +    unsigned int i;
> > +    int ret = 0;
> > +    mfn_t *mfns;
> > +    uint8_t **mfn_mapping;
> > +
> > +    /*
> > +     * first bounds check on npage here also serves as an overflow check
> > +     * before left shifting it
> > +     */
> > +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
> > +         ((npage << PAGE_SHIFT) < len) )
> > +        return -EINVAL;
> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /* Ring already existed: drop the previous mapping. */
> > +        gprintk(XENLOG_INFO,
> > +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id);
> > +
> > +        ring_remove_mfns(d, ring_info);
> > +        ASSERT(!ring_info->mfns);
> > +    }
> > +
> > +    mfns = xmalloc_array(mfn_t, npage);
> > +    if ( !mfns )
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < npage; i++ )
> > +        mfns[i] = INVALID_MFN;
> > +
> > +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> > +    if ( !mfn_mapping )
> > +    {
> > +        xfree(mfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ring_info->npage = npage;
> > +    ring_info->mfns = mfns;
> > +    ring_info->mfn_mapping = mfn_mapping;
> > +
> > +    ASSERT(ring_info->npage == npage);
> > +
> > +    if ( ring_info->nmfns == ring_info->npage )
> > +        return 0;
> > +
> > +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> This loop seems to assume that there can be pages already added to the
> ring, but IIRC you said that redimensioning of the ring was removed in
> this version?

That's correct - it's currently rejected.

> I think for an initial version it would be easier to not allow
> redimensioning of active rings, and just allow teardown and
> re-initialization as the way to redimension a ring.

ack. I'll look into how it affects this function and simplifying it.
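
Presumably, with resize rejected, the translate loop can then simply run
over the whole extent (sketch):

    for ( i = 0; i < npage; i++ )
    {
        /* copy descriptor i from the guest, translate gfn -> mfn, store it */
    }

and the early return when nmfns == npage above goes away.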


>
> > +    {
> > +        xen_argo_page_descr_t pg_descr;
> > +        gfn_t gfn;
> > +        mfn_t mfn;
> > +
> > +        ret = __copy_from_guest_offset(&pg_descr, pg_descr_hnd, i, 1) ?
> > +                -EFAULT : 0;
> > +        if ( ret )
> > +            break;
> > +
> > +        /* Implementation currently only supports handling 4K pages */
> > +        if ( (pg_descr & XEN_ARGO_PAGE_DESCR_SIZE_MASK) !=
> > +                XEN_ARGO_PAGE_DESCR_SIZE_4K )
> > +        {
> > +            ret = -EINVAL;
> > +            break;
> > +        }
> > +        gfn = _gfn(pg_descr >> PAGE_SHIFT);
> > +
> > +        ret = find_ring_mfn(d, gfn, &mfn);
> > +        if ( ret )
> > +        {
> > +            gprintk(XENLOG_ERR,
> > +               "argo: vm%u: invalid gfn %"PRI_gfn" r:(vm%u:%x vm%d) %p %d/%d\n",
> > +                    d->domain_id, gfn_x(gfn), ring_info->id.domain_id,
> > +                    ring_info->id.port, ring_info->id.partner_id,
> > +                    ring_info, i, ring_info->npage);
> > +            break;
> > +        }
> > +
> > +        ring_info->mfns[i] = mfn;
> > +
> > +        argo_dprintk("%d: %"PRI_gfn" -> %"PRI_mfn"\n",
> > +                     i, gfn_x(gfn), mfn_x(ring_info->mfns[i]));
> > +    }
> > +
> > +    ring_info->nmfns = i;
> > +
> > +    if ( ret )
> > +        ring_remove_mfns(d, ring_info);
> > +    else
> > +    {
> > +        ASSERT(ring_info->nmfns == ring_info->npage);
> > +
> > +        gprintk(XENLOG_DEBUG,
> > +        "argo: vm%u ring (vm%u:%x vm%d) %p mfn_mapping %p npage %d nmfns %d\n",
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id, ring_info,
> > +                ring_info->mfn_mapping, ring_info->npage, ring_info->nmfns);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +static struct argo_ring_info *
> > +ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +    unsigned int ring_hash_index;
> > +    struct hlist_node *node;
> > +    struct argo_ring_info *ring_info;
> > +
> > +    ASSERT(rw_is_locked(&d->argo->lock));
> > +
> > +    ring_hash_index = hash_index(id);
> > +
> > +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> > +                 d->argo, ring_hash_index,
> > +                 d->argo->ring_hash[ring_hash_index].first, id);
> > +    argo_dprintk("id.port=%x id.domain=vm%u id.partner_id=vm%d\n",
> > +                 id->port, id->domain_id, id->partner_id);
> > +
> > +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> > +                         node)
> > +    {
> > +        struct argo_ring_id *cmpid = &ring_info->id;
>
> const?

yep, thanks, done.

>
> > +
> > +        if ( cmpid->port == id->port &&
> > +             cmpid->domain_id == id->domain_id &&
> > +             cmpid->partner_id == id->partner_id )
> > +        {
> > +            argo_dprintk("ring_info=%p\n", ring_info);
> > +            return ring_info;
> > +        }
> > +    }
> > +    argo_dprintk("no ring_info found\n");
> > +
> > +    return NULL;
> > +}
> > +
> > +static long
> > +register_ring(struct domain *currd,
>
> If this is indeed the current domain (as the name suggests), why do
> you need to pass it around? Or else just name the parameter d.

After the later in-thread discussion between you and Jan, I've left
the argument name 'currd' but added the ASSERT recommended by Jan,
that currd matches current->domain.

I've done the same in the other functions (across the series) that
take currd as an argument, except for notify_check_pending where I've
just renamed the argument to 'd'; there's no reason in that function
that it needs to be handling the current domain.

>
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +              uint32_t npage, bool fail_exist)
> > +{
> > +    xen_argo_register_ring_t reg;
> > +    struct argo_ring_id ring_id;
> > +    void *map_ringp;
> > +    xen_argo_ring_t *ringp;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info = NULL;
> > +    struct domain *dst_d = NULL;
> > +    int ret = 0;
> > +    uint32_t private_tx_ptr;
> > +
> > +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> > +    {
> > +        ret = -EFAULT;
> > +        goto out;
>
> I don't see the point of using an out label, why not just use 'return
> -EFAULT;' (here and below). This avoids the braces and also removes
> the need for the ret assignment.

done.

>
> > +    }
> > +
> > +    /*
> > +     * A ring must be large enough to transmit messages, so requires space for:
> > +     * * 1 message header, plus
> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > +     *   for the message payload to be written into, plus
> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> > +     *   ensures that there can be space remaining when a message is present.
> > +     * The above determines the minimum acceptable ring size.
> > +     */
> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> > +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > +         (reg.pad != 0) )
> > +    {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    ring_id.partner_id = reg.partner_id;
> > +    ring_id.port = reg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > +    {
> > +        if ( opt_argo_mac_enforcing )
> > +        {
> > +            ret = -EPERM;
> > +            goto out_unlock;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        dst_d = get_domain_by_id(reg.partner_id);
> > +        if ( !dst_d )
> > +        {
> > +            argo_dprintk("!dst_d, ESRCH\n");
> > +            ret = -ESRCH;
> > +            goto out_unlock;
> > +        }
> > +
> > +        if ( !dst_d->argo )
> > +        {
> > +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +            ret = -ECONNREFUSED;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +
> > +        send_info = xzalloc(struct argo_send_info);
> > +        if ( !send_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +        send_info->id = ring_id;
> > +    }
> > +
> > +    write_lock(&currd->argo->lock);
> > +
> > +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > +    {
> > +        ret = -ENOSPC;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( !ring_info )
> > +    {
> > +        ring_info = xzalloc(struct argo_ring_info);
> > +        if ( !ring_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            goto out_unlock2;
> > +        }
> > +
> > +        spin_lock_init(&ring_info->lock);
> > +
> > +        ring_info->id = ring_id;
> > +        INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > +        hlist_add_head(&ring_info->node,
> > +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +    }
> > +    else
> > +    {
> > +        if ( ring_info->len )
> > +        {
> > +            /*
> > +             * If the caller specified that the ring must not already exist,
> > +             * fail at attempt to add a completed ring which already exists.
> > +             */
> > +            if ( fail_exist )
> > +            {
> > +                argo_dprintk("disallowed reregistration of existing ring\n");
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            if ( ring_info->len != reg.len )
> > +            {
> > +                /*
> > +                 * Change of ring size could result in entries on the pending
> > +                 * notifications list that will never trigger.
> > +                 * Simple blunt solution: disallow ring resize for now.
> > +                 * TODO: investigate enabling ring resize.
> > +                 */
>
> I think ring resizing was removed on this version?

Yes: This is the code that was introduced to prevent it.

>
> > +                gprintk(XENLOG_ERR,
> > +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                        ring_id.partner_id);
> > +                /*
> > +                 * Could return EINVAL here, but if the ring didn't already
> > +                 * exist then the arguments would have been valid, so: EEXIST.
> > +                 */
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            gprintk(XENLOG_DEBUG,
> > +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                    ring_id.partner_id);
> > +        }
> > +    }
> > +
> > +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +
> > +    /*
> > +     * The first page of the memory supplied for the ring has the xen_argo_ring
> > +     * structure at its head, which is where the ring indexes reside.
> > +     */
> > +    ret = ring_map_page(ring_info, 0, &map_ringp);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +    ringp = map_ringp;
> > +
> > +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > +    if ( (private_tx_ptr >= reg.len) ||
> > +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > +    {
> > +        /*
> > +         * Since the ring is a mess, attempt to flush the contents of it
> > +         * here by setting the tx_ptr to the next aligned message slot past
> > +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > +         */
> > +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > +        if ( private_tx_ptr >= reg.len )
> > +            private_tx_ptr = 0;
> > +
> > +        update_tx_ptr(ring_info, private_tx_ptr);
> > +    }
> > +
> > +    ring_info->tx_ptr = private_tx_ptr;
> > +    ring_info->len = reg.len;
> > +    currd->argo->ring_count++;
> > +
> > +    if ( send_info )
> > +    {
> > +        spin_lock(&dst_d->argo->send_lock);
> > +
> > +        hlist_add_head(&send_info->node,
> > +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > +        spin_unlock(&dst_d->argo->send_lock);
> > +    }
> > +
> > + out_unlock2:
> > +    if ( !ret && send_info )
> > +        xfree(send_info);
>
> There's no need to check if send_info is set, xfree(NULL) is safe.

done.

>
> > +
> > +    if ( dst_d )
> > +        put_domain(dst_d);
> > +
> > +    write_unlock(&currd->argo->lock);
> > +
> > + out_unlock:
> > +    read_unlock(&argo_lock);
> > +
> > + out:
> > +    return ret;
> > +}
> > +
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> > @@ -392,6 +926,38 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> >      switch (cmd)
> >      {
> > +    case XEN_ARGO_OP_register_ring:
> > +    {
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd =
> > +            guest_handle_cast(arg1, xen_argo_register_ring_t);
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd =
> > +            guest_handle_cast(arg2, xen_argo_page_descr_t);
> > +        /* arg3 is npage */
> > +        /* arg4 is flags */
> > +        bool fail_exist = arg4 & XEN_ARGO_REGISTER_FLAG_FAIL_EXIST;
> > +
> > +        if ( unlikely(arg3 > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
> > +        /*
> > +         * Check access to the whole array here so we can use the faster __copy
> > +         * operations to read each element later.
> > +         */
> > +        if ( unlikely(!guest_handle_okay(pg_descr_hnd, arg3)) )
> > +            break;
> > +        /* arg4: reserve currently-undefined bits, require zero.  */
> > +        if ( unlikely(arg4 & ~XEN_ARGO_REGISTER_FLAG_MASK) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        rc = register_ring(currd, reg_hnd, pg_descr_hnd, arg3, fail_exist);
> > +        break;
> > +    }
> > +
> >      default:
> >          rc = -EOPNOTSUPP;
> >          break;
> > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> > index 8997a1c..70e9a78 100644
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> >  /* Offset the given guest handle into the array it refers to. */
> >  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> >  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
> > index ca700c9..8dde5d5 100644
> > --- a/xen/include/asm-x86/guest_access.h
> > +++ b/xen/include/asm-x86/guest_access.h
> > @@ -41,6 +41,8 @@
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
> > +
> >  /* Offset the given guest handle into the array it refers to. */
> >  #define guest_handle_add_offset(hnd, nr) ((hnd).p += (nr))
> >  #define guest_handle_subtract_offset(hnd, nr) ((hnd).p -= (nr))
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > index 4818684..8947230 100644
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> >  #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16GB
>
> Is such a big size really required as the default maximum? The size of
> the internal structures required to support a 16GB ring would be quite
> big, has this been taken into account?

Yes, that was incorrect. The comment is now fixed. 16MB is much more
reasonable.

thanks,

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-10 11:57     ` Jan Beulich
@ 2019-01-11  6:30       ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	Roger Pau Monne, eric chanudet, Roger Pau Monné

On Thu, Jan 10, 2019 at 3:57 AM Jan Beulich <JBeulich@suse.com> wrote:
>
>  >>> On 10.01.19 at 12:24, <royger@gmail.com> wrote:
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark <christopher.w.clark@gmail.com> wrote:
> >> +static long
> >> +register_ring(struct domain *currd,
> >
> > If this is indeed the current domain (as the name suggests), why do
> > you need to pass it around? Or else just name the parameter d.
>
> When all (or at least most) callers already latch the pointer into a
> local variable, handing it through is often cheaper than re-obtaining
> it as current->domain. ASSERT(currd == current->domain) might be
> worthwhile in such cases, though.

argument retained and ASSERTs added, thanks.

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10 12:53             ` Jan Beulich
@ 2019-01-11  6:37               ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  6:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper, Roger Pau Monné,
	Ian Jackson, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, xen-devel, Konrad Rzeszutek Wilk,
	eric chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 4:53 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 10.01.19 at 13:40, <royger@freebsd.org> wrote:
> > On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 10.01.19 at 13:01, <royger@freebsd.org> wrote:
> >> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> >> >>
> >> >> The second reason is about avoiding exposing the Xen virtual memory
> >> >> allocator directly to frequent guest-supplied size requests for
> >> >> contiguous regions (of up to 16GB).
> >> >
> >> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> >> > The amount of internal memory required to track such rings is not
> >> > trivial given the arrays to store the mfns, the pages, and the virtual
> >> > mappings.
> >> >
> >> >> With single-page allocations to
> >> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> >> seems difficult.
> >> >
> >> > Hm, there's still a lot of big dynamic memory allocations in order to
> >> > support a 16GB ring, which makes me think that virtual address space
> >> > is not the only problem if you allow 16GB rings.
> >> >
> >> >> Changing it to issue requests for contiguous regions,
> >> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> >> significant fragmentation may be achievable. I don't know the
> >> >> practical impact of that but it seems worth avoiding. Are the other
> >> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> >> boot-time, infrequent operations (livepatch), or for actions by
> >> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> >> a frequent operation somewhere else?
> >> >
> >> > I haven't checked, but I would be quite surprised to find any vmap
> >> > usage with such size (16GB). Maybe someone more familiar with the mm
> >> > subsystem can provide some insight here.
> >>
> >> And indeed the vmap range reserved in VA space is just 64GB (on
> >> x86) at present.
> >>
> >> >> Given the context above, and Jason's simplification to the
> >> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> >> deadline, and the understanding that this loop and the related data
> >> >> structures supporting it have been tested and are working, would it be
> >> >> acceptable to omit making this contiguous mapping change from this
> >> >> current series?
> >> >
> >> > My opinion would be to just use vmap if it works, because that IMO
> >> > greatly simplifies the code by being able to have the whole ring
> >> > mapped at all the time. It would remove the iteration to copy
> >> > requests, and remove the usage of ring_map_page everywhere. That would
> >> > be my recommendation code-wise, but as said above someone more
> >> > familiar with the mm subsystem might have other opinions about how to
> >> > deal with accesses to 16GB of guest memory, and indeed your iterative
> >> > solution might be the best approach.
> >>
> >> No-one can allocate 16GB physically contiguous memory.
> >
> > Right, my question/comment was whether it would make sense to limit
> > the size of the argos ring to something smaller and then use vmap to
> > map the whole ring in contiguous virtual space in order to ease
> > accesses.
>
> Whether you vmap() the ring in (page sized) pieces or in one blob is,
> for the purpose of the order of magnitude of VA space consumption,
> not overly relevant: You can't map more than at most three such
> gigantic rings anyway with the current VA layout. (In practice
> mapping individual pages would halve the effectively usable VA
> space, due to the guard pages inserted between regions.) IOW -
> the ring size should be bounded at a lower value anyway imo.
>
> > TBH, I'm not sure virtual address space is the only issue if argos
> > allows 16GB rings to be used. 16GB rings will consume a non-trivial
> > amount of memory for the internal argos state tracking structures
> > AFAICT.
>
> Fully agree. It has taken us ages to eliminate all runtime
> allocations of order > 0, and it looks like we'd be gaining some
> back here. I consider this tolerable as long as the feature is
> experimental, but it would need fixing for it to become fully
> supported. Christopher - annotating such code with fixme
> comments right away helps later spotting (and addressing) them.

Sorry for blowing this thread up with the ring size statement based on
the incorrect comment (and thanks, Eric, for checking it).
I don't think 16MB rings introduce anywhere near the level of concern
for internal state, but I'll look at the allocations and see if fixmes
are appropriate.

Christopher


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
  2019-01-10 21:41   ` Eric Chanudet
@ 2019-01-11  7:12     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-11  7:12 UTC (permalink / raw)
  To: Christopher Clark, xen-devel, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Konrad Rzeszutek Wilk,
	Paul Durrant, Roger Pau Monne, Stefano Stabellini, Tim Deegan,
	Wei Liu, Jason Andryuk, Rich Persaud, Ross Philipson,
	James McKenzie, Daniel Smith

On Thu, Jan 10, 2019 at 1:41 PM Eric Chanudet <eric.chanudet@gmail.com> wrote:
>
> On 06/01/19 at 11:42pm, Christopher Clark wrote:
> >+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> >+                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> >+                     uint32_t len)
> >+{
> >+    unsigned int mfns_index = offset >> PAGE_SHIFT;
> >+    void *dst;
> >+    int ret;
> >+    unsigned int src_offset = 0;
> >+
> >+    ASSERT(spin_is_locked(&ring_info->lock));
> >+
> >+    offset &= ~PAGE_MASK;
> >+
> >+    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> >+        return -EFAULT;
> With offset < PAGE_SIZE after the previous mask, shouldn't the sanity
> check be:
>     if (len + offset > XEN_ARGO_MAX_RING_SIZE)

Yes, that's correct - thanks.
I'll switch the len and offset arguments to unsigned int while at it.
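
For illustration, a minimal standalone sketch of the corrected check
(the constants mirror the patch; the helper name and the widening of
len are illustrative, not the actual code):

    #include <errno.h>

    #define PAGE_SHIFT              12
    #define PAGE_SIZE               (1UL << PAGE_SHIFT)
    #define PAGE_MASK               (~(PAGE_SIZE - 1))
    #define XEN_ARGO_MAX_RING_SIZE  0x1000000UL

    int check_ring_copy_bounds(unsigned int offset, unsigned int len)
    {
        /* Reduce offset to an offset within the current ring page. */
        offset &= ~PAGE_MASK;

        /*
         * It is the sum of len and offset that must stay within the ring
         * size; widening len avoids any theoretical wrap of the addition.
         */
        if ( (unsigned long)len + offset > XEN_ARGO_MAX_RING_SIZE )
            return -EFAULT;

        return 0;
    }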

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-11  6:03     ` Christopher Clark
@ 2019-01-11  9:27       ` Roger Pau Monné
  2019-01-14  8:32         ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-11  9:27 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Fri, Jan 11, 2019 at 7:04 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Thu, Jan 10, 2019 at 2:19 AM Roger Pau Monné <royger@freebsd.org> wrote:
> >
> >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> > > +/*
> > > + * Locking is organized as follows:
> > > + *
> > > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > > + *              W(<lock>) means taking a write lock on it.
> > > + *
> > > + * L1 : The global lock: argo_lock
> > > + * Protects the argo elements of all struct domain *d in the system.
> > > + * It does not protect any of the elements of d->argo, only their
> > > + * addresses.
> > > + *
> > > + * By extension since the destruction of a domain with a non-NULL
> > > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > > + * guarantees that no domain pointers that argo is interested in
> > > + * become invalid whilst this lock is held.
> > > + */
> > > +
> > > +static DEFINE_RWLOCK(argo_lock); /* L1 */
> >
> > You also add an argo_lock to each domain struct which doesn't seem to
> > be mentioned here at all.
>
> You're right! Thanks - that's a nice find. That lock is not used at all.
> I'd missed it since it's just not referenced anywhere in the argo.c file.
> I've removed it.
>
> > Shouldn't that lock be the one that protects d->argo? (instead of this global lock?)
>
> According the design that is in place at the moment, no, but
> I need to study that option a bit before I can comment further on
> whether it would make sense to add it in order to do so.
> I imagine not though because we're not looking to add any more locks.

I'm wondering why a global argo_lock shared with all domains is used
to protect d->argo, instead of using a per-domain lock (d->argo_lock
for example). This global argo_lock shared between all domains is
going to introduce contention with no specific benefit AFAICT.

I would recommend an initial implementation that uses a single
per-domain lock (ie: d->argo_lock) to protect the whole contents of
d->argo, and then go adding more fine-grained locking as required,
providing evidence that such finer-grained locking is actually
improving performance (or required for some other reason). IMO, the
current locking scheme is overly complicated, and it's very hard for
me to reason about its correctness.

> > > +/*
> > > + * L2 : The per-domain ring hash lock: d->argo->lock
> > > + * Holding a read lock on L2 protects the ring hash table and
> > > + * the elements in the hash_table d->argo->ring_hash, and
> > > + * the node and id fields in struct argo_ring_info in the
> > > + * hash table.
> > > + * Holding a write lock on L2 protects all of the elements of
> > > + * struct argo_ring_info.
> > > + *
> > > + * To take L2 you must already have R(L1). W(L1) implies W(L2) and L3.
> > > + *
> > > + * L3 : The ringinfo lock: argo_ring_info *ringinfo; ringinfo->lock
> > > + * Protects all the fields within the argo_ring_info, aside from the ones that
> > > + * L2 already protects: node, id, lock.
> > > + *
> > > + * To acquire L3 you must already have R(L2). W(L2) implies L3.
> > > + *
> > > + * Lsend : The per-domain single-sender partner rings lock: d->argo->send_lock
> > > + * Protects the per-domain send hash table : d->argo->send_hash
> > > + * and the elements in the hash table, and the node and id fields
> > > + * in struct argo_send_info in the hash table.
> > > + *
> > > + * To take Lsend, you must already have R(L1). W(L1) implies Lsend.
> > > + * Do not attempt to acquire a L2 on any domain after taking and while
> > > + * holding a Lsend lock -- acquire the L2 (if one is needed) beforehand.
> > > + *
> > > + * Lwildcard : The per-domain wildcard pending list lock: d->argo->wildcard_lock
> > > + * Protects the per-domain list of outstanding signals for space availability
> > > + * on wildcard rings.
> > > + *
> > > + * To take Lwildcard, you must already have R(L1). W(L1) implies Lwildcard.
> > > + * No other locks are acquired after obtaining Lwildcard.
> > > + */
> >
> > IMO I think the locking is overly complicated, and there's no
> > reasoning why so many locks are needed. Wouldn't it be enough to start
> > with a single lock that protects the whole d->argo existence and
> > contents?
> >
> > I would start with a very simple (as simple as possible) locking
> > structure and go improving from there if there are performance
> > bottlenecks.
>
> It definitely doesn't help when there's an extra lock lying around
> just to be confusing. Sorry.
>
> The locking discipline in this code is challenging and you are right that
> there hasn't been an explanation given as to _why_ there are the locks that there
> are. I will fix that. I can also review the placement of the ASSERTs that
> check (and document) the locks within the code, if that helps.
>
> The current locking comments describe the how, but the why hasn't been
> covered so far and it is needed. The unreasonably-short version is: this
> code is *hot* when the communication paths are in use -- it operates the
> data path -- and there needs to be isolation for paths using rings from the
> potentially malicious or disruptive activities of other domains, or even
> other vcpus of the same domain operating other rings.

Yes, that’s fine, but as said above I wonder why for example a global
argo_lock is used to protect d->argo, instead of a per-domain lock. At
first sight this doesn’t look like the best approach performance wise.

> I am confident that the locking (that actually gets operated) is correct and
> justified though, and I hope that adding some new clear documentation for it
> can address this.

I’m not saying otherwise, but I cannot assert it either.

> > > +void
> > > +argo_soft_reset(struct domain *d)
> > > +{
> > > +    write_lock(&argo_lock);
> > > +
> > > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > > +
> > > +    if ( d->argo )
> > > +    {
> > > +        domain_rings_remove_all(d);
> > > +        partner_rings_remove(d);
> > > +        wildcard_rings_pending_remove(d);
> > > +
> > > +        if ( !opt_argo_enabled )
> > > +        {
> > > +            xfree(d->argo);
> > > +            d->argo = NULL;
> >
> > Can opt_argo_enabled change during runtime?
>
> Not at the moment, no. It should be made changeable
> later, but keeping it fixed assists with derisking this for
> release consideration.

Then if d->argo is set opt_argo_enabled must be true, and thus this
condition is never true?

Thanks, Roger.


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-11  6:29     ` Christopher Clark
@ 2019-01-11  9:38       ` Roger Pau Monné
  0 siblings, 0 replies; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-11  9:38 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Fri, Jan 11, 2019 at 7:29 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Thu, Jan 10, 2019 at 3:25 AM Roger Pau Monné <royger@gmail.com> wrote:
> >
> >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> > > +static int
> > > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > > +{
> > > +    if ( i >= ring_info->nmfns )
> > > +    {
> > > +        gprintk(XENLOG_ERR,
> > > +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",
> > > +                ring_info->id.domain_id, ring_info->id.port,
> > > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > > +        return -ENOMEM;
> > > +    }
> > > +
> > > +    if ( !ring_info->mfns || !ring_info->mfn_mapping)
> > > +    {
> > > +        ASSERT_UNREACHABLE();
> > > +        ring_info->len = 0;
> > > +        return -ENOMEM;
> > > +    }
> > > +
> > > +    if ( !ring_info->mfn_mapping[i] )
> > > +    {
> > > +        /*
> > > +         * TODO:
> > > +         * The first page of the ring contains the ring indices, so both read
> > > +         * and write access to the page is required by the hypervisor, but
> > > +         * read-access is not needed for this mapping for the remainder of the
> > > +         * ring.
> > > +         * Since this mapping will remain resident in Xen's address space for
> > > +         * the lifetime of the ring, and following the principle of least
> > > +         * privilege, it could be preferable to:
> > > +         *  # add a XSM check to determine what policy is wanted here
> > > +         *  # depending on the XSM query, optionally create this mapping as
> > > +         *    _write-only_ on platforms that can support it.
> > > +         *    (eg. Intel EPT/AMD NPT).
> >
> > Why do Intel EPT or AMD NPT matter here?
>
> I think (though could be wrong and am open to correction here) that
> EPT and NPT enable the construction of write-only (ie not readable)
> memory mappings. Standard page tables can't do that: with those,
> if it's writable, it's also readable.

The hypervisor itself doesn't run on EPT or NPT second stage
translation, that's used exclusively for (HVM) guests. So even if
there's such support in EPT or NPT it's not relevant here. x86 page
tables don't have the capability to create write-only mappings.

Thanks, Roger.


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
                     ` (3 preceding siblings ...)
  2019-01-10 16:16   ` Eric Chanudet
@ 2019-01-11 11:54   ` Jan Beulich
  2019-01-14  8:33     ` Christopher Clark
  2019-01-14 14:46   ` Wei Liu
  2019-01-14 14:58   ` Andrew Cooper
  6 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-11 11:54 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -17,7 +17,177 @@
>   */
>  
>  #include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/argo.h>
> +#include <xen/event.h>
> +#include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/time.h>
> +#include <public/argo.h>
> +
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> +
> +/* Xen command line option to enable argo */
> +static bool __read_mostly opt_argo_enabled;
> +boolean_param("argo", opt_argo_enabled);
> +
> +typedef struct argo_ring_id
> +{
> +    uint32_t port;

evtchn_port_t?

> +static void
> +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> +{
> +    unsigned int i;
> +
> +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> +           rw_is_write_locked(&argo_lock));
> +
> +    if ( !ring_info->mfns )
> +        return;
> +
> +    if ( !ring_info->mfn_mapping )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    ring_unmap(ring_info);
> +
> +    for ( i = 0; i < ring_info->nmfns; i++ )
> +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> +
> +    xfree(ring_info->mfns);
> +    ring_info->mfns = NULL;
> +    ring_info->npage = 0;
> +    xfree(ring_info->mfn_mapping);
> +    ring_info->mfn_mapping = NULL;
> +    ring_info->nmfns = 0;

While it shouldn't matter with locking in use, I generally would
consider it better if counts got set to zero before freeing the
arrays.

>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *currd = current->domain;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> +
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        return rc;
> +    }
> +
> +    domain_lock(currd);

What is the rationale for using the domain lock here? We're trying to
limit its use as much as possible, due to the otherwise heavy
contention which can result, as it may be held for comparably long
periods of time.

> +int
> +argo_init(struct domain *d)
> +{
> +    struct argo_domain *argo;
> +
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("init: domid: %d\n", d->domain_id);
> +
> +    argo = xmalloc(struct argo_domain);
> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    write_lock(&argo_lock);
> +
> +    argo_domain_init(argo);

I doubt the lock needs to be held for this function call.

> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -148,3 +148,5 @@
>  ?	flask_setenforce		xsm/flask_op.h
>  !	flask_sid_context		xsm/flask_op.h
>  ?	flask_transition		xsm/flask_op.h
> +?	argo_addr			argo.h
> +?	argo_ring			argo.h

Did I overlook the use of what these cause to be generated?

Jan




* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-11  9:27       ` Roger Pau Monné
@ 2019-01-14  8:32         ` Christopher Clark
  2019-01-14 11:32           ` Roger Pau Monné
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-14  8:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Fri, Jan 11, 2019 at 1:27 AM Roger Pau Monné <royger@freebsd.org> wrote:
> On Fri, Jan 11, 2019 at 7:04 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > On Thu, Jan 10, 2019 at 2:19 AM Roger Pau Monné <royger@freebsd.org> wrote:
> > >
> > >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > > <christopher.w.clark@gmail.com> wrote:
> > > > +/*
> > > > + * Locking is organized as follows:
> > > > + *
> > > > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > > > + *              W(<lock>) means taking a write lock on it.
> > > > + *
> > > > + * L1 : The global lock: argo_lock
> > > > + * Protects the argo elements of all struct domain *d in the system.
> > > > + * It does not protect any of the elements of d->argo, only their
> > > > + * addresses.
> > > > + *
> > > > + * By extension since the destruction of a domain with a non-NULL
> > > > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > > > + * guarantees that no domain pointers that argo is interested in
> > > > + * become invalid whilst this lock is held.
> > > > + */
> > > > +
> > > > +static DEFINE_RWLOCK(argo_lock); /* L1 */
>
> I'm wondering why a global argo_lock shared with all domains is used
> to protect d->argo, instead of using a per-domain lock (d->argo_lock
> for example). This global argo_lock shared between all domains is
> going to introduce contention with no specific benefit AFAICT.

The granular locking structure is motivated by:
1) Performance isolation / DoS avoidance
2) Teardown of state across multiple domains on domain destroy
3) Performance via concurrent operation of rings

Using the single global lock avoids the need for sequencing the
acquisition of multiple individual per-domain locks (and lower level
data structure locks) to prevent deadlock: taking W(L1) grants access
to all and taking R(L1) ensures that teardown of any domain will not
interfere with any Argo hypercall operation. It supports using the
granular locks across domains without complicated or fragile lock
acquisition logic.
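
As a rough illustration of the acquisition pattern this describes (a
sketch using the lock names from the series, not code from the patch):

    /* Data path: a hypercall operating on one ring of one domain. */
    read_lock(&argo_lock);            /* R(L1): d->argo cannot disappear */
    read_lock(&d->argo->lock);        /* R(L2): ring hash table is stable */
    spin_lock(&ring_info->lock);      /* L3: exclusive use of this ring */
    /* ... operate on the ring ... */
    spin_unlock(&ring_info->lock);
    read_unlock(&d->argo->lock);
    read_unlock(&argo_lock);

    /* Teardown path, eg. domain destroy. */
    write_lock(&argo_lock);           /* W(L1) implies W(L2) and L3 */
    /* ... remove rings and pending state across the domains involved ... */
    write_unlock(&argo_lock);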

I've written a document about the locking to add to the tree with the
series, and a copy is at github here:

https://github.com/dozylynx/xen/blob/0cb95385eba696ecf4856075a524c5e528e60455/docs/misc/argo-locking.md

> I would recommend an initial implementation that uses a single
> per-domain lock (ie: d->argo_lock) to protect the whole contents of
> d->argo, and then go adding more fine-grained locking as required,
> providing evidence that such finer-grained locking is actually
> improving performance (or required for some other reason). IMO, the
> current locking scheme is overly complicated, and it's very hard for
> me to reason about its correctness.

I've now implemented some macros to describe and document locking
requirements in the code, and simplify ASSERTing the correct status.
The majority of functions have a single annotation at entry indicating
and verifying their locking status. They are described in the doc.
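
To give an idea of the shape of those annotations, a hypothetical
sketch (the names here are invented for illustration and are not the
ones used in the branch):

    #define ASSERT_L1_READ_HELD() \
        ASSERT(rw_is_locked(&argo_lock))

    #define ASSERT_L1_WRITE_HELD() \
        ASSERT(rw_is_write_locked(&argo_lock))

    /* W(L1) implies W(L2), so either satisfies a rings-write requirement. */
    #define ASSERT_RINGS_WRITE_HELD(d) \
        ASSERT(rw_is_write_locked(&(d)->argo->lock) || \
               rw_is_write_locked(&argo_lock))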

>
> > > > +void
> > > > +argo_soft_reset(struct domain *d)
> > > > +{
> > > > +    write_lock(&argo_lock);
> > > > +
> > > > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > > > +
> > > > +    if ( d->argo )
> > > > +    {
> > > > +        domain_rings_remove_all(d);
> > > > +        partner_rings_remove(d);
> > > > +        wildcard_rings_pending_remove(d);
> > > > +
> > > > +        if ( !opt_argo_enabled )
> > > > +        {
> > > > +            xfree(d->argo);
> > > > +            d->argo = NULL;
> > >
> > > Can opt_argo_enabled change during runtime?
> >
> > Not at the moment, no. It should be made changeable
> > later, but keeping it fixed assists with derisking this for
> > release consideration.
>
> Then if d->argo is set opt_argo_enabled must be true, and thus this
> condition is never true?

Yes; ack, have removed this logic.

I'm holding off on posting v4 of the series but my latest tree, with the
new macros applied, is on github at:

branch:
https://github.com/dozylynx/xen/tree/staging-argo-2019-01-13
main file within that branch:
https://github.com/dozylynx/xen/blob/staging-argo-2019-01-13/xen/common/argo.c

Given the limited time remaining for this series to be considered for
merge into 4.12, if there are any further reservations, please let me
know.

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-11 11:54   ` Jan Beulich
@ 2019-01-14  8:33     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-14  8:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Fri, Jan 11, 2019 at 3:54 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -17,7 +17,177 @@
> >   */
> >
> >  #include <xen/errno.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/argo.h>
> > +#include <xen/event.h>
> > +#include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/time.h>
> > +#include <public/argo.h>
> > +
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +
> > +/* Xen command line option to enable argo */
> > +static bool __read_mostly opt_argo_enabled;
> > +boolean_param("argo", opt_argo_enabled);
> > +
> > +typedef struct argo_ring_id
> > +{
> > +    uint32_t port;
>
> evtchn_port_t?

No; so to avoid the potential for confusion, I've renamed that (and other
places where 'port' was used) to 'aport', for "argo port", and added a
definition for the type: xen_argo_port_t as uint32_t, so the distinction is
clearer.
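
A sketch of the resulting definitions (illustrative; the types of the
two domain id fields are assumed here, not taken from the patch):

    typedef uint32_t xen_argo_port_t;

    struct argo_ring_id
    {
        xen_argo_port_t aport;
        domid_t domain_id;
        domid_t partner_id;
    };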

>
> > +static void
> > +ring_remove_mfns(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(rw_is_write_locked(&d->argo->lock) ||
> > +           rw_is_write_locked(&argo_lock));
> > +
> > +    if ( !ring_info->mfns )
> > +        return;
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    ring_unmap(ring_info);
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +        if ( !mfn_eq(ring_info->mfns[i], INVALID_MFN) )
> > +            put_page_and_type(mfn_to_page(ring_info->mfns[i]));
> > +
> > +    xfree(ring_info->mfns);
> > +    ring_info->mfns = NULL;
> > +    ring_info->npage = 0;
> > +    xfree(ring_info->mfn_mapping);
> > +    ring_info->mfn_mapping = NULL;
> > +    ring_info->nmfns = 0;
>
> While it shouldn't matter with locking in use, I generally would
> consider it better if counts got set to zero before freeing the
> arrays.

ack, done.
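
For reference, the reordered tail of ring_remove_mfns() would look
roughly like this (mirroring the quoted code, with the counts cleared
before the arrays are freed):

    ring_info->nmfns = 0;
    ring_info->npage = 0;

    xfree(ring_info->mfns);
    ring_info->mfns = NULL;
    xfree(ring_info->mfn_mapping);
    ring_info->mfn_mapping = NULL;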

>
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
> > +        return rc;
> > +    }
> > +
> > +    domain_lock(currd);
>
> What is the rationale for using the domain lock here? We're trying to
> limit its use as much as possible, due to the otherwise heavy
> contention which can result, as it may be held for comparably long
> periods of time.

My inference is that it was intended to avoid interaction between the
hypercall ops and domain destroy, but it is not necessary. I've removed it.
Thanks.

>
> > +int
> > +argo_init(struct domain *d)
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
>
> I doubt the lock needs to be held for this function call.

ack; have reordered it to not do that.
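
ie. roughly (sketch only):

    argo = xmalloc(struct argo_domain);
    if ( !argo )
        return -ENOMEM;

    argo_domain_init(argo);       /* private to this path: no lock needed */

    write_lock(&argo_lock);
    /* ... publish the initialised structure as d->argo under W(L1) ... */
    write_unlock(&argo_lock);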

>
> > --- a/xen/include/xlat.lst
> > +++ b/xen/include/xlat.lst
> > @@ -148,3 +148,5 @@
> >  ?    flask_setenforce                xsm/flask_op.h
> >  !    flask_sid_context               xsm/flask_op.h
> >  ?    flask_transition                xsm/flask_op.h
> > +?    argo_addr                       argo.h
> > +?    argo_ring                       argo.h
>
> Did I overlook the use of what these cause to be generated?

The last commit in the v3 series added a file to make use of it,
but I've now melded that into the commit series, so coverage
will be introduced as hypercall argument types are added in the
next series.

thanks,

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14  8:32         ` Christopher Clark
@ 2019-01-14 11:32           ` Roger Pau Monné
  2019-01-14 14:28             ` Rich Persaud
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-14 11:32 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 9:32 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Fri, Jan 11, 2019 at 1:27 AM Roger Pau Monné <royger@freebsd.org> wrote:
> > On Fri, Jan 11, 2019 at 7:04 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> > >
> > > On Thu, Jan 10, 2019 at 2:19 AM Roger Pau Monné <royger@freebsd.org> wrote:
> > > >
> > > >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > > > <christopher.w.clark@gmail.com> wrote:
> > > > > +/*
> > > > > + * Locking is organized as follows:
> > > > > + *
> > > > > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > > > > + *              W(<lock>) means taking a write lock on it.
> > > > > + *
> > > > > + * L1 : The global lock: argo_lock
> > > > > + * Protects the argo elements of all struct domain *d in the system.
> > > > > + * It does not protect any of the elements of d->argo, only their
> > > > > + * addresses.
> > > > > + *
> > > > > + * By extension since the destruction of a domain with a non-NULL
> > > > > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > > > > + * guarantees that no domain pointers that argo is interested in
> > > > > + * become invalid whilst this lock is held.
> > > > > + */
> > > > > +
> > > > > +static DEFINE_RWLOCK(argo_lock); /* L1 */
> >
> > I'm wondering why a global argo_lock shared with all domains is used
> > to protect d->argo, instead of using a per-domain lock (d->argo_lock
> > for example). This global argo_lock shared between all domains is
> > going to introduce contention with no specific benefit AFAICT.
>
> The granular locking structure is motivated by:
> 1) Performance isolation / DoS avoidance
> 2) Teardown of state across multiple domains on domain destroy
> 3) Performance via concurrent operation of rings
>
> Using the single global lock avoids the need for sequencing the
> acquisition of multiple individual per-domain locks (and lower level
> data structure locks) to prevent deadlock: taking W(L1) grants access
> to all and taking R(L1) ensures that teardown of any domain will not
> interfere with any Argo hypercall operation. It supports using the
> granular locks across domains without complicated or fragile lock
> acquisition logic.

I'm not sure such a global lock is needed. Isn't it going to introduce a
non-trivial amount of contention?

Please bear with me: if, instead of the global lock, a d->argo_lock is
used, is there any use-case where you would need to lock multiple
d->argo_locks in sequence? I'm not sure I see which use-case would
require this, since I expect you take the d->argo_lock, perform the
needed operations on that domain's argo data/rings and move to the next
one.

> I've written a document about the locking to add to the tree with the
> series, and a copy is at github here:
>
> https://github.com/dozylynx/xen/blob/0cb95385eba696ecf4856075a524c5e528e60455/docs/misc/argo-locking.md

Thanks. It would have been better to send the contents of the document
to the list, so inline comments can be added. It's hard to comment on
the document now since it's only on github AFAICT.

As a general comment, I think the "Hierarchical Locking Model and
Protocol" is too complex, and the fact that there are multiple
interchangeable lock sequences to write to the rings for example is
not a good locking scheme because it makes it hard to reason about.

For example you can write to the ring by simply write-locking L1, or
by read-locking L1 and L2, and then locking the ring lock (L3). IMO I
would state that writing to the ring _always_ requires L3 to be
locked, regardless of the other locks. This makes it easier to reason
about locking correctness, and which locks protect what data in the
argos structure, if indeed so many different locks are required.

There are also several claims that finer-grained locking provides
better performance, in order to justify the need for such locks. IMO
without providing any evidence of such performance benefit it's hard
to be convinced so many locks are needed.

> > I would recommend an initial implementation that uses a single
> > per-domain lock (ie: d->argo_lock) to protect the whole contents of
> > d->argo, and then go adding more fine-grained locking as required,
> > providing evidence that such finer-grained locking is actually
> > improving performance (or required for some other reason). IMO, the
> > current locking scheme is overly complicated, and it's very hard for
> > me to reason about its correctness.
>
> I've now implemented some macros to describe and document locking
> requirements in the code, and simplify ASSERTing the correct status.
> The majority of functions have a single annotation at entry indicating
> and verifying their locking status. They are described in the doc.
>
> >
> > > > > +void
> > > > > +argo_soft_reset(struct domain *d)
> > > > > +{
> > > > > +    write_lock(&argo_lock);
> > > > > +
> > > > > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > > > > +
> > > > > +    if ( d->argo )
> > > > > +    {
> > > > > +        domain_rings_remove_all(d);
> > > > > +        partner_rings_remove(d);
> > > > > +        wildcard_rings_pending_remove(d);
> > > > > +
> > > > > +        if ( !opt_argo_enabled )
> > > > > +        {
> > > > > +            xfree(d->argo);
> > > > > +            d->argo = NULL;
> > > >
> > > > Can opt_argo_enabled change during runtime?
> > >
> > > Not at the moment, no. It should be made changeable
> > > later, but keeping it fixed assists with derisking this for
> > > release consideration.
> >
> > Then if d->argo is set opt_argo_enabled must be true, and thus this
> > condition is never true?
>
> Yes; ack, have removed this logic.
>
> I'm holding off on posting v4 of the series but my latest tree, with the
> new macros applied, is on github at:
>
> branch:
> https://github.com/dozylynx/xen/tree/staging-argo-2019-01-13
> main file within that branch:
> https://github.com/dozylynx/xen/blob/staging-argo-2019-01-13/xen/common/argo.c

IMO, I'm finding the patch series slightly difficult to review. For
example, patch 4 introduces a full-blown argo_domain structure with all
the sub-structures and fields, and I'm not sure all of them are actually
used in that patch. It would be easier to review if a simpler set of
operations is first introduced, like starting by only allowing single
domain rings, and then introducing multicast rings.

Thanks, Roger.


* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-07  7:42 ` [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery Christopher Clark
@ 2019-01-14 12:57   ` Jan Beulich
  2019-01-17  7:22     ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-14 12:57 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> Argo doesn't use compat hypercall or argument translation but can use some
> of the infrastructure for validating the hypercall argument structures to
> ensure that the struct sizes, offsets and compositions don't vary between 32
> and 64bit, so add that here in a new dedicated source file for this purpose.
> 
> Some of the argo hypercall argument structures contain elements that are
> hypercall argument structure types themselves, and the standard compat
> structure validation does not handle this, since the types differ in compat
> vs. non-compat versions; so for some of the tests the exact-type-match check
> is replaced with a weaker, but still sufficient, sizeof check.

"Still sufficient" on what basis? Note that to date we didn't have to
make exceptions like this (iirc), so I'm not happy to see some appear.

> Then there are additional hypercall argument structures that contain
> elements that do not have a fixed size (last element, variable length array
> fields), so we have to then disable that size check too for validating those
> structures; the coverage of offset of elements is still retained.

There are prior cases of such as well; I'm not sure though if any
were actually in need of checking through these macros. Still I'd
like to better understand what it is that doesn't work in that case.
Quite possibly there's something that can be fixed in the scripts
(or elsewhere).

> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -70,7 +70,7 @@ obj-y += xmalloc_tlsf.o
>  obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
>  
>  
> -obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
> +obj-$(CONFIG_COMPAT) += $(addprefix compat/,argo.o domain.o kernel.o memory.o multicall.o xlat.o)

While a matter of taste to a certain degree, I'm not convinced
introducing a separate file for this is really necessary, especially
if some of the overrides to the CHECK_* macros would go away.

Jan




* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
                     ` (2 preceding siblings ...)
  2019-01-10 20:11   ` Eric Chanudet
@ 2019-01-14 14:19   ` Jan Beulich
  2019-01-15  7:56     ` Christopher Clark
  2019-01-14 15:31   ` Andrew Cooper
  4 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-14 14:19 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -23,16 +23,41 @@
>  #include <xen/event.h>
>  #include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/lib.h>
> +#include <xen/nospec.h>
>  #include <xen/time.h>
>  #include <public/argo.h>
>  
> +#define MAX_RINGS_PER_DOMAIN            128U
> +
> +/* All messages on the ring are padded to a multiple of the slot size. */
> +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))

Pointless outermost pair of parentheses.

> @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>  
> +/*
> + * This hash function is used to distribute rings within the per-domain
> + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> + * will provide a struct if a match is found with a 'argo_ring_id' key:
> + * ie. the key is a (domain id, port, partner domain id) tuple.
> + * Since port number varies the most in expected use, and the Linux driver
> + * allocates at both the high and low ends, incorporate high and low bits to
> + * help with distribution.
> + * Apply array_index_nospec as a defensive measure since this operates
> + * on user-supplied input and the array size that it indexes into is known.
> + */
> +static unsigned int
> +hash_index(const struct argo_ring_id *id)
> +{
> +    unsigned int hash;
> +
> +    hash = (uint16_t)(id->port >> 16);
> +    hash ^= (uint16_t)id->port;

I may have asked this before, but are the casts really needed
with ...

> +    hash ^= id->domain_id;
> +    hash ^= id->partner_id;
> +    hash &= (ARGO_HTABLE_SIZE - 1);

... the masking done here?

> +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);

With the masking above - is this really needed?

And then the question is whether the quality of the hash is
sufficient: There won't be more set bits in the result than
are in any of the three input values, so if they're all small,
higher hash table entries won't be used at all. I would
assume the goal to be that by the time 32 entities appear,
chances be good that at least about 30 of the hash table
entries are in use.
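
As an aside on the casts: assuming ARGO_HTABLE_SIZE is a power of two
no larger than 1 << 16, the masked result is identical without them;
a standalone sketch, with the field types assumed for illustration:

    #include <stdint.h>

    #define ARGO_HTABLE_SIZE 32           /* assumed power-of-two size */

    struct argo_ring_id {
        uint32_t port;
        uint16_t domain_id;
        uint16_t partner_id;
    };

    unsigned int hash_index(const struct argo_ring_id *id)
    {
        /*
         * The final mask keeps only the low bits, so truncating the
         * intermediate values to 16 bits cannot change the result, and
         * the masked value is already within the table bounds.
         */
        unsigned int hash = (id->port >> 16) ^ id->port ^
                            id->domain_id ^ id->partner_id;

        return hash & (ARGO_HTABLE_SIZE - 1);
    }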

> @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
>  
> +static int
> +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        gprintk(XENLOG_ERR,
> +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",

ring_info->id.{domain,partner}_id look to be of the same type -
why once %u and once %d? Same elsewhere.

> +                ring_info->id.domain_id, ring_info->id.port,
> +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +        return -ENOMEM;
> +    }

    i = array_index_nospec(i, ring_info->nmfns);

considering the array indexes here? Of course at this point only
zero can be passed in, but I assume this changes in later patches
and the index is at least indirectly guest controlled.

> @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
>      }
>  }
>  
> +static int
> +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)

So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
Any chance you could use a consistent ordering of "ring" and "find"?
Or is there a reason behind the apparent mismatch?

> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> +#else
> +    *mfn = p2m_lookup(d, gfn, &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, gfn_x(gfn));
> +#endif
> +
> +    return ret;
> +}

Please check whether you could leverage check_get_page_from_gfn()
here. If you can't, please at least take inspiration as to e.g. the
#ifdef-s from that function.

> +static int
> +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +               uint32_t npage,
> +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +               uint32_t len)

Noticing it here, but perhaps still an issue elsewhere as well: Didn't
we agree on removing unnecessary use of fixed width types? Or
was that in the context of an earlier patch of v3?

> +{
> +    unsigned int i;
> +    int ret = 0;
> +    mfn_t *mfns;
> +    uint8_t **mfn_mapping;
> +
> +    /*
> +     * first bounds check on npage here also serves as an overflow check
> +     * before left shifting it
> +     */
> +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||

Isn't this redundant with the check in do_argo_op()?

> +         ((npage << PAGE_SHIFT) < len) )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /* Ring already existed: drop the previous mapping. */
> +        gprintk(XENLOG_INFO,
> +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",

Indentation (also elsewhere).

> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id);
> +
> +        ring_remove_mfns(d, ring_info);
> +        ASSERT(!ring_info->mfns);
> +    }
> +
> +    mfns = xmalloc_array(mfn_t, npage);
> +    if ( !mfns )
> +        return -ENOMEM;
> +
> +    for ( i = 0; i < npage; i++ )
> +        mfns[i] = INVALID_MFN;
> +
> +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> +    if ( !mfn_mapping )
> +    {
> +        xfree(mfns);
> +        return -ENOMEM;
> +    }
> +
> +    ring_info->npage = npage;
> +    ring_info->mfns = mfns;
> +    ring_info->mfn_mapping = mfn_mapping;

As the inverse to the cleanup sequence in an earlier patch: Please
set ->npage last here even if it doesn't strictly matter.

> +    ASSERT(ring_info->npage == npage);

What is this trying to make sure, seeing the assignment just a
few lines up?

> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;

Can this happen with the ring_remove_mfns() call above?

> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )

And hence can i start from other than zero here? And why not
use the (possibly cheaper to access) function argument "npage"
as the loop upper bound? The other similar loop a few lines up
is coded that simpler way.

> +static long
> +register_ring(struct domain *currd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +              uint32_t npage, bool fail_exist)
> +{
> +    xen_argo_register_ring_t reg;
> +    struct argo_ring_id ring_id;
> +    void *map_ringp;
> +    xen_argo_ring_t *ringp;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info = NULL;
> +    struct domain *dst_d = NULL;
> +    int ret = 0;
> +    uint32_t private_tx_ptr;
> +
> +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> +    {
> +        ret = -EFAULT;
> +        goto out;
> +    }
> +
> +    /*
> +     * A ring must be large enough to transmit messages, so requires space for:
> +     * * 1 message header, plus
> +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> +     *   for the message payload to be written into, plus
> +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> +     *   ensures that there can be space remaining when a message is present.
> +     * The above determines the minimum acceptable ring size.
> +     */
> +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||

These two summands don't look to fulfill the "cannot be filled to
capacity" constraint the comment describes, as (aiui) messages
can be larger than 16 bytes. What's the deal?

> +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> +         (reg.pad != 0) )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ring_id.partner_id = reg.partner_id;
> +    ring_id.port = reg.port;
> +    ring_id.domain_id = currd->domain_id;
> +
> +    read_lock(&argo_lock);

From here to ...

> +    if ( !currd->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> +    {
> +        if ( opt_argo_mac_enforcing )
> +        {
> +            ret = -EPERM;
> +            goto out_unlock;
> +        }
> +    }
> +    else
> +    {
> +        dst_d = get_domain_by_id(reg.partner_id);
> +        if ( !dst_d )
> +        {
> +            argo_dprintk("!dst_d, ESRCH\n");
> +            ret = -ESRCH;
> +            goto out_unlock;
> +        }
> +
> +        if ( !dst_d->argo )
> +        {
> +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +            ret = -ECONNREFUSED;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +
> +        send_info = xzalloc(struct argo_send_info);
> +        if ( !send_info )
> +        {
> +            ret = -ENOMEM;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +        send_info->id = ring_id;
> +    }

... here, what exactly is it that requires the global read lock
to be held ...

> +    write_lock(&currd->argo->lock);

... prior to this? Holding locks around allocations is not
forbidden, but should be avoided whenever possible.

And then, further, why does the global read lock need to be
held until the end of the function?

> +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> +    {
> +        ret = -ENOSPC;
> +        goto out_unlock2;
> +    }
> +
> +    ring_info = ring_find_info(currd, &ring_id);
> +    if ( !ring_info )
> +    {
> +        ring_info = xzalloc(struct argo_ring_info);
> +        if ( !ring_info )
> +        {
> +            ret = -ENOMEM;
> +            goto out_unlock2;
> +        }
> +
> +        spin_lock_init(&ring_info->lock);
> +
> +        ring_info->id = ring_id;
> +        INIT_HLIST_HEAD(&ring_info->pending);
> +
> +        hlist_add_head(&ring_info->node,
> +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> +
> +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +    }
> +    else
> +    {
> +        if ( ring_info->len )
> +        {

Please fold into "else if ( )", removing a level of indentation.

> +            /*
> +             * If the caller specified that the ring must not already exist,
> +             * fail at attempt to add a completed ring which already exists.
> +             */
> +            if ( fail_exist )
> +            {
> +                argo_dprintk("disallowed reregistration of existing ring\n");
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            if ( ring_info->len != reg.len )
> +            {
> +                /*
> +                 * Change of ring size could result in entries on the pending
> +                 * notifications list that will never trigger.
> +                 * Simple blunt solution: disallow ring resize for now.
> +                 * TODO: investigate enabling ring resize.
> +                 */
> +                gprintk(XENLOG_ERR,
> +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> +                        ring_id.partner_id);
> +                /*
> +                 * Could return EINVAL here, but if the ring didn't already
> +                 * exist then the arguments would have been valid, so: EEXIST.
> +                 */
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            gprintk(XENLOG_DEBUG,
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> +                    ring_id.partner_id);
> +        }
> +    }
> +
> +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +
> +    /*
> +     * The first page of the memory supplied for the ring has the xen_argo_ring
> +     * structure at its head, which is where the ring indexes reside.
> +     */
> +    ret = ring_map_page(ring_info, 0, &map_ringp);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +    ringp = map_ringp;
> +
> +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> +
> +    if ( (private_tx_ptr >= reg.len) ||
> +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> +    {
> +        /*
> +         * Since the ring is a mess, attempt to flush the contents of it
> +         * here by setting the tx_ptr to the next aligned message slot past
> +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +         */
> +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> +
> +        if ( private_tx_ptr >= reg.len )
> +            private_tx_ptr = 0;
> +
> +        update_tx_ptr(ring_info, private_tx_ptr);
> +    }
> +
> +    ring_info->tx_ptr = private_tx_ptr;
> +    ring_info->len = reg.len;
> +    currd->argo->ring_count++;
> +
> +    if ( send_info )
> +    {
> +        spin_lock(&dst_d->argo->send_lock);
> +
> +        hlist_add_head(&send_info->node,
> +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> +
> +        spin_unlock(&dst_d->argo->send_lock);
> +    }
> +
> + out_unlock2:
> +    if ( !ret && send_info )
> +        xfree(send_info);
> +
> +    if ( dst_d )
> +        put_domain(dst_d);
> +
> +    write_unlock(&currd->argo->lock);

Surely you can drop the lock before the other two cleanup
actions? That would then allow you to add another label to
absorb the two separate put_domain() calls on error paths.

> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>  
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))

This is unused throughout the patch afaics.

> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -31,6 +31,26 @@
>  
>  #include "xen.h"
>  
> +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16GB
> + *  -- which is 0x1000000 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> +
> +/*
> + * Page descriptor: encoding both page address and size in a 64-bit value.
> + * Intended to allow ABI to support use of different granularity pages.
> + * example of how to populate:
> + * xen_argo_page_descr_t pg_desc =
> + *      (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> + */
> +typedef uint64_t xen_argo_page_descr_t;
> +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
> +#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0

Are the _DESCR_ infixes here really useful?

> @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
>  #endif
>  } xen_argo_ring_t;
>  
> +typedef struct xen_argo_register_ring
> +{
> +    uint32_t port;
> +    domid_t partner_id;
> +    uint16_t pad;
> +    uint32_t len;
> +} xen_argo_register_ring_t;
> +
> +/* Messages on the ring are padded to a multiple of this size. */
> +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> +
> +struct xen_argo_ring_message_header
> +{
> +    uint32_t len;
> +    xen_argo_addr_t source;
> +    uint32_t message_type;
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t data[];
> +#elif defined(__GNUC__)
> +    uint8_t data[0];
> +#endif
> +};
> +
> +/*
> + * Hypercall operations
> + */
> +
> +/*
> + * XEN_ARGO_OP_register_ring
> + *
> + * Register a ring using the indicated memory.
> + * Also used to reregister an existing ring (eg. after resume from hibernate).
> + *
> + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> + * arg3: unsigned long npages
> + * arg4: unsigned long flags

The "unsigned long"-s here are not necessarily compatible with
compat mode. At the very least flags above bit 31 won't be
usable by compat mode guests. Hence I also question ...

> + */
> +#define XEN_ARGO_OP_register_ring     1
> +
> +/* Register op flags */
> +/*
> + * Fail exist:
> + * If set, reject attempts to (re)register an existing established ring.
> + * If clear, reregistration occurs if the ring exists, with the new ring
> + * taking the place of the old, preserving tx_ptr if it remains valid.
> + */
> +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
> +
> +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
> +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL

... the UL suffix here. Also this last item should not be exposed
(perhaps framed by "#ifdef __XEN__") and would perhaps anyway
better be defined in terms of the other
XEN_ARGO_REGISTER_FLAG_*.
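
Something along these lines, perhaps (sketch only):

    /* Xen-internal: mask of all currently defined register flags. */
    #ifdef __XEN__
    #define XEN_ARGO_REGISTER_FLAG_MASK  XEN_ARGO_REGISTER_FLAG_FAIL_EXIST
    #endif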

Jan



* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 11:32           ` Roger Pau Monné
@ 2019-01-14 14:28             ` Rich Persaud
  0 siblings, 0 replies; 104+ messages in thread
From: Rich Persaud @ 2019-01-14 14:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Christopher Clark, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne




> On Jan 14, 2019, at 06:32, Roger Pau Monné <royger@freebsd.org> wrote:
> 
> On Mon, Jan 14, 2019 at 9:32 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
>> 
>> I've written a document about the locking to add to the tree with the
>> series, and a copy is at github here:
>> 
>> https://github.com/dozylynx/xen/blob/0cb95385eba696ecf4856075a524c5e528e60455/docs/misc/argo-locking.md
> 
> Thanks. It would have been better to send the contents of the document
> to the list, so inline comments can be added. It's hard to comment on
> the document now since it's only on github AFAICT.

Here's an inline copy of the doc:

=== begin document ===

# Argo: Locking

## Introduction

Argo is an interdomain communication mechanism. It has requirements for performance isolation between domains, to prevent negative performance impact from malicious or disruptive activity of other domains, or even other vcpus of the same domain operating other rings.

Since Argo operates a data path between domains, sections of this code are *hot* when the communication paths are in use. To encourage high performance, a goal is to limit mutual exclusion to only where required and enable significant concurrency.

Avoidance of deadlock is essential and since state must frequently be updated that pertains to more than one domain, a locking protocol defines which locks are needed and the order of their acquistion.

## Structure

The granular locking structure of Argo enables:

1. Performance isolation of guests
2. Avoidance of DoS of rings by domains that are not authorized to send to them
3. Deadlock-free teardown of state across multiple domains on domain destroy
4. Performance of guests using Argo with concurrent operation of rings.

Argo uses three per-domain locks to protect three separate data structures.  Access to the ring_hash data structure is confined to domains that a ring-registering domain has authorized to send data via the ring.  The complete set of Argo locks is:

* Global : `L1_global_argo_rwlock`
* Per-domain: `rings_L2_rwlock`
* Per-domain: `send_L2_lock`
* Per-domain: `wildcard_L2_lock`
* Per-ring: `L3_lock`

## Protected State

The data structures being protected by the locks are all per-domain. The only global Argo state is the `L1_global_argo_rwlock` used to coordinate access to data structures of other domains.

### State: Rings registered and owned by a domain

This includes the state to run that ring, such as memory frame numbers and established mappings. Per-ring state is protected by its own lock, so that multiple VCPUs of the same domain operating different rings do not inhibit the performance of each other.

The per-domain ring state also includes the list of pending notifications for other domains that are waiting for ring space availability.

### State: Partner rings registered by other domains that this domain is the single allowed sender

This state belonging to the permitted sender is written to when a ring is registered by another domain. The lock that protects this state is subject to locking at arbitrary frequency by those foreign domains when registering rings -- which do not need any permission granted by this domain in order to register a ring to communicate with it --  so it must not inhibit the domain's own ability to use its own rings, to protect them from DoS. For this reason, this state is protected by its own lock.

### State: Pending notifications to this domain about wildcard rings registered by other domains

This data structure is needed when a domain is destroyed, to cancel the outstanding space availability notifications about the wildcard rings of other domains that this domain has queried.

Data is entered into this data structure by the domain that owns it, either by a space-inhibited sendv or a notify operation.

Data is removed from this data structure in one of three cases: when space becomes available in the destination ring and the notification is sent, when the ring is torn down, or when the awaiting domain is destroyed.

In the case where a notification is sent, access to the data structure is triggered by the ring owner domain, rather than the domain waiting for notification. This data structure is protected by its own lock since doing so entails less contention than the alternative of reusing an existing lock owned by the domain.

## Hierarchical Locking Model and Protocol

The locking discipline within the Argo code is heirarchical and utilizes reader/writer locks to enable increased concurrency when operations do not conflict. None of the Argo locks are reentrant.

The hierarchy:

* There is a global rwlock (`L1`) to protect access to all of the per-domain argo data structures. 
* There is a rwlock per-domain (`rings_L2`) to protect the hashtable of the per-ring data structures. 
* There is a lock per ring (`L3`) to protect the per-ring data structure, `struct argo_ring_info`. 

There are a two other per-domain L2 locks; their operation is similar and they are described later.

The protocol to safely acquire write access to the per-ring data structure, `struct argo_ring_info`, is:

1) Acquire a Read lock on L1.
2) Acquire a Read lock on L2.
3) Acquire L3.

An alternative valid sequence is:

1) Acquire a Read lock on L1.
2) Acquire a Write lock on L2.

This second sequence grants write access to _all_ of the `argo_ring_info` structs belonging to the domain, but at the expense of less concurrency: no other operation can access those structs while the locks are held, which will inhibit operations on those rings until the locks are released.

Another alternative valid sequence is:

1) Acquire a Write lock on L1.

This grants write access to _all_ of the `argo_ring_info` structs belonging to _all domains_, but again at the expense of far less concurrency: no other operation can operate on Argo rings until the locks are released.

## Lock Definitions

The full set of locks that are directly operated upon by the Argo code are described in the following section.

### The global singleton lock:

* `L1_global_argo_rwlock`

The rationale for having a global lock is to be able to enforce system-wide exclusion for a critical region and simplify the logic required to avoid deadlock, for teardown of state across multiple domains when a domain is destroyed.

The majority of operations take a read-lock on this lock, allowing concurrent Argo operations by many domains.

The pointer d->argo on every domain is protected by this lock. A set of more granular per-domain locks could be used to do that, but since domain start and stop is expected to be a far less frequent operation than the other argo operations, acquiring a single read lock to enable access to all the argo structs of all domains simplifies the protocol.

Points of write-locking on this lock:

* `argo_destroy`, where:
  * All of the domain's own rings are destroyed.
      * All of the notifications pending for other domains are cancelled.
   * All of the unicast partner rings owned by other domains for this domain to send to, are destroyed.
      * All of the notifications pending on those rings are cancelled.
   * All of the notifications pending for this domain on wildcard rings owned by other domains are cancelled.
* `argo_soft_reset`, for similar teardown operations as argo_destroy.
* `argo_init`, where the `d->argo` pointer is first populated.
  * Since the write lock is taken here, there is serialization all concurrent Argo operations around this single pointer write; this is the cost of using the simpler one global lock approach.

Enforcing that the write_lock is acquired on `L1_global_argo_rwlock` before executing teardown, ensures that no teardown operations act concurrently and no other Argo operations happen concurrently with a teardown. The teardown logic is free to safely modify the Argo state across all domains without having to acquire per-domain locks and deadlock cannot occur.

### Per-Domain: Ring hash lock

`rings_L2_rwlock`

Protects: the per-domain ring hash table of `argo_ring_info` structs.

Holding a read lock on `rings_L2` protects the ring hash table and the elements in the hash table `d->argo->ring_hash`, and the `node` and `id` fields in struct `argo_ring_info` in the hash table.

Holding a write lock on `rings_L2` protects all of the elements of all the struct `argo_ring_info` belonging to this domain.

To take `rings_L2` you must already have `R(L1)`. `W(L1)` implies `W(rings_L2)` and `L3`.

Prerequisites:

* `R(L1_global_argo_rwlock)` must be acquired before taking either read or write on `rings_L2_rwlock`.
* `W(L1_global_argo_rwlock)` implies `W(rings_L2_rwlock)`, so if `W(L1_global_argo_rwlock)` is held, then `rings_L2_rwlock` does not need to be acquired, and all the data structures that `rings_L2_rwlock` protects can be accessed as if `W(ring_L2_rwlock)` was held.

Is accessed by the hypervisor on behalf of:

* The domain that registered the ring.
* Any domain that is allowed to send to the ring -- so that's the partner domain, for unicast rings, or any domain, for wildcard rings.

### Send hash lock

`send_L2_lock`

Protects: the per-domain send hash table of `argo_send_info` structs.

Is accessed by the hypervisor on behalf of:

* Any domain that registers a ring that specifies the domain as the unicast sender.
* The domain that has been allowed to send, as part of teardown when the domain is being destroyed.


### Wildcard pending list lock

`wildcard_L2_lock`

Protects: the per-domain list of pending notifications to the domain from wildcard rings owned by other domains.

Is accessed by the hypervisor on behalf of:

* The domain that issued a query to another about space availability in one of its wildcard rings - this can be done by attempting a send operation when there is insufficient ring space available at the time.
* Any domain that the domain has issued a query to about space availability in one of their wildcard rings.

### Per-Ring locks:

* `L3_lock`

This lock protects the members of a `struct ring_info` which is the primary state for a domain's own registered ring.


## Reasoning Model

A common model for reasoning about concurrent code focusses on accesses to individual variables: if code touches this variable, see that it first acquires the corresponding lock and then drops it afterwards. A challenge with this model is in ensuring that the sequence of locks acquired within nested functions, when operating on data from multiple domains with concurrent operations, is safe from deadlock.

An alternative method that is better suited to the Argo software is to consider the execution path, the full sequence of locks acquired, accesses performed, and locks released, from entering an operation, to the completion of the work.

An example code path for an operation:

`[entry] > -- [ take R(L1) ] -- [ take R(L2) ] -- loop [ take a L3 / drop L3 ] --  [ drop R(L2) ] -- [ drop R(L1)] -- > [exit]`

If a function implements a section of the path, it is important to know not only what variables the function itself operates upon, but also the locking state that will already have been established at the point when the function is invoked, since this will affect what data the function can access. For this reason, comments in the code, or ASSERTs that explicitly check lock state, communicate what the locking state is expected and intended to be when that code is invoked. See the macros defined to support this for Argo later in this document.


## Macros to Validate and Document Lock State

These macros encode the logic to verify that the locking has adhered to the locking discipline.

eg. On entry to logic that requires holding at least `R(rings_L2)`, this:

`ASSERT(LOCKING_Read_rings_L2(d));`

checks that the lock state is sufficient, validating that one of the following must be true when executed:

`R(rings_L2) && R(L1)`
or:  `W(rings_L2) && R(L1)`
or:  `W(L1)`

The macros are defined thus:

```
/* RAW macros here are only used to assist defining the other macros below */
#define RAW_LOCKING_Read_L1 (rw_is_locked(&L1_global_argo_rwlock))
#define RAW_LOCKING_Read_rings_L2(d) \
  (rw_is_locked(&d->argo->rings_L2_rwlock) && RAW_LOCKING_Read_L1)

/* The LOCKING macros defined below here are for use at verification points */
#define LOCKING_Write_L1 (rw_is_write_locked(&L1_global_argo_rwlock))
#define LOCKING_Read_L1 (RAW_LOCKING_Read_L1 || LOCKING_Write_L1)

#define LOCKING_Write_rings_L2(d) \
  ((RAW_LOCKING_Read_L1 && rw_is_write_locked(&d->argo->rings_L2_rwlock)) || \
   LOCKING_Write_L1)

#define LOCKING_Read_rings_L2(d) \
  ((RAW_LOCKING_Read_L1 && rw_is_locked(&d->argo->rings_L2_rwlock)) || \
   LOCKING_Write_rings_L2(d) || LOCKING_Write_L1)

#define LOCKING_L3(d, r) \
  ((RAW_LOCKING_Read_rings_L2(d) && spin_is_locked(&r->L3_lock)) || \
   LOCKING_Write_rings_L2(d) || LOCKING_Write_L1)

#define LOCKING_send_L2(d) \
  ((RAW_LOCKING_Read_L1 && spin_is_locked(&d->argo->send_L2_lock)) || \
   LOCKING_Write_L1)
```

Here is an example of a macro in use:

```
static void
notify_ring(const struct domain *d, struct argo_ring_info *ring_info,
          struct hlist_head *to_notify)
{
  uint32_t space;

  ASSERT(LOCKING_Read_rings_L2(d));

  spin_lock(&ring_info->L3_lock);

  if ( ring_info->len )
      space = ringbuf_payload_space(d, ring_info);
  else
      space = 0;

  spin_unlock(&ring_info->L3_lock);

  if ( space )
      pending_find(d, ring_info, space, to_notify);
}

```

In the above example, it can be seen that it is safe to acquire the `L3` lock because _at least_ `R(rings_L2)` is already held, as documented and verified by the macro.

## Appendix:  FAQ / Other Considerations 

### Why not have a single per-domain lock?

Due to performance isolation / DoS avoidance: if there is a single per-domain lock, acquiring this lock will stall operations on other active rings owned by the domain. A malicious domain can loop registering and unregistering rings, without any consent by the targetted domain, which would experience decreased throughput due to the contention on the single per-domain lock. The granular locking structure of Argo prevents this. It also allows concurrent operation of different rings by multiple VCPUs of the same domain without contention, to avoid negative application performance interaction.

## Rationale for Using a Singleton Global Lock: L1

### Teardown on domain destroy

The single global lock enables exclusive access to the argo data structures across domains when a domain is destroyed. Every unicast ring that the dying domain is the authorized sender is torn down and any pending space-available notifications in other domain's wildcard rings are cancelled. This requires gaining safe access to the data structures on each of the domains involved.

The 'send hashtable' data structure is needed in order to perform the teardown of rings when a domain is destroyed. To populate it, whenever a unicast ring is registered, the lock that protects that data structure must be taken exclusively.

There are granular per-domain locks which protect the per-domain data structures. The global singleton L1 lock operates with-and-above the per-domain locks and is used to obtain exclusive access to multiple domain's argo data structures in the infrequent case where it is used -- for domain destroy -- whilst otherwise allowing concurrent access, via acquiring it with 'read' access, for the majority of the time.

To perform the required state teardown on domain destruction, which can require removing state from the data structures of multiple domains, a locking protocol to obtain mutual exclusion and safe access to the state is required, without deadlocking.

Using the single global lock avoids the need for sequencing the acquisition of multiple individual per-domain locks (and lower level data structure locks) to prevent deadlock: taking W(L1) grants access to all and taking R(L1) ensures that teardown of any domain will not interfere with any Argo hypercall operation. It enables introducing granular locking without complex or error-prone lock acquisition logic.

=== end document ===

> There are also several claims that fine-grainer locking provides
> better performance in order to justify the need of such locks. IMO
> without providing any evidence of such performance benefit it's hard
> to be convinced so many locks are needed.

Benchmarks would be useful for regression testing.  We can investigate resourcing for Argo synthetic benchmarks in the Xen 4.13 release cycle.  In the meantime, we can cite the shipment of Citrix XenClient in 2011, followed by customer production deployments of v4v in OpenXT and Bromium uXen (including HP business laptops).  Argo is derived from v4v.

Ian Pratt's PSEC 2018 presentation [1] on hypervisor security indirectly referenced v4v and uXen isolation/performance requirements, it may illustrate the scope of possible Argo use cases.  Click the "uXen" button below the video to navigate to the clip.  There's also a uXen source code link.  An excerpt (my annotations in []):

"PV device interfaces all built on simple hypervisor copy-based primitive 

We came up with a very simple primitive [v4v] for communication between VMs, between VMs and the host, and then just built everything on that primitive.  It's a simple copy-based primitive.  We didn't want any memory sharing of any kind.  Anything involving memory sharing ends up being complex ... Grant tables being a huge example ... a copy-based approach is just much simpler, and actually, performance-wise, it turns out being equivalent from a performance point of view ... 

Getting rid of other terrible ideas, like Xenstore ... we still want the primitive [v4v] interface to be able to support things like device reconnection.  That was a good idea, this idea that you can restart these driver domains, have VMs reconnect and be able to continue, that's very useful.  We want to enable this simple primitive [v4v] to do that.  Then just build everything using very simple, narrow interfaces."

[1]  https://www.platformsecuritysummit.com/2018/speaker/pratt/


Regards,
Rich

[-- Attachment #1.2: Type: text/html, Size: 25619 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
                     ` (4 preceding siblings ...)
  2019-01-11 11:54   ` Jan Beulich
@ 2019-01-14 14:46   ` Wei Liu
  2019-01-14 15:29     ` Lars Kurth
                       ` (3 more replies)
  2019-01-14 14:58   ` Andrew Cooper
  6 siblings, 4 replies; 104+ messages in thread
From: Wei Liu @ 2019-01-14 14:46 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

Hi all

The locking scheme seems to be remaining sticking point. The rest are
mostly cosmetic issues (FAOD, they still need to be addressed).  Frankly
I don't think there is enough time to address all the technical details,
but let me sum up each side's position and see if we can reach an
amicable solution.

From maintainers and reviewers' point of view:

1. Maintainers / reviewers don't like complexity unless absolutely
   necessary.
2. Maintainers / reviewers feel they have a responsibility to understand
   the code and algorithm.

Yet being the gatekeepers doesn't necessarily mean we understand every
technical details and every usecase. We would like to, but most of the
time it is unrealistic.

Down to this specific patch series:

Roger thinks the locking scheme is too complex. Christopher argues
that's necessary for short-live channels to be performant.

Both have their point.

I think having a complex locking scheme is inevitable, just like we did
for performant grant table several years ago.  Regardless of the timing
issue we have at hand, asking Christopher to implement a stripped down
version creates more work for him.

Yet ignoring Roger's concerns is unfair to him as well, since he put in
so much time and effort to understand the algorithm and provide
suggestions. It is in fact unreasonable to ask anyone to fully
understand the locking mechanism and check the implementation is correct
in a few days (given the series was posted in Dec and there were major
holidays in between, plus everyone had other commitments).

To unblock this, how about we make Christopher maintainer of Argo? He
and OpenXT will be on the hook for further improvement. And I believe it
would be in their best interest to keep Argo bug-free and eventually
make it become supported.

So:

1. Make sure Argo is self-contained -- this requires careful review for
   interaction between Argo and other parts of the hypervisor.
2. Argo is going to be experimental and off-by-default -- this is the
   default status for new feature anyway.
3. Make Christopher maintainer of Argo -- this would be a natural thing
   to do anyway.

Does this work for everyone?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
                     ` (5 preceding siblings ...)
  2019-01-14 14:46   ` Wei Liu
@ 2019-01-14 14:58   ` Andrew Cooper
  2019-01-14 15:12     ` Jan Beulich
  2019-01-15  7:21     ` Christopher Clark
  6 siblings, 2 replies; 104+ messages in thread
From: Andrew Cooper @ 2019-01-14 14:58 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Jason Andryuk,
	Ian Jackson, Rich Persaud, James McKenzie, Daniel Smith,
	Julien Grall, Paul Durrant, Jan Beulich, Eric Chanudet,
	Roger Pau Monne

On 07/01/2019 07:42, Christopher Clark wrote:
> Initialises basic data structures and performs teardown of argo state
> for domain shutdown.
>
> Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
>
> Introduces a new Xen command line parameter 'argo': bool to enable/disable
> the argo hypercall. Defaults to disabled.
>
> New headers:
>   public/argo.h: with definions of addresses and ring structure, including
>   indexes for atomic update for communication between domain and hypervisor.
>
>   xen/argo.h: to expose the hooks for integration into domain lifecycle:
>     argo_init: per-domain init of argo data structures for domain_create.
>     argo_destroy: teardown for domain_destroy and the error exit
>                   path of domain_create.
>     argo_soft_reset: reset of domain state for domain_soft_reset.
>
> Adds two new fields to struct domain:
>     rwlock_t argo_lock;
>     struct argo_domain *argo;
>
> In accordance with recent work on _domain_destroy, argo_destroy is
> idempotent. It will tear down: all rings registered by this domain, all
> rings where this domain is the single sender (ie. specified partner,
> non-wildcard rings), and all pending notifications where this domain is
> awaiting signal about available space in the rings of other domains.
>
> A count will be maintained of the number of rings that a domain has
> registered in order to limit it below the fixed maximum limit defined here.
>
> The software license on the public header is the BSD license, standard
> procedure for the public Xen headers. The public header was originally
> posted under a GPL license at: [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
>
> The following ACK by Lars Kurth is to confirm that only people being
> employees of Citrix contributed to the header files in the series posted at
> [1] and that thus the copyright of the files in question is fully owned by
> Citrix. The ACK also confirms that Citrix is happy for the header files to
> be published under a BSD license in this series (which is based on [1]).
>
> Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> Acked-by: Lars Kurth <lars.kurth@citrix.com>

I hope I've not trodden on the toes of any other reviews.  I've got some
minor requests, but hopefully its all fairly trivial.

> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a755a67..aea13eb 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>  in combination with cpuidle.  This option is only expected to be useful for
>  developers wishing Xen to fall back to older timing methods on newer hardware.
>  
> +### argo
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> +
> +This allows domains access to the Argo hypercall, which supports registration
> +of memory rings with the hypervisor to receive messages, sending messages to
> +other domains by hypercall and querying the ring status of other domains.

Please do include a note about CONFIG_ARGO.  I know this doc is
inconsistent on the matter (as Kconfig postdates the written entries
here), but I have been trying to fix up, and now about half of the
documentation does mention appropriate Kconfig information.

> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 6f782f7..86195d3 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
>  long
>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,

I know I'm commenting on the wrong patch, but please use unsigned long
cmd, so the type definition here doesn't truncate the caller provided
value.  We have similar buggy code all over Xen, but its too late to fix
that, and I'd prefer not to propagate the error.

>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>             unsigned long arg4)
>  {
> -    return -ENOSYS;
> +    struct domain *currd = current->domain;
> +    long rc = -EFAULT;
> +
> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);

For debugging purposes, you don't want to truncate any of these values,
or you'll have a print message which doesn't match what the guest
provided.  I'd use %ld for arg3 and arg4.

> +
> +    if ( unlikely(!opt_argo_enabled) )
> +    {
> +        rc = -EOPNOTSUPP;

Shouldn't this be ENOSYS instead?  There isn't a conceptual difference
between CONFIG_ARGO compiled out, and opt_argo clear on the command
line, and I don't think a guest should be able to tell the difference.

> +        return rc;
> +    }
> +
> +    domain_lock(currd);
> +
> +    switch (cmd)
> +    {
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    domain_unlock(currd);
> +
> +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> +
> +    return rc;
> +}
> +
> +static void
> +argo_domain_init(struct argo_domain *argo)
> +{
> +    unsigned int i;
> +
> +    rwlock_init(&argo->lock);
> +    spin_lock_init(&argo->send_lock);
> +    spin_lock_init(&argo->wildcard_lock);
> +    argo->ring_count = 0;
> +
> +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> +    {
> +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> +    }
> +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> +}
> +
> +int
> +argo_init(struct domain *d)

Are there any per-vcpu argo resources?  I don't see any in the series,
but I'd be tempted to name the external functions as
argo_domain_{init,destroy}() which is slightly clearer in the caller
context.

> +{
> +    struct argo_domain *argo;
> +
> +    if ( !opt_argo_enabled )
> +    {
> +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> +        return 0;
> +    }
> +
> +    argo_dprintk("init: domid: %d\n", d->domain_id);
> +
> +    argo = xmalloc(struct argo_domain);

For sanity sake, I'd suggest xzalloc() here, not that I can spot
anything wrong with the current code.

> +    if ( !argo )
> +        return -ENOMEM;
> +
> +    write_lock(&argo_lock);
> +
> +    argo_domain_init(argo);

This call doesn't need to be within the global argo_lock critical
region, because it exclusively operates on state which is inaccessible
to the rest of the system until d->argo is written.  This then shrinks
the critical region to a single pointer write.  (Further, with a patch I
haven't posted yet, the memset(0) in zxalloc() can be write-merged with
the setup code to avoid repeated writes, which can't happen with a
spinlock in between.)

> +
> +    d->argo = argo;
> +
> +    write_unlock(&argo_lock);
> +
> +    return 0;
> +}
> +
> +void
> +argo_destroy(struct domain *d)

Is this function fully idempotent?  Given its current calling context,
it needs to be.  (I think it is, but I just want to double check,
because it definitely wants to be.)

I ask, because...

> +{
> +    BUG_ON(!d->is_dying);
> +
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +        xfree(d->argo);
> +        d->argo = NULL;
> +    }
> +    write_unlock(&argo_lock);
> +}
> +
> +void
> +argo_soft_reset(struct domain *d)
> +{
> +    write_lock(&argo_lock);
> +
> +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> +
> +    if ( d->argo )
> +    {
> +        domain_rings_remove_all(d);
> +        partner_rings_remove(d);
> +        wildcard_rings_pending_remove(d);
> +
> +        if ( !opt_argo_enabled )
> +        {
> +            xfree(d->argo);
> +            d->argo = NULL;
> +        }
> +        else
> +            argo_domain_init(d->argo);
> +    }
> +
> +    write_unlock(&argo_lock);
>  }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index c623dae..9596840 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -32,6 +32,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenoprof.h>
>  #include <xen/irq.h>
> +#include <xen/argo.h>
>  #include <asm/debugger.h>
>  #include <asm/p2m.h>
>  #include <asm/processor.h>
> @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
>  
>      xfree(d->pbuf);
>  
> +#ifdef CONFIG_ARGO
> +    argo_destroy(d);
> +#endif

... given this call (which is correct), ...

> +
>      rangeset_domain_destroy(d);
>  
>      free_cpumask_var(d->dirty_cpumask);
> @@ -376,6 +381,9 @@ struct domain *domain_create(domid_t domid,
>      spin_lock_init(&d->hypercall_deadlock_mutex);
>      INIT_PAGE_LIST_HEAD(&d->page_list);
>      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
> +#ifdef CONFIG_ARGO
> +    rwlock_init(&d->argo_lock);
> +#endif
>  
>      spin_lock_init(&d->node_affinity_lock);
>      d->node_affinity = NODE_MASK_ALL;
> @@ -445,6 +453,11 @@ struct domain *domain_create(domid_t domid,
>              goto fail;
>          init_status |= INIT_gnttab;
>  
> +#ifdef CONFIG_ARGO
> +        if ( (err = argo_init(d)) != 0 )
> +            goto fail;
> +#endif
> +
>          err = -ENOMEM;
>  
>          d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
> @@ -717,6 +730,9 @@ int domain_kill(struct domain *d)
>          if ( d->is_dying != DOMDYING_alive )
>              return domain_kill(d);
>          d->is_dying = DOMDYING_dying;
> +#ifdef CONFIG_ARGO
> +        argo_destroy(d);
> +#endif

... this one isn't necessary.

I'm in the middle of fixing all this destruction logic, and
_domain_destroy() is called below.

The rule is that everything in _domain_destroy() should be idempotent,
and all destruction logic needs moving there, so I can remove
DOMCTL_setmaxvcpus and fix a load of toolstack-triggerable NULL pointer
dereferences in Xen.

Eventually, everything in this hunk will disappear.

>          evtchn_destroy(d);
>          gnttab_release_mappings(d);
>          tmem_destroy(d->tmem_client);
> diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
> new file mode 100644
> index 0000000..29d32a9
> --- /dev/null
> +++ b/xen/include/xen/argo.h
> @@ -0,0 +1,23 @@
> +/******************************************************************************
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __XEN_ARGO_H__
> +#define __XEN_ARGO_H__
> +
> +int argo_init(struct domain *d);
> +void argo_destroy(struct domain *d);
> +void argo_soft_reset(struct domain *d);

Instead of the #ifdefary in the calling code, please could you stub
these out in this file?  See the tail of include/asm-x86/pv/domain.h for
an example based on CONFIG_PV.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 08/15] argo: implement the unregister op
  2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
  2019-01-10 11:40   ` Roger Pau Monné
@ 2019-01-14 15:06   ` Jan Beulich
  2019-01-15  8:11     ` Christopher Clark
  1 sibling, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-14 15:06 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> @@ -666,6 +667,105 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>      return NULL;
>  }
>  
> +static struct argo_send_info *
> +send_find_info(const struct domain *d, const struct argo_ring_id *id)

As per the comment on patch 7, perhaps find_send_info()?

> +{
> +    struct hlist_node *node;

const?

> +static long
> +unregister_ring(struct domain *currd,
> +                XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd)
> +{
> +    xen_argo_unregister_ring_t unreg;
> +    struct argo_ring_id ring_id;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info;
> +    struct domain *dst_d = NULL;
> +    int ret;
> +
> +    ret = copy_from_guest(&unreg, unreg_hnd, 1) ? -EFAULT : 0;
> +    if ( ret )
> +        goto out;
> +
> +    ret = unreg.pad ? -EINVAL : 0;
> +    if ( ret )
> +        goto out;
> +
> +    ring_id.partner_id = unreg.partner_id;
> +    ring_id.port = unreg.port;
> +    ring_id.domain_id = currd->domain_id;
> +
> +    read_lock(&argo_lock);
> +
> +    if ( !currd->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    write_lock(&currd->argo->lock);
> +
> +    ring_info = ring_find_info(currd, &ring_id);
> +    if ( ring_info )
> +    {
> +        ring_remove_info(currd, ring_info);
> +        currd->argo->ring_count--;
> +    }
> +
> +    dst_d = get_domain_by_id(ring_id.partner_id);
> +    if ( dst_d )
> +    {
> +        if ( dst_d->argo )
> +        {
> +            spin_lock(&dst_d->argo->send_lock);
> +
> +            send_info = send_find_info(dst_d, &ring_id);
> +            if ( send_info )
> +            {
> +                hlist_del(&send_info->node);
> +                xfree(send_info);
> +            }
> +
> +            spin_unlock(&dst_d->argo->send_lock);

As per the comment to an earlier patch, if at all possible call
allocation (and hence also freeing) functions with as little
locks held as possible. Pulling it out of the innermost lock
here looks straightforward at least.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:58   ` Andrew Cooper
@ 2019-01-14 15:12     ` Jan Beulich
  2019-01-15  7:24       ` Christopher Clark
  2019-01-15  7:21     ` Christopher Clark
  1 sibling, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-14 15:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, ross.philipson, Jason Andryuk,
	Daniel Smith, Tim Deegan, Konrad Rzeszutek Wilk, Ian Jackson,
	Christopher Clark, Rich Persaud, James McKenzie, George Dunlap,
	Julien Grall, Paul Durrant, xen-devel, eric chanudet,
	Roger Pau Monne

>>> On 14.01.19 at 15:58, <andrew.cooper3@citrix.com> wrote:
> On 07/01/2019 07:42, Christopher Clark wrote:
>> --- a/xen/common/argo.c
>> +++ b/xen/common/argo.c
>>  long
>>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> 
> I know I'm commenting on the wrong patch, but please use unsigned long
> cmd, so the type definition here doesn't truncate the caller provided
> value.  We have similar buggy code all over Xen, but its too late to fix
> that, and I'd prefer not to propagate the error.

Why buggy? It all depends on how the interface is specified. If
the input is 32 bits wide, it is clear that higher bits are supposed
to be ignored. Nothing says that the full register width is
significant.

>>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
>>             unsigned long arg4)
>>  {
>> -    return -ENOSYS;
>> +    struct domain *currd = current->domain;
>> +    long rc = -EFAULT;
>> +
>> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
>> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> 
> For debugging purposes, you don't want to truncate any of these values,
> or you'll have a print message which doesn't match what the guest
> provided.  I'd use %ld for arg3 and arg4.

Perhaps better %lx, for the output being easier to recognize
for both bitmaps (e.g. flag values) and sufficiently large values.

>> +
>> +    if ( unlikely(!opt_argo_enabled) )
>> +    {
>> +        rc = -EOPNOTSUPP;
> 
> Shouldn't this be ENOSYS instead?  There isn't a conceptual difference
> between CONFIG_ARGO compiled out, and opt_argo clear on the command
> line, and I don't think a guest should be able to tell the difference.

I admit it's a boundary case, but I think ENOSYS should strictly
only ever be (and have been) returned for unrecognized major
hypercall numbers.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:46   ` Wei Liu
@ 2019-01-14 15:29     ` Lars Kurth
  2019-01-14 18:16     ` Christopher Clark
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 104+ messages in thread
From: Lars Kurth @ 2019-01-14 15:29 UTC (permalink / raw)
  To: Wei Liu
  Cc: Juergen Gross, Julien Grall, Stefano Stabellini, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Christopher Clark, Tim (Xen.org),
	Daniel Smith, Rich Persaud, Paul Durrant, 'Jan Beulich',
	xen-devel, James McKenzie, Eric Chanudet, Roger Pau Monne

Adding Juergen,

to make sure he is aware of the discussion, as this is an excellent summary of the status of this series.

> On 14 Jan 2019, at 14:46, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> Hi all
> 
> The locking scheme seems to be remaining sticking point. The rest are
> mostly cosmetic issues (FAOD, they still need to be addressed).  Frankly
> I don't think there is enough time to address all the technical details,
> but let me sum up each side's position and see if we can reach an
> amicable solution.

snip

> To unblock this, how about we make Christopher maintainer of Argo? He
> and OpenXT will be on the hook for further improvement. And I believe it
> would be in their best interest to keep Argo bug-free and eventually
> make it become supported.
> 
> So:
> 
> 1. Make sure Argo is self-contained -- this requires careful review for
>   interaction between Argo and other parts of the hypervisor.
> 2. Argo is going to be experimental and off-by-default -- this is the
>   default status for new feature anyway.
> 3. Make Christopher maintainer of Argo -- this would be a natural thing
>   to do anyway.
> 
> Does this work for everyone?

I think this is a good way forward. Thank you Wei for putting this mail together.

Best Regards
Lars
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
                     ` (3 preceding siblings ...)
  2019-01-14 14:19   ` Jan Beulich
@ 2019-01-14 15:31   ` Andrew Cooper
  2019-01-15  8:02     ` Christopher Clark
  4 siblings, 1 reply; 104+ messages in thread
From: Andrew Cooper @ 2019-01-14 15:31 UTC (permalink / raw)
  To: Christopher Clark, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Jason Andryuk,
	Ian Jackson, Rich Persaud, James McKenzie, Daniel Smith,
	Julien Grall, Paul Durrant, Jan Beulich, Eric Chanudet,
	Roger Pau Monne

On 07/01/2019 07:42, Christopher Clark wrote:
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index aea13eb..68d4415 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
>  of memory rings with the hypervisor to receive messages, sending messages to
>  other domains by hypercall and querying the ring status of other domains.
>  
> +### argo-mac
> +> `= permissive | enforcing`

Are these command line options already in use in the OpenXT community?

I ask, because we are trying to avoid gaining multiple top level options
for related functionatliy.

IMO, this functionality could be covered more succinctly with:

  argo = List of [ <bool>, mac = permissive | enforcing ]

which also allows for cleaner addition of future options.

(Unfortunately, to implement this, you need my cmdline_strcmp() fixes,
which are still pending an ack.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:46   ` Wei Liu
  2019-01-14 15:29     ` Lars Kurth
@ 2019-01-14 18:16     ` Christopher Clark
  2019-01-14 19:42     ` Roger Pau Monné
  2019-02-04 20:56     ` Christopher Clark
  3 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-14 18:16 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Ross Philipson, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 6:47 AM Wei Liu <wei.liu2@citrix.com> wrote:
>
> Hi all
>
> The locking scheme seems to be remaining sticking point. The rest are
> mostly cosmetic issues (FAOD, they still need to be addressed).  Frankly
> I don't think there is enough time to address all the technical details,
> but let me sum up each side's position and see if we can reach an
> amicable solution.
>
> From maintainers and reviewers' point of view:
>
> 1. Maintainers / reviewers don't like complexity unless absolutely
>    necessary.
> 2. Maintainers / reviewers feel they have a responsibility to understand
>    the code and algorithm.
>
> Yet being the gatekeepers doesn't necessarily mean we understand every
> technical details and every usecase. We would like to, but most of the
> time it is unrealistic.
>
> Down to this specific patch series:
>
> Roger thinks the locking scheme is too complex. Christopher argues
> that's necessary for short-live channels to be performant.
>
> Both have their point.
>
> I think having a complex locking scheme is inevitable, just like we did
> for performant grant table several years ago.  Regardless of the timing
> issue we have at hand, asking Christopher to implement a stripped down
> version creates more work for him.
>
> Yet ignoring Roger's concerns is unfair to him as well, since he put in
> so much time and effort to understand the algorithm and provide
> suggestions. It is in fact unreasonable to ask anyone to fully
> understand the locking mechanism and check the implementation is correct
> in a few days (given the series was posted in Dec and there were major
> holidays in between, plus everyone had other commitments).
>
> To unblock this, how about we make Christopher maintainer of Argo? He
> and OpenXT will be on the hook for further improvement. And I believe it
> would be in their best interest to keep Argo bug-free and eventually
> make it become supported.
>
> So:
>
> 1. Make sure Argo is self-contained -- this requires careful review for
>    interaction between Argo and other parts of the hypervisor.
> 2. Argo is going to be experimental and off-by-default -- this is the
>    default status for new feature anyway.
> 3. Make Christopher maintainer of Argo -- this would be a natural thing
>    to do anyway.
>
> Does this work for everyone?

Yes, this will work for me. Thanks for the summary and proposal.

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:46   ` Wei Liu
  2019-01-14 15:29     ` Lars Kurth
  2019-01-14 18:16     ` Christopher Clark
@ 2019-01-14 19:42     ` Roger Pau Monné
  2019-02-04 20:56     ` Christopher Clark
  3 siblings, 0 replies; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-14 19:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: Julien Grall, Stefano Stabellini, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Christopher Clark, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 3:48 PM Wei Liu <wei.liu2@citrix.com> wrote:
>
> Hi all
>
> The locking scheme seems to be remaining sticking point. The rest are
> mostly cosmetic issues (FAOD, they still need to be addressed).  Frankly
> I don't think there is enough time to address all the technical details,
> but let me sum up each side's position and see if we can reach an
> amicable solution.
>
> From maintainers and reviewers' point of view:
>
> 1. Maintainers / reviewers don't like complexity unless absolutely
>    necessary.
> 2. Maintainers / reviewers feel they have a responsibility to understand
>    the code and algorithm.
>
> Yet being the gatekeepers doesn't necessarily mean we understand every
> technical details and every usecase. We would like to, but most of the
> time it is unrealistic.
>
> Down to this specific patch series:
>
> Roger thinks the locking scheme is too complex. Christopher argues
> that's necessary for short-live channels to be performant.
>
> Both have their point.
>
> I think having a complex locking scheme is inevitable, just like we did
> for performant grant table several years ago.  Regardless of the timing
> issue we have at hand, asking Christopher to implement a stripped down
> version creates more work for him.
>
> Yet ignoring Roger's concerns is unfair to him as well, since he put in
> so much time and effort to understand the algorithm and provide
> suggestions. It is in fact unreasonable to ask anyone to fully
> understand the locking mechanism and check the implementation is correct
> in a few days (given the series was posted in Dec and there were major
> holidays in between, plus everyone had other commitments).
>
> To unblock this, how about we make Christopher maintainer of Argo? He
> and OpenXT will be on the hook for further improvement. And I believe it
> would be in their best interest to keep Argo bug-free and eventually
> make it become supported.
>
> So:
>
> 1. Make sure Argo is self-contained -- this requires careful review for
>    interaction between Argo and other parts of the hypervisor.
> 2. Argo is going to be experimental and off-by-default -- this is the
>    default status for new feature anyway.
> 3. Make Christopher maintainer of Argo -- this would be a natural thing
>    to do anyway.
>
> Does this work for everyone?

I think this is fine.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 10/15] argo: implement the notify op
  2019-01-10 12:21   ` Roger Pau Monné
@ 2019-01-15  6:53     ` Christopher Clark
  2019-01-15  8:06       ` Roger Pau Monné
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  6:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 4:22 AM Roger Pau Monné <royger@freebsd.org> wrote:
>
>  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > Queries for data about space availability in registered rings and
> > causes notification to be sent when space has become available.
> >
> > The hypercall op populates a supplied data structure with information about
> > ring state, and if insufficent space is currently available in a given ring,
>
> insufficient

ack, fixed

> > the hypervisor will record the domain's expressed interest and notify it
> > when it observes that space has become available.
> >
> > Checks for free space occur when this notify op is invoked, so it may be
> > intentionally invoked with no data structure to populate
> > (ie. a NULL argument) to trigger such a check and consequent notifications.
> >
> > Limit the maximum number of notify requests in a single operation to a
> > simple fixed limit of 256.

> >
> > +static struct argo_ring_info *
> > +ring_find_info(const struct domain *d, const struct argo_ring_id *id);
> > +
> > +static struct argo_ring_info *
> > +ring_find_info_by_match(const struct domain *d, uint32_t port,
> > +                        domid_t partner_id);
>
> Can you place the static functions such that you don't need prototypes for them?

Ack, yes. These have now been pulled to the top of the file and the
prototypes removed.

> > +
> >  /*
> >   * This hash function is used to distribute rings within the per-domain
> >   * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> > @@ -265,6 +275,17 @@ signal_domain(struct domain *d)
> >  }
> >
> >  static void
> > +signal_domid(domid_t domain_id)
> > +{
> > +    struct domain *d = get_domain_by_id(domain_id);
>
> Newline.

ack

>
> > +    if ( !d )
> > +        return;
> > +
> > +    signal_domain(d);
> > +    put_domain(d);
> > +}
> > +
> > +static void
> >  ring_unmap(struct argo_ring_info *ring_info)
> >  {
> >      unsigned int i;
> > @@ -473,6 +494,62 @@ get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> >      return 0;
> >  }
> >
> > +static uint32_t
> > +ringbuf_payload_space(struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    xen_argo_ring_t ring;
> > +    uint32_t len;
> > +    int32_t ret;
>
> You use a signed type to internally store the return value, but the
> return type from the function itself is unsigned. Is it guaranteed
> that ret < INT32_MAX always?

It is, yes, so I've added a explanatory comment:

    /*
     * In a sanitized ring, we can rely on:
     *              (rx_ptr < ring_info->len)           &&
     *              (tx_ptr < ring_info->len)           &&
     *      (ring_info->len <= XEN_ARGO_MAX_RING_SIZE)
     *
     * and since: XEN_ARGO_MAX_RING_SIZE < INT32_MAX
     * therefore right here: ret < INT32_MAX
     * and we are safe to return it as a unsigned value from this function.
     * The subtractions below cannot increase its value.
     */

>
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    len = ring_info->len;
> > +    if ( !len )
> > +        return 0;
> > +
> > +    ret = get_sanitized_ring(&ring, ring_info);
> > +    if ( ret )
> > +        return 0;
> > +
> > +    argo_dprintk("sanitized ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
> > +                 ring.tx_ptr, ring.rx_ptr);
> > +
> > +    /*
> > +     * rx_ptr == tx_ptr means that the ring has been emptied, so return
> > +     * the maximum payload size that can be accepted -- see message size
> > +     * checking logic in the entry to ringbuf_insert which ensures that
> > +     * there is always one message slot (of size ROUNDUP_MESSAGE(1)) left
> > +     * available, preventing a ring from being entirely filled. This ensures
> > +     * that matching ring indexes always indicate an empty ring and not a
> > +     * full one.
> > +     * The subtraction here will not underflow due to minimum size constraints
> > +     * enforced on ring size elsewhere.
> > +     */
> > +    if ( ring.rx_ptr == ring.tx_ptr )
> > +        return len - sizeof(struct xen_argo_ring_message_header)
> > +                   - ROUNDUP_MESSAGE(1);
>
> Why not do something like:
>
> ret = ring.rx_ptr - ring.tx_ptr;
> if ( ret <= 0)
>     ret += len;
>
> Instead of this early return?
>
> The only difference when the ring is full is that len should be used
> instead of the ptr difference.

That's much nicer -- thanks. Done.

> >
> >  static int
> > +fill_ring_data(const struct domain *currd,
> > +               XEN_GUEST_HANDLE(xen_argo_ring_data_ent_t) data_ent_hnd)
> > +{
> > +    xen_argo_ring_data_ent_t ent;
> > +    struct domain *dst_d;
> > +    struct argo_ring_info *ring_info;
> > +    int ret;
> > +
> > +    ASSERT(rw_is_locked(&argo_lock));
> > +
> > +    ret = __copy_from_guest(&ent, data_ent_hnd, 1) ? -EFAULT : 0;
> > +    if ( ret )
> > +        goto out;
>
> if ( __copy_from_guest(&ent, data_ent_hnd, 1) )
>     return -EFAULT;
>
> And you can get rid of the out label.

ack, done.

>
> > +
> > +    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
> > +                 ent.ring.domain_id, ent.ring.port);
> > +
> > +    ent.flags = 0;
>
> Please memset ent to 0 or initialize it to { }, or else you are
> leaking hypervisor stack data to the guest in the padding field.

ok - I've added the initializer, thanks.
Was it really leaking stack data though because the struct should have
been fully populated, including the padding field, with the
__copy_from_guest above?


> > +static void
> > +notify_check_pending(struct domain *currd)
> > +{
> > +    unsigned int i;
> > +    HLIST_HEAD(to_notify);
> > +
> > +    ASSERT(rw_is_locked(&argo_lock));
> > +
> > +    read_lock(&currd->argo->lock);
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; i++ )
> > +    {
> > +        struct hlist_node *node, *next;
> > +        struct argo_ring_info *ring_info;
> > +
> > +        hlist_for_each_entry_safe(ring_info, node, next,
> > +                                  &currd->argo->ring_hash[i], node)
> > +        {
> > +            notify_ring(currd, ring_info, &to_notify);
> > +        }
>
> No need for the braces.

Fixed - thanks.

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:58   ` Andrew Cooper
  2019-01-14 15:12     ` Jan Beulich
@ 2019-01-15  7:21     ` Christopher Clark
  2019-01-15  9:01       ` Jan Beulich
  1 sibling, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  7:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Jason Andryuk,
	Ian Jackson, Rich Persaud, James McKenzie, Daniel Smith,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 6:58 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 07/01/2019 07:42, Christopher Clark wrote:
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.
> >
> > Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
> >
> > Introduces a new Xen command line parameter 'argo': bool to enable/disable
> > the argo hypercall. Defaults to disabled.
> >
> > New headers:
> >   public/argo.h: with definitions of addresses and ring structure, including
> >   indexes for atomic update for communication between domain and hypervisor.
> >
> >   xen/argo.h: to expose the hooks for integration into domain lifecycle:
> >     argo_init: per-domain init of argo data structures for domain_create.
> >     argo_destroy: teardown for domain_destroy and the error exit
> >                   path of domain_create.
> >     argo_soft_reset: reset of domain state for domain_soft_reset.
> >
> > Adds two new fields to struct domain:
> >     rwlock_t argo_lock;
> >     struct argo_domain *argo;
> >
> > In accordance with recent work on _domain_destroy, argo_destroy is
> > idempotent. It will tear down: all rings registered by this domain, all
> > rings where this domain is the single sender (ie. specified partner,
> > non-wildcard rings), and all pending notifications where this domain is
> > awaiting signal about available space in the rings of other domains.
> >
> > A count will be maintained of the number of rings that a domain has
> > registered in order to limit it below the fixed maximum limit defined here.
> >
> > The software license on the public header is the BSD license, standard
> > procedure for the public Xen headers. The public header was originally
> > posted under a GPL license at: [1]:
> > https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
> >
> > The following ACK by Lars Kurth is to confirm that only people being
> > employees of Citrix contributed to the header files in the series posted at
> > [1] and that thus the copyright of the files in question is fully owned by
> > Citrix. The ACK also confirms that Citrix is happy for the header files to
> > be published under a BSD license in this series (which is based on [1]).
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > Acked-by: Lars Kurth <lars.kurth@citrix.com>
>
> I hope I've not trodden on the toes of any other reviews.  I've got some
> minor requests, but hopefully its all fairly trivial.
>
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index a755a67..aea13eb 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
> >  in combination with cpuidle.  This option is only expected to be useful for
> >  developers wishing Xen to fall back to older timing methods on newer hardware.
> >
> > +### argo
> > +> `= <boolean>`
> > +
> > +> Default: `false`
> > +
> > +Enable the Argo hypervisor-mediated interdomain communication mechanism.
> > +
> > +This allows domains access to the Argo hypercall, which supports registration
> > +of memory rings with the hypervisor to receive messages, sending messages to
> > +other domains by hypercall and querying the ring status of other domains.
>
> Please do include a note about CONFIG_ARGO.  I know this doc is
> inconsistent on the matter (as Kconfig postdates the written entries
> here), but I have been trying to fix up, and now about half of the
> documentation does mention appropriate Kconfig information.

Ack, note added.

>
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 6f782f7..86195d3 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>
> I know I'm commenting on the wrong patch, but please use unsigned long
> cmd, so the type definition here doesn't truncate the caller provided
> value.  We have similar buggy code all over Xen, but its too late to fix
> that, and I'd prefer not to propagate the error.

On this one, given Jan's reply, I've left it as is for the series that
I'll push tonight. That patch is carrying an Ack from Jan at the
moment, so I wasn't going to touch it if it's not required. If there's
consensus that it should change, let me know and I'll switch it.

>
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    struct domain *currd = current->domain;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
>
> For debugging purposes, you don't want to truncate any of these values,
> or you'll have a print message which doesn't match what the guest
> provided.  I'd use %ld for arg3 and arg4.

I've gone with %lu for arg3 and %lx for arg4 for the next round, but I
could be missing something: is there a reason to prefer '%ld' over '%lu'
for use with an unsigned type? Should I be using %d elsewhere, eg. for a
domid?
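
(For concreteness, the revised debug line would then read something like:

    argo_dprintk("->do_argo_op(%u,%p,%p,%lu,%lx)\n", cmd,
                 (void *)arg1.p, (void *)arg2.p, arg3, arg4);

so arg3 and arg4 are no longer truncated to int.)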

>
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +    {
> > +        rc = -EOPNOTSUPP;
>
> Shouldn't this be ENOSYS instead?  There isn't a conceptual difference
> between CONFIG_ARGO compiled out, and opt_argo clear on the command
> line, and I don't think a guest should be able to tell the difference.

EOPNOTSUPP has been retained here, as per an earlier review and also a
response later in this thread.

>
> > +        return rc;
> > +    }
> > +
> > +    domain_lock(currd);
> > +
> > +    switch (cmd)
> > +    {
> > +    default:
> > +        rc = -EOPNOTSUPP;
> > +        break;
> > +    }
> > +
> > +    domain_unlock(currd);
> > +
> > +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> > +
> > +    return rc;
> > +}
> > +
> > +static void
> > +argo_domain_init(struct argo_domain *argo)
> > +{
> > +    unsigned int i;
> > +
> > +    rwlock_init(&argo->lock);
> > +    spin_lock_init(&argo->send_lock);
> > +    spin_lock_init(&argo->wildcard_lock);
> > +    argo->ring_count = 0;
> > +
> > +    for ( i = 0; i < ARGO_HTABLE_SIZE; ++i )
> > +    {
> > +        INIT_HLIST_HEAD(&argo->ring_hash[i]);
> > +        INIT_HLIST_HEAD(&argo->send_hash[i]);
> > +    }
> > +    INIT_HLIST_HEAD(&argo->wildcard_pend_list);
> > +}
> > +
> > +int
> > +argo_init(struct domain *d)
>
> Are there any per-vcpu argo resources?  I don't see any in the series,
> but I'd be tempted to name the external functions as
> argo_domain_{init,destroy}() which is slightly clearer in the caller
> context.

I haven't renamed this as I'm out of time today, but I'm slightly wary
about whether it would actually help clarity, given 1) the name of the
existing data type 'argo_domain' and 2) the other similar init
functions invoked from domain_create don't seem to follow a
<subsystem>_domain_init naming scheme. Are you sure I should do this?

>
> > +{
> > +    struct argo_domain *argo;
> > +
> > +    if ( !opt_argo_enabled )
> > +    {
> > +        argo_dprintk("argo disabled, domid: %d\n", d->domain_id);
> > +        return 0;
> > +    }
> > +
> > +    argo_dprintk("init: domid: %d\n", d->domain_id);
> > +
> > +    argo = xmalloc(struct argo_domain);
>
> For sanity sake, I'd suggest xzalloc() here, not that I can spot
> anything wrong with the current code.

ack, done.

>
> > +    if ( !argo )
> > +        return -ENOMEM;
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_domain_init(argo);
>
> This call doesn't need to be within the global argo_lock critical
> region, because it exclusively operates on state which is inaccessible
> to the rest of the system until d->argo is written.  This then shrinks
> the critical region to a single pointer write.

ack, done
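
For reference, the tail of argo_init should now look roughly like this
(sketch of the reworked code, with the xzalloc change from above folded
in):

    argo = xzalloc(struct argo_domain);
    if ( !argo )
        return -ENOMEM;

    /* Safe without any lock held: argo is not yet reachable via d->argo. */
    argo_domain_init(argo);

    write_lock(&argo_lock);

    d->argo = argo;

    write_unlock(&argo_lock);

    return 0;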

> (Further, with a patch I
> haven't posted yet, the memset(0) in zxalloc() can be write-merged with
> the setup code to avoid repeated writes, which can't happen with a
> spinlock in between.)
>
> > +
> > +    d->argo = argo;
> > +
> > +    write_unlock(&argo_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +void
> > +argo_destroy(struct domain *d)
>
> Is this function fully idempotent?  Given its current calling context,
> it needs to be.  (I think it is, but I just want to double check,
> because it definitely wants to be.)

I think so, and it is intended to be: it takes a lock, only does work
if the pointer isn't NULL, and NULLs it before dropping the lock, so it
should be ok.

>
> I ask, because...
>
> > +{
> > +    BUG_ON(!d->is_dying);
> > +
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("destroy: domid %d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +        xfree(d->argo);
> > +        d->argo = NULL;
> > +    }
> > +    write_unlock(&argo_lock);
> > +}
> > +
> > +void
> > +argo_soft_reset(struct domain *d)
> > +{
> > +    write_lock(&argo_lock);
> > +
> > +    argo_dprintk("soft reset d=%d d->argo=%p\n", d->domain_id, d->argo);
> > +
> > +    if ( d->argo )
> > +    {
> > +        domain_rings_remove_all(d);
> > +        partner_rings_remove(d);
> > +        wildcard_rings_pending_remove(d);
> > +
> > +        if ( !opt_argo_enabled )
> > +        {
> > +            xfree(d->argo);
> > +            d->argo = NULL;
> > +        }
> > +        else
> > +            argo_domain_init(d->argo);
> > +    }
> > +
> > +    write_unlock(&argo_lock);
> >  }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index c623dae..9596840 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -32,6 +32,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenoprof.h>
> >  #include <xen/irq.h>
> > +#include <xen/argo.h>
> >  #include <asm/debugger.h>
> >  #include <asm/p2m.h>
> >  #include <asm/processor.h>
> > @@ -277,6 +278,10 @@ static void _domain_destroy(struct domain *d)
> >
> >      xfree(d->pbuf);
> >
> > +#ifdef CONFIG_ARGO
> > +    argo_destroy(d);
> > +#endif
>
> ... given this call (which is correct), ...
>
> > +
> >      rangeset_domain_destroy(d);
> >
> >      free_cpumask_var(d->dirty_cpumask);
> > @@ -376,6 +381,9 @@ struct domain *domain_create(domid_t domid,
> >      spin_lock_init(&d->hypercall_deadlock_mutex);
> >      INIT_PAGE_LIST_HEAD(&d->page_list);
> >      INIT_PAGE_LIST_HEAD(&d->xenpage_list);
> > +#ifdef CONFIG_ARGO
> > +    rwlock_init(&d->argo_lock);
> > +#endif
> >
> >      spin_lock_init(&d->node_affinity_lock);
> >      d->node_affinity = NODE_MASK_ALL;
> > @@ -445,6 +453,11 @@ struct domain *domain_create(domid_t domid,
> >              goto fail;
> >          init_status |= INIT_gnttab;
> >
> > +#ifdef CONFIG_ARGO
> > +        if ( (err = argo_init(d)) != 0 )
> > +            goto fail;
> > +#endif
> > +
> >          err = -ENOMEM;
> >
> >          d->pbuf = xzalloc_array(char, DOMAIN_PBUF_SIZE);
> > @@ -717,6 +730,9 @@ int domain_kill(struct domain *d)
> >          if ( d->is_dying != DOMDYING_alive )
> >              return domain_kill(d);
> >          d->is_dying = DOMDYING_dying;
> > +#ifdef CONFIG_ARGO
> > +        argo_destroy(d);
> > +#endif
>
> ... this one isn't necessary.
>
> I'm in the middle of fixing all this destruction logic, and
> _domain_destroy() is called below.
>
> The rule is that everything in _domain_destroy() should be idempotent,
> and all destruction logic needs moving there, so I can remove
> DOMCTL_setmaxvcpus and fix a load of toolstack-triggerable NULL pointer
> dereferences in Xen.
>
> Eventually, everything in this hunk will disappear.

Thanks for the guidance. I've added a FIXME for this for the series
I'm about to push and will work on resolving it tomorrow.

>
> >          evtchn_destroy(d);
> >          gnttab_release_mappings(d);
> >          tmem_destroy(d->tmem_client);
> > diff --git a/xen/include/xen/argo.h b/xen/include/xen/argo.h
> > new file mode 100644
> > index 0000000..29d32a9
> > --- /dev/null
> > +++ b/xen/include/xen/argo.h
> > @@ -0,0 +1,23 @@
> > +/******************************************************************************
> > + * Argo : Hypervisor-Mediated data eXchange
> > + *
> > + * Copyright (c) 2018, BAE Systems
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> > + */
> > +
> > +#ifndef __XEN_ARGO_H__
> > +#define __XEN_ARGO_H__
> > +
> > +int argo_init(struct domain *d);
> > +void argo_destroy(struct domain *d);
> > +void argo_soft_reset(struct domain *d);
>
> Instead of the #ifdefary in the calling code, please could you stub
> these out in this file?  See the tail of include/asm-x86/pv/domain.h for
> an example based on CONFIG_PV.

ack, done at Roger's earlier request.
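
For reference, the stub arrangement in xen/argo.h presumably ends up
along these lines (following the asm-x86/pv/domain.h pattern):

    #ifdef CONFIG_ARGO
    int argo_init(struct domain *d);
    void argo_destroy(struct domain *d);
    void argo_soft_reset(struct domain *d);
    #else
    static inline int argo_init(struct domain *d)
    {
        return 0;
    }
    static inline void argo_destroy(struct domain *d)
    {
    }
    static inline void argo_soft_reset(struct domain *d)
    {
    }
    #endif

which lets the #ifdef CONFIG_ARGO blocks around the calls in
xen/common/domain.c go away.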

Thanks for the review,

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 15:12     ` Jan Beulich
@ 2019-01-15  7:24       ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  7:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 7:12 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 14.01.19 at 15:58, <andrew.cooper3@citrix.com> wrote:
> > On 07/01/2019 07:42, Christopher Clark wrote:
> >> --- a/xen/common/argo.c
> >> +++ b/xen/common/argo.c
> >>  long
> >>  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> > I know I'm commenting on the wrong patch, but please use unsigned long
> > cmd, so the type definition here doesn't truncate the caller provided
> > value.  We have similar buggy code all over Xen, but its too late to fix
> > that, and I'd prefer not to propagate the error.
>
> Why buggy? It all depends on how the interface is specified. If
> the input is 32 bits wide, it is clear that higher bits are supposed
> to be ignored. Nothing says that the full register width is
> significant.

I've left this as is (ie. unsigned int) but I can change it if it should change.

>
> >>             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >>             unsigned long arg4)
> >>  {
> >> -    return -ENOSYS;
> >> +    struct domain *currd = current->domain;
> >> +    long rc = -EFAULT;
> >> +
> >> +    argo_dprintk("->do_argo_op(%u,%p,%p,%d,%d)\n", cmd,
> >> +                 (void *)arg1.p, (void *)arg2.p, (int) arg3, (int) arg4);
> >
> > For debugging purposes, you don't want to truncate any of these values,
> > or you'll have a print message which doesn't match what the guest
> > provided.  I'd use %ld for arg3 and arg4.
>
> Perhaps better %lx, for the output being easier to recognize
> for both bitmaps (e.g. flag values) and sufficiently large values.

ack, done

>
> >> +
> >> +    if ( unlikely(!opt_argo_enabled) )
> >> +    {
> >> +        rc = -EOPNOTSUPP;
> >
> > Shouldn't this be ENOSYS instead?  There isn't a conceptual difference
> > between CONFIG_ARGO compiled out, and opt_argo clear on the command
> > line, and I don't think a guest should be able to tell the difference.
>
> I admit it's a boundary case, but I think ENOSYS should strictly
> only ever be (and have been) returned for unrecognized major
> hypercall numbers.

ack

Christopher


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-14 14:19   ` Jan Beulich
@ 2019-01-15  7:56     ` Christopher Clark
  2019-01-15  8:36       ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  7:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -23,16 +23,41 @@
> >  #include <xen/event.h>
> >  #include <xen/domain_page.h>
> >  #include <xen/guest_access.h>
> > +#include <xen/lib.h>
> > +#include <xen/nospec.h>
> >  #include <xen/time.h>
> >  #include <public/argo.h>
> >
> > +#define MAX_RINGS_PER_DOMAIN            128U
> > +
> > +/* All messages on the ring are padded to a multiple of the slot size. */
> > +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
>
> Pointless outermost pair of parentheses.

ack, removed

>
> > @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
> >  #define argo_dprintk(format, ... ) ((void)0)
> >  #endif
> >
> > +/*
> > + * This hash function is used to distribute rings within the per-domain
> > + * hash tables (d->argo->ring_hash and d->argo_send_hash). The hash table
> > + * will provide a struct if a match is found with a 'argo_ring_id' key:
> > + * ie. the key is a (domain id, port, partner domain id) tuple.
> > + * Since port number varies the most in expected use, and the Linux driver
> > + * allocates at both the high and low ends, incorporate high and low bits to
> > + * help with distribution.
> > + * Apply array_index_nospec as a defensive measure since this operates
> > + * on user-supplied input and the array size that it indexes into is known.
> > + */
> > +static unsigned int
> > +hash_index(const struct argo_ring_id *id)
> > +{
> > +    unsigned int hash;
> > +
> > +    hash = (uint16_t)(id->port >> 16);
> > +    hash ^= (uint16_t)id->port;
>
> I may have asked this before, but are the casts really needed
> with ...
>
> > +    hash ^= id->domain_id;
> > +    hash ^= id->partner_id;
> > +    hash &= (ARGO_HTABLE_SIZE - 1);
>
> ... the masking done here?
>
> > +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
>
> With the masking above - is this really needed?
>
> And then the question is whether the quality of the hash is
> sufficient: There won't be more set bits in the result than
> are in any of the three input values, so if they're all small,
> higher hash table entries won't be used at all. I would
> assume the goal to be that by the time 32 entities appear,
> chances be good that at least about 30 of the hash table
> entries are in use.

ok, I'll replace this function and address the above.
I'm out of time today so have added a FIXME for today's series posting
and will get it done tomorrow.
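
To give a flavour of the direction (illustrative only -- not the
replacement that will actually be posted): mixing the inputs with a
multiplicative constant before masking would let small domain ids and
ports still reach all of the table:

    static unsigned int
    hash_index(const struct argo_ring_id *id)
    {
        unsigned int hash = id->port;

        hash ^= (unsigned int)id->domain_id << 16;
        hash ^= (unsigned int)id->partner_id;
        hash *= 0x9e3779b1U;    /* 32-bit golden-ratio multiplier */

        return (hash >> 16) & (ARGO_HTABLE_SIZE - 1);
    }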

>
> > @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
> >      }
> >  }
> >
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > +    if ( i >= ring_info->nmfns )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +               "argo: ring (vm%u:%x vm%d) %p attempted to map page  %u of %u\n",
>
> ring_info->id.{domain,partner}_id look to be of the same type -
> why once %u and once %d? Same elsewhere.

Fixed across the series to use %u for domid_t output.

>
> > +                ring_info->id.domain_id, ring_info->id.port,
> > +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > +        return -ENOMEM;
> > +    }
>
>     i = array_index_nospec(i, ring_info->nmfns);
>
> considering the array indexes here? Of course at this point only
> zero can be passed in, but I assume this changes in later patches
> and the index is at least indirectly guest controlled.

Added, thanks.

>
> > @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
> >      }
> >  }
> >
> > +static int
> > +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
>
> So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
> Any chance you could use a consistent ordering of "ring" and "find"?
> Or is there a reason behind the apparent mismatch?

I've renamed them to use 'find_' as the common prefix. Looks cleaner
to me, thanks.

>
> > +{
> > +    p2m_type_t p2mt;
> > +    int ret = 0;
> > +
> > +#ifdef CONFIG_X86
> > +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> > +#else
> > +    *mfn = p2m_lookup(d, gfn, &p2mt);
> > +#endif
> > +
> > +    if ( !mfn_valid(*mfn) )
> > +        ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > +        ret = -EAGAIN;
> > +#endif
> > +    else if ( (p2mt != p2m_ram_rw) ||
> > +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > +        ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > +    put_gfn(d, gfn_x(gfn));
> > +#endif
> > +
> > +    return ret;
> > +}
>
> Please check whether you could leverage check_get_page_from_gfn()
> here. If you can't, please at least take inspiration as to e.g. the
> #ifdef-s from that function.

Have added a temporary FIXME for this and will do this tomorrow.

>
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +               uint32_t npage,
> > +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +               uint32_t len)
>
> Noticing it here, but perhaps still an issue elsewhere as well: Didn't
> we agree on removing unnecessary use of fixed width types? Or
> was that in the context on an earlier patch of v3?

These are fixed and hopefully all the others that do not belong are
also gone in v4.

>
> > +{
> > +    unsigned int i;
> > +    int ret = 0;
> > +    mfn_t *mfns;
> > +    uint8_t **mfn_mapping;
> > +
> > +    /*
> > +     * first bounds check on npage here also serves as an overflow check
> > +     * before left shifting it
> > +     */
> > +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
>
> Isn't this redundant with the check in do_argo_p()?
>
> > +         ((npage << PAGE_SHIFT) < len) )
> > +        return -EINVAL;

Answering your point inline above: Yes - do_argo_op does the bounds
checking, so I've removed the entire check above.

> > +
> > +    if ( ring_info->mfns )
> > +    {
> > +        /* Ring already existed: drop the previous mapping. */
> > +        gprintk(XENLOG_INFO,
> > +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
>
> Indentation (also elsewhere).

Ack, fixed here and elsewhere.

>
> > +                d->domain_id, ring_info->id.domain_id,
> > +                ring_info->id.port, ring_info->id.partner_id);
> > +
> > +        ring_remove_mfns(d, ring_info);
> > +        ASSERT(!ring_info->mfns);
> > +    }
> > +
> > +    mfns = xmalloc_array(mfn_t, npage);
> > +    if ( !mfns )
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < npage; i++ )
> > +        mfns[i] = INVALID_MFN;
> > +
> > +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> > +    if ( !mfn_mapping )
> > +    {
> > +        xfree(mfns);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ring_info->npage = npage;
> > +    ring_info->mfns = mfns;
> > +    ring_info->mfn_mapping = mfn_mapping;
>
> As the inverse to the cleanup sequence in an earlier patch: Please
> set ->npage last here even if it doesn't strictly matter.

npage is now gone after implementing Roger's feedback to only keep "len".

>
> > +    ASSERT(ring_info->npage == npage);
>
> What is this trying to make sure, seeing the assignment just a
> few lines up?

removed

>
> > +    if ( ring_info->nmfns == ring_info->npage )
> > +        return 0;
>
> Can this happen with the ring_remove_mfns() call above?

No, not any more, you're right.

>
> > +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> And hence can i start from other than zero here? And why not
> use the (possibly cheaper to access) function argument "npage"
> as the loop upper bound? The other similar loop a few lines up
> is coded that simpler way.

Yes, thanks, done.

>
> > +static long
> > +register_ring(struct domain *currd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > +              uint32_t npage, bool fail_exist)
> > +{
> > +    xen_argo_register_ring_t reg;
> > +    struct argo_ring_id ring_id;
> > +    void *map_ringp;
> > +    xen_argo_ring_t *ringp;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info = NULL;
> > +    struct domain *dst_d = NULL;
> > +    int ret = 0;
> > +    uint32_t private_tx_ptr;
> > +
> > +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> > +    {
> > +        ret = -EFAULT;
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * A ring must be large enough to transmit messages, so requires space for:
> > +     * * 1 message header, plus
> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > +     *   for the message payload to be written into, plus
> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> > +     *   ensures that there can be space remaining when a message is present.
> > +     * The above determines the minimum acceptable ring size.
> > +     */
> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
>
> These two summands don't look to fulfill the "cannot be filled to
> capacity" constraint the comment describes, as (aiui) messages
> can be larger than 16 bytes. What's the deal?

This is intended to be about bounds-checking reg.len against a minimum
size: the smallest ring that you can fit a message onto, as determined
by the logic in ringbuf_insert. The smallest message you can send is:
sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)

and then ringbuf_insert won't accept a message unless there is (at
least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
message size, is the smallest viable ring. There's no point accepting
registration of a ring smaller than that.

You're right that messages can be larger than 16 bytes, but they can
only be sent to rings that are larger than that minimum - on a minimum
sized ring, they'll be rejected by sendv.
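
To put a number on it (assuming xen_argo_addr_t is 8 bytes, which makes
the message header 16 bytes), the minimum acceptable reg.len works out
as:

    sizeof(struct xen_argo_ring_message_header)
        + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1)
    = 16 + 16 + 16
    = 48 bytes, i.e. three 16-byte slots.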

>
> > +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > +         (reg.pad != 0) )
> > +    {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    ring_id.partner_id = reg.partner_id;
> > +    ring_id.port = reg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
>
> From here to ...
>
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > +    {
> > +        if ( opt_argo_mac_enforcing )
> > +        {
> > +            ret = -EPERM;
> > +            goto out_unlock;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        dst_d = get_domain_by_id(reg.partner_id);
> > +        if ( !dst_d )
> > +        {
> > +            argo_dprintk("!dst_d, ESRCH\n");
> > +            ret = -ESRCH;
> > +            goto out_unlock;
> > +        }
> > +
> > +        if ( !dst_d->argo )
> > +        {
> > +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > +            ret = -ECONNREFUSED;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +
> > +        send_info = xzalloc(struct argo_send_info);
> > +        if ( !send_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            put_domain(dst_d);
> > +            goto out_unlock;
> > +        }
> > +        send_info->id = ring_id;
> > +    }
>
> ... here, what exactly is it that requires the global read lock
> to be held ...
>
> > +    write_lock(&currd->argo->lock);
>
> ... prior to this? Holding locks around allocations is not
> forbidden, but should be avoided whenever possible.
>
> And then further why does the global read lock need
> continued holding until the end of the function?

I've added a FIXME to review this tomorrow. I understand the
argo-internal locking protocols and this is adhering to what they
state, in that accesses to the argo structs of currd and dst_d are
protected by the global read lock here, but at the moment I'm less
clear on what the expectations are for standard Xen domain locks,
references and lifecycle.
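
For what it's worth, the nesting actually exercised by the quoted
register_ring code is:

    read_lock(&argo_lock);                      /* global rwlock, "L1" */
        write_lock(&currd->argo->lock);         /* per-domain rwlock   */
            spin_lock(&dst_d->argo->send_lock); /* innermost           */

with each inner lock only taken while the outer ones are held; the part
I still need to pin down is how the standard domain references and
locks are expected to interleave with that ordering.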

>
> > +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > +    {
> > +        ret = -ENOSPC;
> > +        goto out_unlock2;
> > +    }
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( !ring_info )
> > +    {
> > +        ring_info = xzalloc(struct argo_ring_info);
> > +        if ( !ring_info )
> > +        {
> > +            ret = -ENOMEM;
> > +            goto out_unlock2;
> > +        }
> > +
> > +        spin_lock_init(&ring_info->lock);
> > +
> > +        ring_info->id = ring_id;
> > +        INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > +        hlist_add_head(&ring_info->node,
> > +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +    }
> > +    else
> > +    {
> > +        if ( ring_info->len )
> > +        {
>
> Please fold into "else if ( )", removing a level of indentation.

ack, done

>
> > +            /*
> > +             * If the caller specified that the ring must not already exist,
> > +             * fail at attempt to add a completed ring which already exists.
> > +             */
> > +            if ( fail_exist )
> > +            {
> > +                argo_dprintk("disallowed reregistration of existing ring\n");
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            if ( ring_info->len != reg.len )
> > +            {
> > +                /*
> > +                 * Change of ring size could result in entries on the pending
> > +                 * notifications list that will never trigger.
> > +                 * Simple blunt solution: disallow ring resize for now.
> > +                 * TODO: investigate enabling ring resize.
> > +                 */
> > +                gprintk(XENLOG_ERR,
> > +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                        ring_id.partner_id);
> > +                /*
> > +                 * Could return EINVAL here, but if the ring didn't already
> > +                 * exist then the arguments would have been valid, so: EEXIST.
> > +                 */
> > +                ret = -EEXIST;
> > +                goto out_unlock2;
> > +            }
> > +
> > +            gprintk(XENLOG_DEBUG,
> > +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                    ring_id.partner_id);
> > +        }
> > +    }
> > +
> > +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +
> > +    /*
> > +     * The first page of the memory supplied for the ring has the xen_argo_ring
> > +     * structure at its head, which is where the ring indexes reside.
> > +     */
> > +    ret = ring_map_page(ring_info, 0, &map_ringp);
> > +    if ( ret )
> > +    {
> > +        gprintk(XENLOG_ERR,
> > +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > +                currd->domain_id, ring_id.domain_id, ring_id.port,
> > +                ring_id.partner_id);
> > +
> > +        ring_remove_info(currd, ring_info);
> > +        goto out_unlock2;
> > +    }
> > +    ringp = map_ringp;
> > +
> > +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > +    if ( (private_tx_ptr >= reg.len) ||
> > +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > +    {
> > +        /*
> > +         * Since the ring is a mess, attempt to flush the contents of it
> > +         * here by setting the tx_ptr to the next aligned message slot past
> > +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > +         */
> > +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > +        if ( private_tx_ptr >= reg.len )
> > +            private_tx_ptr = 0;
> > +
> > +        update_tx_ptr(ring_info, private_tx_ptr);
> > +    }
> > +
> > +    ring_info->tx_ptr = private_tx_ptr;
> > +    ring_info->len = reg.len;
> > +    currd->argo->ring_count++;
> > +
> > +    if ( send_info )
> > +    {
> > +        spin_lock(&dst_d->argo->send_lock);
> > +
> > +        hlist_add_head(&send_info->node,
> > +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > +        spin_unlock(&dst_d->argo->send_lock);
> > +    }
> > +
> > + out_unlock2:
> > +    if ( !ret && send_info )
> > +        xfree(send_info);
> > +
> > +    if ( dst_d )
> > +        put_domain(dst_d);
> > +
> > +    write_unlock(&currd->argo->lock);
>
> Surely you can drop the lock before the other two cleanup
> actions? That would then allow you to add another label to
> absorb the two separate put_domain() calls on error paths.

That looks correct. Added a FIXME note now and will fix tomorrow. thanks.
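
(Roughly what I expect that to become -- exit ordering sketched here for
illustration only, with the send_info freeing elided:

     out_unlock2:
        write_unlock(&currd->argo->lock);

     out_unlock:
        read_unlock(&argo_lock);

     out:
        if ( dst_d )
            put_domain(dst_d);

        return ret;

with the error paths that currently pair put_domain() with a goto
switched to a plain "goto out_unlock;".)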

>
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> >  /* Is the guest handle a NULL reference? */
> >  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
>
> This is unused throughout the patch afaics.

Removed.

>
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> >  #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16MB
> > + *  -- which is 0x1000000 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> > +
> > +/*
> > + * Page descriptor: encoding both page address and size in a 64-bit value.
> > + * Intended to allow ABI to support use of different granularity pages.
> > + * example of how to populate:
> > + * xen_argo_page_descr_t pg_desc =
> > + *      (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> > + */
> > +typedef uint64_t xen_argo_page_descr_t;
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0
>
> Are the _DESCR_ infixes here really useful?

These are now gone, with Julien's approval for the change back to use
the gfn-using interfaces.

>
> > @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
> >  #endif
> >  } xen_argo_ring_t;
> >
> > +typedef struct xen_argo_register_ring
> > +{
> > +    uint32_t port;
> > +    domid_t partner_id;
> > +    uint16_t pad;
> > +    uint32_t len;
> > +} xen_argo_register_ring_t;
> > +
> > +/* Messages on the ring are padded to a multiple of this size. */
> > +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> > +
> > +struct xen_argo_ring_message_header
> > +{
> > +    uint32_t len;
> > +    xen_argo_addr_t source;
> > +    uint32_t message_type;
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > +    uint8_t data[];
> > +#elif defined(__GNUC__)
> > +    uint8_t data[0];
> > +#endif
> > +};
> > +
> > +/*
> > + * Hypercall operations
> > + */
> > +
> > +/*
> > + * XEN_ARGO_OP_register_ring
> > + *
> > + * Register a ring using the indicated memory.
> > + * Also used to reregister an existing ring (eg. after resume from hibernate).
> > + *
> > + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> > + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> > + * arg3: unsigned long npages
> > + * arg4: unsigned long flags
>
> The "unsigned long"-s here are not necessarily compatible with
> compat mode. At the very least flags above bit 31 won't be
> usable by compat mode guests. Hence I also question ...
>
> > + */
> > +#define XEN_ARGO_OP_register_ring     1
> > +
> > +/* Register op flags */
> > +/*
> > + * Fail exist:
> > + * If set, reject attempts to (re)register an existing established ring.
> > + * If clear, reregistration occurs if the ring exists, with the new ring
> > + * taking the place of the old, preserving tx_ptr if it remains valid.
> > + */
> > +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
> > +
> > +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
> > +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
>
> ... the UL suffix here. Also this last item should not be exposed
> (perhaps framed by "#ifdef __XEN__") and would perhaps anyway
> better be defined in terms of the other
> XEN_ARGO_REGISTER_FLAG_*.

Notes added in place for each of the above; will work on these tomorrow.

thanks

Christopher


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-14 15:31   ` Andrew Cooper
@ 2019-01-15  8:02     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  8:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Jason Andryuk,
	Ian Jackson, Rich Persaud, James McKenzie, Daniel Smith,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 7:31 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 07/01/2019 07:42, Christopher Clark wrote:
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index aea13eb..68d4415 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -193,6 +193,21 @@ This allows domains access to the Argo hypercall, which supports registration
> >  of memory rings with the hypervisor to receive messages, sending messages to
> >  other domains by hypercall and querying the ring status of other domains.
> >
> > +### argo-mac
> > +> `= permissive | enforcing`
>
> Are these command line options already in use in the OpenXT community?

No, there's no concern there.

> I ask, because we are trying to avoid gaining multiple top level options
> for related functionality.
>
> IMO, this functionality could be covered more succinctly with:
>
>   argo = List of [ <bool>, mac = permissive | enforcing ]
>
> which also allows for cleaner addition of future options.
>
> (Unfortunately, to implement this, you need my cmdline_strcmp() fixes,
> which are still pending an ack.)

At Roger's recommendation, the "argo-mac" string option has become the
much simpler "argo-mac-permissive" boolean.
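
So with the current naming, enabling Argo together with the softer MAC
policy would be spelled on the Xen command line roughly as:

    argo=true argo-mac-permissive=true

(subject, of course, to the 4.13 plan below making the second option
unnecessary).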

For 4.13, I'd like to improve the isolation of wildcard rings to the
point where they're able to be enabled whenever argo itself is, and
hopefully that will enable retiring this option altogether. If two
bools are tolerable for now (as this is still early and an
experimental feature), would it be OK to sort this out for 4.13?

Christopher


* Re: [PATCH v3 08/15] argo: implement the unregister op
  2019-01-10 11:40   ` Roger Pau Monné
@ 2019-01-15  8:05     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  8:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Thu, Jan 10, 2019 at 3:40 AM Roger Pau Monné <royger@freebsd.org> wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> >
> > Takes a single argument: a handle to the ring unregistration struct,
> > which specifies the port and partner domain id or wildcard.
> >
> > The ring's entry is removed from the hashtable of registered rings;
> > any entries for pending notifications are removed; and the ring is
> > unmapped from Xen's address space.
> >
> > If the ring had been registered to communicate with a single specified
> > domain (ie. a non-wildcard ring) then the partner domain state is removed
> > from the partner domain's argo send_info hash table.
> >
> > Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
> > ---
> > v2 feedback Jan: drop cookie, implement teardown
> > v2 feedback Jan: drop message from argo_message_op
> > v2 self: OVERHAUL
> > v2 self: reorder logic to shorten critical section
> > v1 #13 feedback Jan: revise use of guest_handle_okay vs __copy ops
> > v1 feedback Roger, Jan: drop argo prefix on static functions
> > v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> > v1 #5 (#14) feedback Paul: use currd in do_argo_message_op
> > v1 #5 (#14) feedback Paul: full use currd in argo_unregister_ring
> > v1 #13 (#14) feedback Paul: replace do/while with goto; reindent
> > v1 self: add blank lines in unregister case in do_argo_message_op
> > v1: #13 feedback Jan: public namespace: prefix with xen
> > v1: #13 feedback Jan: blank line after op case in do_argo_message_op
> > v1: #14 feedback Jan: replace domain id override with validation
> > v1: #18 feedback Jan: meld the ring count limit into the series
> > v1: feedback #15 Jan: verify zero in unused hypercall args
> >
> >  xen/common/argo.c         | 115 ++++++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/public/argo.h |  19 ++++++++
> >  xen/include/xlat.lst      |   1 +
> >  3 files changed, 135 insertions(+)
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 11988e7..59ce8c4 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -37,6 +37,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_addr_t);
> >  DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);
> >  DEFINE_XEN_GUEST_HANDLE(xen_argo_register_ring_t);
> >  DEFINE_XEN_GUEST_HANDLE(xen_argo_ring_t);
> > +DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
> >
> >  /* Xen command line option to enable argo */
> >  static bool __read_mostly opt_argo_enabled;
> > @@ -666,6 +667,105 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> >      return NULL;
> >  }
> >
> > +static struct argo_send_info *
> > +send_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +    struct hlist_node *node;
> > +    struct argo_send_info *send_info;
> > +
> > +    hlist_for_each_entry(send_info, node, &d->argo->send_hash[hash_index(id)],
> > +                         node)
> > +    {
> > +        struct argo_ring_id *cmpid = &send_info->id;
>
> Const.

ack, done

>
> > +
> > +        if ( cmpid->port == id->port &&
> > +             cmpid->domain_id == id->domain_id &&
> > +             cmpid->partner_id == id->partner_id )
> > +        {
> > +            argo_dprintk("send_info=%p\n", send_info);
> > +            return send_info;
> > +        }
> > +    }
> > +    argo_dprintk("no send_info found\n");
>
> Is this message actually helpful without printing any of the
> parameters provided to the function?

Good point. I've added the ring data to the output.
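
Something along these lines, matching the "(vm%u:%x vm%u)" convention
now used elsewhere in the series:

    argo_dprintk("no send_info found (vm%u:%x vm%u)\n",
                 id->domain_id, id->port, id->partner_id);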

>
> > +
> > +    return NULL;
> > +}
> > +
> > +static long
> > +unregister_ring(struct domain *currd,
>
> Same as the comment made on the other patch, if this parameter is the
> current domain there's no need to pass it around, or else it should be
> named d instead of currd.

Same response as for the other patch: where currd has been retained,
it's now ASSERTed to match current->domain, and otherwise 'd' is used.

>
> > +                XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd)
> > +{
> > +    xen_argo_unregister_ring_t unreg;
> > +    struct argo_ring_id ring_id;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info;
> > +    struct domain *dst_d = NULL;
> > +    int ret;
> > +
> > +    ret = copy_from_guest(&unreg, unreg_hnd, 1) ? -EFAULT : 0;
> > +    if ( ret )
> > +        goto out;
> > +
> > +    ret = unreg.pad ? -EINVAL : 0;
> > +    if ( ret )
> > +        goto out;
>
> I don't see the point in the out label when you could just use 'return
> -EINVAL' or -EFAULT directly here and above.

ack - have done that, much better now.
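
i.e. the entry of unregister_ring now starts out along these lines, with
the initial goto-based exits gone:

    if ( copy_from_guest(&unreg, unreg_hnd, 1) )
        return -EFAULT;

    if ( unreg.pad )
        return -EINVAL;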

thanks

Christopher


* Re: [PATCH v3 10/15] argo: implement the notify op
  2019-01-15  6:53     ` Christopher Clark
@ 2019-01-15  8:06       ` Roger Pau Monné
  2019-01-15  8:32         ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2019-01-15  8:06 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet

On Mon, Jan 14, 2019 at 10:53:54PM -0800, Christopher Clark wrote:
> On Thu, Jan 10, 2019 at 4:22 AM Roger Pau Monné <royger@freebsd.org> wrote:
> >
> >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> > > +
> > > +    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
> > > +                 ent.ring.domain_id, ent.ring.port);
> > > +
> > > +    ent.flags = 0;
> >
> > Please memset ent to 0 or initialize it to { }, or else you are
> > leaking hypervisor stack data to the guest in the padding field.
> 
> ok - I've added the initializer, thanks.
> Was it really leaking stack data though because the struct should have
> been fully populated, including the padding field, with the
> __copy_from_guest above?

That's my bad, there was no leak here. I somehow missed the
copy_from_guest above, even when I made a comment on it. Please leave
the code as-is.

Thanks, Roger.


* Re: [PATCH v3 08/15] argo: implement the unregister op
  2019-01-14 15:06   ` Jan Beulich
@ 2019-01-15  8:11     ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  8:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 7:06 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> > @@ -666,6 +667,105 @@ ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> >      return NULL;
> >  }
> >
> > +static struct argo_send_info *
> > +send_find_info(const struct domain *d, const struct argo_ring_id *id)
>
> As per the comment on patch 7, perhaps find_send_info()?

ack, yes, renamed.

>
> > +{
> > +    struct hlist_node *node;
>
> const?

Understood; I haven't got this one done yet, but yes.

>
> > +static long
> > +unregister_ring(struct domain *currd,
> > +                XEN_GUEST_HANDLE_PARAM(xen_argo_unregister_ring_t) unreg_hnd)
> > +{
> > +    xen_argo_unregister_ring_t unreg;
> > +    struct argo_ring_id ring_id;
> > +    struct argo_ring_info *ring_info;
> > +    struct argo_send_info *send_info;
> > +    struct domain *dst_d = NULL;
> > +    int ret;
> > +
> > +    ret = copy_from_guest(&unreg, unreg_hnd, 1) ? -EFAULT : 0;
> > +    if ( ret )
> > +        goto out;
> > +
> > +    ret = unreg.pad ? -EINVAL : 0;
> > +    if ( ret )
> > +        goto out;
> > +
> > +    ring_id.partner_id = unreg.partner_id;
> > +    ring_id.port = unreg.port;
> > +    ring_id.domain_id = currd->domain_id;
> > +
> > +    read_lock(&argo_lock);
> > +
> > +    if ( !currd->argo )
> > +    {
> > +        ret = -ENODEV;
> > +        goto out_unlock;
> > +    }
> > +
> > +    write_lock(&currd->argo->lock);
> > +
> > +    ring_info = ring_find_info(currd, &ring_id);
> > +    if ( ring_info )
> > +    {
> > +        ring_remove_info(currd, ring_info);
> > +        currd->argo->ring_count--;
> > +    }
> > +
> > +    dst_d = get_domain_by_id(ring_id.partner_id);
> > +    if ( dst_d )
> > +    {
> > +        if ( dst_d->argo )
> > +        {
> > +            spin_lock(&dst_d->argo->send_lock);
> > +
> > +            send_info = send_find_info(dst_d, &ring_id);
> > +            if ( send_info )
> > +            {
> > +                hlist_del(&send_info->node);
> > +                xfree(send_info);
> > +            }
> > +
> > +            spin_unlock(&dst_d->argo->send_lock);
>
> As per the comment to an earlier patch, if at all possible call
> allocation (and hence also freeing) functions with as little
> locks held as possible. Pulling it out of the innermost lock
> here looks straightforward at least.

ack, have pulled it out of both L2s now, which will help.
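
i.e. roughly this shape (sketch; the same reshuffle applies at the other
site too):

            spin_lock(&dst_d->argo->send_lock);

            send_info = find_send_info(dst_d, &ring_id);
            if ( send_info )
                hlist_del(&send_info->node);

            spin_unlock(&dst_d->argo->send_lock);

            if ( send_info )
                xfree(send_info);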

thanks

Christopher


* Re: [PATCH v3 10/15] argo: implement the notify op
  2019-01-15  8:06       ` Roger Pau Monné
@ 2019-01-15  8:32         ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  8:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Julien Grall, Tim Deegan,
	Daniel Smith, Rich Persaud, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet

On Tue, Jan 15, 2019 at 12:06 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Mon, Jan 14, 2019 at 10:53:54PM -0800, Christopher Clark wrote:
> > On Thu, Jan 10, 2019 at 4:22 AM Roger Pau Monné <royger@freebsd.org> wrote:
> > >
> > >  On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > > <christopher.w.clark@gmail.com> wrote:
> > > > +
> > > > +    argo_dprintk("fill_ring_data: ent.ring.domain=%u,ent.ring.port=%x\n",
> > > > +                 ent.ring.domain_id, ent.ring.port);
> > > > +
> > > > +    ent.flags = 0;
> > >
> > > Please memset ent to 0 or initialize it to { }, or else you are
> > > leaking hypervisor stack data to the guest in the padding field.
> >
> > ok - I've added the initializer, thanks.
> > Was it really leaking stack data though because the struct should have
> > been fully populated, including the padding field, with the
> > __copy_from_guest above?
>
> That's my bad, there was no leak here. I somehow missed the
> copy_from_guest above, even when I made a comment on it. Please leave
> the code as-is.

ok - thanks for the quick response. I've taken the initializer back out.

Christopher


* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-15  7:56     ` Christopher Clark
@ 2019-01-15  8:36       ` Jan Beulich
  2019-01-15  8:46         ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-15  8:36 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 15.01.19 at 08:56, <christopher.w.clark@gmail.com> wrote:
> On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
>> > +    /*
>> > +     * A ring must be large enough to transmit messages, so requires space for:
>> > +     * * 1 message header, plus
>> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
>> > +     *   for the message payload to be written into, plus
>> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
>> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
>> > +     *   ensures that there can be space remaining when a message is present.
>> > +     * The above determines the minimum acceptable ring size.
>> > +     */
>> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
>> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
>>
>> These two summands don't look to fulfill the "cannot be filled to
>> capacity" constraint the comment describes, as (aiui) messages
>> can be larger than 16 bytes. What's the deal?
> 
> This is intended to be about bound checking reg.len against a minimum
> size: the smallest ring that you can fit a message onto, as determined
> by the logic in ringbuf_insert. The smallest message you can send is:
> sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)
> 
> and then ringbuf_insert won't accept a message unless there is (at
> least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
> message size, is the smallest viable ring. There's no point accepting
> registration of a ring smaller than that.
> 
> You're right that messages can be larger than 16 bytes, but they can
> only be sent to rings that are larger than that minimum - on a minimum
> sized ring, they'll be rejected by sendv.

So perhaps the comment wants to say "... cannot be filled to capacity
with a single, minimum size message"?

Jan




* Re: [PATCH v3 07/15] argo: implement the register op
  2019-01-15  8:36       ` Jan Beulich
@ 2019-01-15  8:46         ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-15  8:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Tue, Jan 15, 2019 at 12:36 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 15.01.19 at 08:56, <christopher.w.clark@gmail.com> wrote:
> > On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> >> > +    /*
> >> > +     * A ring must be large enough to transmit messages, so requires space for:
> >> > +     * * 1 message header, plus
> >> > +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> >> > +     *   for the message payload to be written into, plus
> >> > +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> >> > +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> >> > +     *   ensures that there can be space remaining when a message is present.
> >> > +     * The above determines the minimum acceptable ring size.
> >> > +     */
> >> > +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> >> > +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
> >>
> >> These two summands don't look to fulfill the "cannot be filled to
> >> capacity" constraint the comment describes, as (aiui) messages
> >> can be larger than 16 bytes. What's the deal?
> >
> > This is intended to be about bound checking reg.len against a minimum
> > size: the smallest ring that you can fit a message onto, as determined
> > by the logic in ringbuf_insert. The smallest message you can send is:
> > sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)
> >
> > and then ringbuf_insert won't accept a message unless there is (at
> > least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
> > message size, is the smallest viable ring. There's no point accepting
> > registration of a ring smaller than that.
> >
> > You're right that messages can be larger than 16 bytes, but they can
> > only be sent to rings that are larger than that minimum - on a minimum
> > sized ring, they'll be rejected by sendv.
>
> So perhaps the comment wants to say "... cannot be filled to capacity
> with a single, minimum size message"?

ack, done.
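
Spelled out as a self-contained sketch, the bound under discussion
looks like this (the header layout and 16-byte slot size below are
stand-ins for illustration only; the authoritative definitions live in
xen/include/public/argo.h):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real message header; not the ABI layout. */
struct xen_argo_ring_message_header {
    uint32_t len;
    uint32_t pad;
    uint64_t source;
    uint32_t message_type;
};

/* Payloads are padded up to 16-byte slots. */
#define ROUNDUP_MESSAGE(a) (((a) + 0xf) & ~(size_t)0xf)

static inline bool ring_len_too_small(uint32_t reg_len)
{
    return reg_len < (sizeof(struct xen_argo_ring_message_header)
                      + ROUNDUP_MESSAGE(1)   /* smallest payload slot */
                      + ROUNDUP_MESSAGE(1)); /* extra slot: the ring cannot be
                                              * filled to capacity with a
                                              * single, minimum-size message */
}

With these stand-in sizes the minimum works out to 24 + 16 + 16 = 56
bytes; the real minimum depends on the actual header size in the
public header.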

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-15  7:21     ` Christopher Clark
@ 2019-01-15  9:01       ` Jan Beulich
  2019-01-15  9:06         ` Andrew Cooper
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-15  9:01 UTC (permalink / raw)
  To: Andrew Cooper, Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, ross.philipson, Jason Andryuk,
	Daniel Smith, Tim Deegan, Konrad Rzeszutek Wilk, Ian Jackson,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

>>> On 15.01.19 at 08:21, <christopher.w.clark@gmail.com> wrote:
> On Mon, Jan 14, 2019 at 6:58 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 07/01/2019 07:42, Christopher Clark wrote:
>> > --- a/docs/misc/xen-command-line.pandoc
>> > +++ b/docs/misc/xen-command-line.pandoc
>> > @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>> >  in combination with cpuidle.  This option is only expected to be useful for
>> >  developers wishing Xen to fall back to older timing methods on newer hardware.
>> >
>> > +### argo
>> > +> `= <boolean>`
>> > +
>> > +> Default: `false`
>> > +
>> > +Enable the Argo hypervisor-mediated interdomain communication mechanism.
>> > +
>> > +This allows domains access to the Argo hypercall, which supports registration
>> > +of memory rings with the hypervisor to receive messages, sending messages to
>> > +other domains by hypercall and querying the ring status of other domains.
>>
>> Please do include a note about CONFIG_ARGO.  I know this doc is
>> inconsistent on the matter (as Kconfig postdates the written entries
>> here), but I have been trying to fix up, and now about half of the
>> documentation does mention appropriate Kconfig information.
> 
> Ack, note added.

Just to voice my view here: While I agree that some form of indication
should be added, I don't think CONFIG_ARGO should be mentioned.
CONFIG_* in general are likely meaningless to the main audience of
this file (admins rather than developers). Hence the wording should be
mostly independent of the precise config option name; there may then
be an annotation naming the option. Omitting the option name, otoh,
has the benefit of not bearing the risk of going stale.

Jan




* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-15  9:01       ` Jan Beulich
@ 2019-01-15  9:06         ` Andrew Cooper
  2019-01-15  9:17           ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Andrew Cooper @ 2019-01-15  9:06 UTC (permalink / raw)
  To: Jan Beulich, Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, ross.philipson, Jason Andryuk,
	Daniel Smith, Tim Deegan, Konrad Rzeszutek Wilk, Ian Jackson,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

On 15/01/2019 09:01, Jan Beulich wrote:
>>>> On 15.01.19 at 08:21, <christopher.w.clark@gmail.com> wrote:
>> On Mon, Jan 14, 2019 at 6:58 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 07/01/2019 07:42, Christopher Clark wrote:
>>>> --- a/docs/misc/xen-command-line.pandoc
>>>> +++ b/docs/misc/xen-command-line.pandoc
>>>> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>>>>  in combination with cpuidle.  This option is only expected to be useful for
>>>>  developers wishing Xen to fall back to older timing methods on newer hardware.
>>>>
>>>> +### argo
>>>> +> `= <boolean>`
>>>> +
>>>> +> Default: `false`
>>>> +
>>>> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
>>>> +
>>>> +This allows domains access to the Argo hypercall, which supports registration
>>>> +of memory rings with the hypervisor to receive messages, sending messages to
>>>> +other domains by hypercall and querying the ring status of other domains.
>>> Please do include a note about CONFIG_ARGO.  I know this doc is
>>> inconsistent on the matter (as Kconfig postdates the written entries
>>> here), but I have been trying to fix up, and now about half of the
>>> documentation does mention appropriate Kconfig information.
>> Ack, note added.
> Just to voice my view here: While I agree that some form of indication
> should be added, I don't think CONFIG_ARGO should be mentioned.
> CONFIG_* in general are likely meaningless to the main audience of
> this file (admins rather than developers). Hence the wording should be
> mostly independent of the precise config option name; there may then
> be an annotation naming the option. Omitting the option name, otoh,
> has the benefit of not bearing the risk of going stale.

I completely disagree.  The exact CONFIG_ name is very important for the
end user who is looking at the documentation wondering "why can't I seem
to enable ARGO?"

"This option is only available when CONFIG_ARGO is compiled in" isn't
going to put anyone off reading the document, but is useful for some who
are reading it.

~Andrew


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-15  9:06         ` Andrew Cooper
@ 2019-01-15  9:17           ` Jan Beulich
  0 siblings, 0 replies; 104+ messages in thread
From: Jan Beulich @ 2019-01-15  9:17 UTC (permalink / raw)
  To: Andrew Cooper, Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, ross.philipson, Jason Andryuk,
	Daniel Smith, Tim Deegan, Konrad Rzeszutek Wilk, Ian Jackson,
	Rich Persaud, James McKenzie, George Dunlap, Julien Grall,
	Paul Durrant, xen-devel, eric chanudet, Roger Pau Monne

>>> On 15.01.19 at 10:06, <andrew.cooper3@citrix.com> wrote:
> On 15/01/2019 09:01, Jan Beulich wrote:
>>>>> On 15.01.19 at 08:21, <christopher.w.clark@gmail.com> wrote:
>>> On Mon, Jan 14, 2019 at 6:58 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 07/01/2019 07:42, Christopher Clark wrote:
>>>>> --- a/docs/misc/xen-command-line.pandoc
>>>>> +++ b/docs/misc/xen-command-line.pandoc
>>>>> @@ -182,6 +182,17 @@ Permit Xen to use "Always Running APIC Timer" support on compatible hardware
>>>>>  in combination with cpuidle.  This option is only expected to be useful for
>>>>>  developers wishing Xen to fall back to older timing methods on newer hardware.
>>>>>
>>>>> +### argo
>>>>> +> `= <boolean>`
>>>>> +
>>>>> +> Default: `false`
>>>>> +
>>>>> +Enable the Argo hypervisor-mediated interdomain communication mechanism.
>>>>> +
>>>>> +This allows domains access to the Argo hypercall, which supports registration
>>>>> +of memory rings with the hypervisor to receive messages, sending messages to
>>>>> +other domains by hypercall and querying the ring status of other domains.
>>>> Please do include a note about CONFIG_ARGO.  I know this doc is
>>>> inconsistent on the matter (as Kconfig postdates the written entries
>>>> here), but I have been trying to fix up, and now about half of the
>>>> documentation does mention appropriate Kconfig information.
>>> Ack, note added.
>> Just to voice my view here: While I agree that some form of indication
>> should be added, I don't think CONFIG_ARGO should be mentioned.
>> CONFIG_* in general are likely meaningless to the main audience of
>> this file (admins rather than developers). Hence the wording should be
>> mostly independent of the precise config option name; there may then
>> be an annotation naming the option. Omitting the option name, otoh,
>> has the benefit of not bearing the risk of going stale.
> 
> I completely disagree.  The exact CONFIG_ name is very important for the
> end user who is looking at the documentation wondering "why can't I seem
> to enable ARGO?"
> 
> "This option is only available when CONFIG_ARGO is compiled in" isn't
> going to put anyone off reading the document, but is useful for some who
> are reading it.

But what's wrong with e.g. "The functionality this option controls is
dependent on a build time condition (the ARGO config option)"? This
avoids the technical detail in the main part of the statement. In
addition I'm also deliberately leaving out the CONFIG_ part of the
option name, as that's entirely an implementation detail.

In no case is "CONFIG_ARGO is compiled in" a sensible statement to
me - "CONFIG_ARGO" simply can't be "compiled in", it can only be
enabled or disabled, whereas what can be compiled in is the code
controlled by CONFIG_ARGO. (Caveat: I may wrongly apply too
much German interpretation here.)

Jan




* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-14 12:57   ` Jan Beulich
@ 2019-01-17  7:22     ` Christopher Clark
  2019-01-17 11:25       ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-17  7:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 4:57 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@gmail.com> wrote:
> > Argo doesn't use compat hypercall or argument translation but can use some
> > of the infrastructure for validating the hypercall argument structures to
> > ensure that the struct sizes, offsets and compositions don't vary between 32
> > and 64bit, so add that here in a new dedicated source file for this purpose.
> >
> > Some of the argo hypercall argument structures contain elements that are
> > hypercall argument structure types themselves, and the standard compat
> > structure validation does not handle this, since the types differ in compat
> > vs. non-compat versions; so for some of the tests the exact-type-match check
> > is replaced with a weaker, but still sufficient, sizeof check.
>
> "Still sufficient" on what basis?

I may have overstepped with that assertion, but I meant sufficient for
the purposes of checking that the fields within the generated compat
structures have the same size and offset, so that the copies of data to
and from guests that the code performs behave correctly.

> Note that to date we didn't have to  make exceptions like this (iirc),
> so I'm not happy to see some appear.

Yes, that's completely understandable.

> > Then there are additional hypercall argument structures that contain
> > elements that do not have a fixed size (last element, variable length array
> > fields), so we have to then disable that size check too for validating those
> > structures; the coverage of offset of elements is still retained.
>
> There are prior cases of such as well; I'm not sure though if any
> were actually in need of checking through these macros. Still I'd
> like to better understand what it is that doesn't work in that case.
> Quite possibly there's something that can be fixed in the scripts
> (or elsewhere).

Some details of the problem:

Without the macro overrides in place (ie. using the existing
definitions) the build fails on CHECK_argo_send_addr  because this
struct is defined with types that are themselves translated by the
compat processing:

typedef struct xen_argo_send_addr
{
    xen_argo_addr_t src;
    xen_argo_addr_t dst;
} xen_argo_send_addr_t;

compat/argo.c: In function '__checkFstruct_argo_send_addr__src':
xen/include/xen/compat.h:170:18: error: comparison of distinct pointer
types lacks a cast [-Werror]
     return &x->f == &c->f; \
                  ^
xen/include/xen/compat.h:176:5: note: in expansion of macro
'CHECK_FIELD_COMMON_'
     CHECK_FIELD_COMMON_(k, CHECK_NAME_(k, n ## __ ## f, F), n, f)
     ^~~~~~~~~~~~~~~~~~~
xen/include/compat/xlat.h:1238:5: note: in expansion of macro 'CHECK_FIELD_'
     CHECK_FIELD_(struct, argo_send_addr, src); \
     ^~~~~~~~~~~~
compat/argo.c:43:1: note: in expansion of macro 'CHECK_argo_send_addr'
 CHECK_argo_send_addr;
 ^~~~~~~~~~~~~~~~~~~~

because xen_argo_addr_t is detected as a different type than
compat_argo_addr_t -- when in practice it is the same size and has the
same fields at the same offsets.

These also fail for the same reason: they also contain types that are
compat-converted:
CHECK_argo_ring_data_ent;
CHECK_argo_iov;
CHECK_argo_ring_data;

So the first override substitutes a "sizeof" check for the exact type
match, but that doesn't work for CHECK_argo_ring_data, because of the
variable-sized array field, so that CHECK has a separate override just
for it -- and again, it's only encountering this because the array is
of a compat-translated type.
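
The failure class can be reproduced standalone with two structurally
identical but distinct types (the names below are made up for
illustration, mirroring xen_argo_addr_t vs. compat_argo_addr_t):

#include <stdint.h>

struct native_addr { uint32_t aport; uint16_t domain_id; uint16_t pad; };
struct compat_addr { uint32_t aport; uint16_t domain_id; uint16_t pad; };

struct native_send_addr { struct native_addr src; struct native_addr dst; };
struct compat_send_addr { struct compat_addr src; struct compat_addr dst; };

int check_src(struct native_send_addr *x, struct compat_send_addr *c)
{
    /*
     * Under -Werror this is exactly the failure quoted above:
     * "comparison of distinct pointer types lacks a cast".  The two
     * layouts match, but the field types are formally distinct, so a
     * typeof-style check cannot pass.
     */
    return &x->src == &c->src;
}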

>
> > --- a/xen/common/Makefile
> > +++ b/xen/common/Makefile
> > @@ -70,7 +70,7 @@ obj-y += xmalloc_tlsf.o
> >  obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
> >
> >
> > -obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
> > +obj-$(CONFIG_COMPAT) += $(addprefix compat/,argo.o domain.o kernel.o memory.o multicall.o xlat.o)
>
> While a matter of taste to a certain degree, I'm not convinced
> introducing a separate file for this is really necessary, especially
> if some of the overrides to the CHECK_* macros would go away.

ack. I wouldn't have moved them out if the overrides weren't in use;
but I will merge it into the implementation file if that is preferred.

Christopher


* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-17  7:22     ` Christopher Clark
@ 2019-01-17 11:25       ` Jan Beulich
  2019-01-20 21:18         ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-17 11:25 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 17.01.19 at 08:22, <christopher.w.clark@gmail.com> wrote:
> Some details of the problem:
> 
> Without the macro overrides in place (ie. using the existing
> definitions) the build fails on CHECK_argo_send_addr  because this
> struct is defined with types that are themselves translated by the
> compat processing:

But that's a normal situation.

> typedef struct xen_argo_send_addr
> {
>     xen_argo_addr_t src;
>     xen_argo_addr_t dst;
> } xen_argo_send_addr_t;
> 
> compat/argo.c: In function '__checkFstruct_argo_send_addr__src':
> xen/include/xen/compat.h:170:18: error: comparison of distinct pointer
> types lacks a cast [-Werror]
>      return &x->f == &c->f; \
>                   ^
> xen/include/xen/compat.h:176:5: note: in expansion of macro
> 'CHECK_FIELD_COMMON_'
>      CHECK_FIELD_COMMON_(k, CHECK_NAME_(k, n ## __ ## f, F), n, f)
>      ^~~~~~~~~~~~~~~~~~~
> xen/include/compat/xlat.h:1238:5: note: in expansion of macro 'CHECK_FIELD_'
>      CHECK_FIELD_(struct, argo_send_addr, src); \
>      ^~~~~~~~~~~~
> compat/argo.c:43:1: note: in expansion of macro 'CHECK_argo_send_addr'
>  CHECK_argo_send_addr;
>  ^~~~~~~~~~~~~~~~~~~~
> 
> because xen_argo_addr_t is detected as a different type than
> compat_argo_addr_t -- when in practice is the same size and has the
> same fields at the same offsets.

Did you perhaps not add entries for the inner structures to xlat.lst?

>> > --- a/xen/common/Makefile
>> > +++ b/xen/common/Makefile
>> > @@ -70,7 +70,7 @@ obj-y += xmalloc_tlsf.o
>> >  obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
>> >
>> >
>> > -obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
>> > +obj-$(CONFIG_COMPAT) += $(addprefix compat/,argo.o domain.o kernel.o memory.o multicall.o xlat.o)
>>
>> While a matter of taste to a certain degree, I'm not convinced
>> introducing a separate file for this is really necessary, especially
>> if some of the overrides to the CHECK_* macros would go away.
> 
> ack. I wouldn't have moved them out if the overrides weren't in use;
> but I will merge it into the implementation file if that is preferred.

Well - let's first see whether the overrides are really needed. If so,
keeping this in a separate file might indeed be better.

Jan




* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-17 11:25       ` Jan Beulich
@ 2019-01-20 21:18         ` Christopher Clark
  2019-01-21 12:03           ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-01-20 21:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Thu, Jan 17, 2019 at 3:25 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 17.01.19 at 08:22, <christopher.w.clark@gmail.com> wrote:
> > Some details of the problem:
> >
> > Without the macro overrides in place (ie. using the existing
> > definitions) the build fails on CHECK_argo_send_addr  because this
> > struct is defined with types that are themselves translated by the
> > compat processing:
>
> But that's a normal situation.

I thought it would be, too, but I haven't found a direct equivalent to
what this header needs. I'll outline the results of my examination
below.

>
> > typedef struct xen_argo_send_addr
> > {
> >     xen_argo_addr_t src;
> >     xen_argo_addr_t dst;
> > } xen_argo_send_addr_t;
> >
> > compat/argo.c: In function '__checkFstruct_argo_send_addr__src':
> > xen/include/xen/compat.h:170:18: error: comparison of distinct pointer
> > types lacks a cast [-Werror]
> >      return &x->f == &c->f; \
> >                   ^
> > xen/include/xen/compat.h:176:5: note: in expansion of macro
> > 'CHECK_FIELD_COMMON_'
> >      CHECK_FIELD_COMMON_(k, CHECK_NAME_(k, n ## __ ## f, F), n, f)
> >      ^~~~~~~~~~~~~~~~~~~
> > xen/include/compat/xlat.h:1238:5: note: in expansion of macro 'CHECK_FIELD_'
> >      CHECK_FIELD_(struct, argo_send_addr, src); \
> >      ^~~~~~~~~~~~
> > compat/argo.c:43:1: note: in expansion of macro 'CHECK_argo_send_addr'
> >  CHECK_argo_send_addr;
> >  ^~~~~~~~~~~~~~~~~~~~
> >
> > because xen_argo_addr_t is detected as a different type than
> > compat_argo_addr_t -- when in practice is the same size and has the
> > same fields at the same offsets.
>
> Did you perhaps not add entries for the inner structures to xlat.lst?

No, that's not it; unfortunately I did add them.

Here are my findings after exploring the compat machinery, in relation
to understanding its processing of the Argo public header file, and
its production of the compat header file and validation macros.

1. Two alternative validation macros are possible as output, depending
on how struct fields are declared within structs in the input header.

Struct fields that are themselves structs can be declared two different
ways:

* by type (I'll call this "type form"): xen_mystruct_t field;

eg.
    typedef struct xen_argo_send_addr
    {
        xen_argo_addr_t src;
        xen_argo_addr_t dst;
    } xen_argo_send_addr_t;


* by struct name ("struct form"): struct xen_mystruct field;

eg.
    typedef struct xen_argo_send_addr
    {
        struct xen_argo_addr src;
        struct xen_argo_addr dst;
    } xen_argo_send_addr_t;


In the validation macros that are produced for xen_argo_send_addr, the
"struct form" contains:
    CHECK_mystruct;

and the "type form" contains instead:
    CHECK_FIELD_(struct, mystruct, field);

These two validation macros do different things; the CHECK_FIELD one
is stronger because it tests that the offset, size and type of the
field match, whereas the other checks the structure of the inner struct,
but neither its placement within the outer struct nor whether its type
matches between the compat and non-compat versions.

After reviewing other public headers within Xen and the structs in the
xlat.lst list, it looks like some existing structs are passing
CHECKs by using the "struct" form, even though the checks are weaker.


2. The "type form" checks cannot work for fields that are themselves
compat-translated.

For fields that are struct types that are themselves compat-translated,
the CHECK_FIELD fails because it does a typeof check in the macro,
which cannot pass since the non-compat type will never equal the
compat-type -- the fields really are different types.

So when defining a struct field with a type that will be translated,
you have to declare it using the "struct" form, not the "type" form.

A prior example of this is struct mcinfo_extended, which declares an
array field using the "struct" form rather than the "type" form.

A problem with the "struct" form is that the check it generates doesn't
test the offset of the field within the struct it belongs to. Making
CHECK_FIELD work looks preferable, as it provides better assurance of
correctness.

One way to make CHECK_FIELD work is to override the CHECK_FIELD_COMMON_
macro and disable the typeof check when necessary, as has been presented
in the versions of the Argo patch series so far.


3. A challenge with using the "struct" form, following from the result
of point 2, occurs when it's a XEN_GUEST_HANDLE field within the struct.
It's not obvious how to declare that field using the "struct" form
rather than the "type" form.
This affects the argo_iov struct.

4. Macros to perform "struct form" checks cannot be repeated.

When using the "struct" form, it's a problem when the struct contains
two fields of the same compat-translated type.

eg. consider the "struct form" version of xen_argo_send_addr, which has
two fields of struct xen_argo_addr:

    typedef struct xen_argo_send_addr
    {
        struct xen_argo_addr src;
        struct xen_argo_addr dst;
    } xen_argo_send_addr_t;

which then generates this in the compat header:

    #define CHECK_argo_send_addr \
        CHECK_SIZE_(struct, argo_send_addr); \
        CHECK_argo_addr; \
        CHECK_argo_addr

and the second macro invocation of CHECK_argo_addr just breaks, with the
build failing due to redefinition of an already-defined symbol.

A (horrible, unacceptable) workaround that unblocks further discovery
is to redefine xen_argo_send_addr to use an array:

typedef struct xen_argo_send_addr
{
    struct xen_argo_addr addrs[2];
} xen_argo_send_addr_t;

which does pass, since it now generates this:

#define CHECK_argo_send_addr \
    CHECK_SIZE_(struct, argo_send_addr); \
    CHECK_argo_addr

and then CHECK_argo_send_addr passes, *but* the fields are no longer
named as they should be, which is not OK.

The "no repeated checks" problem also occurs when another separate
struct contains a field of a type that has already been checked:
whichever CHECK is performed second will break.

eg.
typedef struct xen_argo_ring_data_ent
{
    struct xen_argo_addr ring;
    uint16_t flags;
    uint16_t pad;
    uint32_t space_required;
    uint32_t max_message_size;
} xen_argo_ring_data_ent_t;

also has a field of type xen_argo_addr, which produces CHECK_argo_addr,
which then fails because that was already tested in
CHECK_argo_send_addr.

Anyway, hopefully this provides context for evaluating the method of
passing the compat tests that has been proposed in the Argo series:
selective override of the CHECK_FIELD_COMMON_ macro to disable the
typeof check only when validating those structs that require it.
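
For concreteness, the override in question is of roughly this shape
(reconstructed for illustration from the compat.h fragment quoted in
the error output earlier; it is not a verbatim copy of the v3 patch):

/*
 * Keep a per-field check, but compare field sizes instead of doing the
 * typeof-sensitive pointer comparison ("return &x->f == &c->f;") that
 * the stock CHECK_FIELD_COMMON_ in xen/include/xen/compat.h performs.
 */
#undef CHECK_FIELD_COMMON_
#define CHECK_FIELD_COMMON_(k, name, n, f) \
static inline int __maybe_unused name(k xen_ ## n *x, k compat_ ## n *c) \
{ \
    BUILD_BUG_ON(offsetof(k xen_ ## n, f) != offsetof(k compat_ ## n, f)); \
    return sizeof(x->f) == sizeof(c->f); \
}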


> >> > --- a/xen/common/Makefile
> >> > +++ b/xen/common/Makefile
> >> > @@ -70,7 +70,7 @@ obj-y += xmalloc_tlsf.o
> >> >  obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
> >> >
> >> >
> >> > -obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
> >> > +obj-$(CONFIG_COMPAT) += $(addprefix compat/,argo.o domain.o kernel.o memory.o multicall.o xlat.o)
> >>
> >> While a matter of taste to a certain degree, I'm not convinced
> >> introducing a separate file for this is really necessary, especially
> >> if some of the overrides to the CHECK_* macros would go away.
> >
> > ack. I wouldn't have moved them out if the overrides weren't in use;
> > but I will merge it into the implementation file if that is preferred.
>
> Well - let's first see whether the overrides are really needed. If so,
> keeping this in a separate file might indeed be better.

ack.

Christopher


* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-20 21:18         ` Christopher Clark
@ 2019-01-21 12:03           ` Jan Beulich
  2019-01-22 11:08             ` Jan Beulich
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-21 12:03 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 20.01.19 at 22:18, <christopher.w.clark@gmail.com> wrote:
> On Thu, Jan 17, 2019 at 3:25 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 17.01.19 at 08:22, <christopher.w.clark@gmail.com> wrote:
>> > Some details of the problem:
>> >
>> > Without the macro overrides in place (ie. using the existing
>> > definitions) the build fails on CHECK_argo_send_addr  because this
>> > struct is defined with types that are themselves translated by the
>> > compat processing:
>>
>> But that's a normal situation.
> 
> I thought it would be too but I haven't found a direct equivalent to
> what this header needs. I'll outline the results of my examination
> below.

arch-x86/xen-mca.h has

struct mcinfo_global {
    struct mcinfo_common common;
    ...

which results in

#define CHECK_mcinfo_global \
    CHECK_SIZE_(struct, mcinfo_global); \
    CHECK_mcinfo_common; \
    ...

and separately

#define CHECK_mcinfo_common ...

which I would assume ought to similarly work for the Argo
structures.

> 3. A challenge with using the "struct" form, following from the result
> of point 2, occurs when it's a XEN_GUEST_HANDLE field within the struct.
> It's not obvious how to declare that field using the "struct" form
> rather than the "type" form.
> This affects the argo_iov struct.

Structures containing handles are intentionally not covered
by the CHECK_* machinery, because handles necessarily
need translation due to their different widths in 32- and
64-bit modes on x86.

> 4. Macros to perform "struct form" checks cannot be repeated.
> 
> When using the "struct" form, it's problem when the struct contains two
> fields of the same compat-translated type.
> 
> eg. consider the "struct form" version of xen_argo_send_addr, which has
> two fields of struct xen_argo_addr:
> 
>     typedef struct xen_argo_send_addr
>     {
>         struct xen_argo_addr src;
>         struct xen_argo_addr dst;
>     } xen_argo_send_addr_t;
> 
> which then generates this in the compat header:
> 
>     #define CHECK_argo_send_addr \
>         CHECK_SIZE_(struct, argo_send_addr); \
>         CHECK_argo_addr; \
>         CHECK_argo_addr
> 
> and the second macro invocation of CHECK_argo_addr just breaks, with the
> build failing due to redefinition of a symbol that is already defined.

Hmm, this looks like something that indeed wants fixing.

> The "no repeated checks" problem also occurs when another separate
> struct contains a field of a type that has already been checked:
> whichever CHECK is performed second will break.
> 
> eg.
> typedef struct xen_argo_ring_data_ent
> {
>     struct xen_argo_addr ring;
>     uint16_t flags;
>     uint16_t pad;
>     uint32_t space_required;
>     uint32_t max_message_size;
> } xen_argo_ring_data_ent_t;
> 
> also has a field of type xen_argo_addr, which produces CHECK_argo_addr,
> which then fails because that was already tested in
> CHECK_argo_send_addr.

Hmm, I think the mcinfo example above contradicts this, because
struct mcinfo_common is used by multiple other structures.

Jan




* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-21 12:03           ` Jan Beulich
@ 2019-01-22 11:08             ` Jan Beulich
  2019-01-23 21:14               ` Christopher Clark
  0 siblings, 1 reply; 104+ messages in thread
From: Jan Beulich @ 2019-01-22 11:08 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, ross.philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

>>> On 21.01.19 at 13:03, <JBeulich@suse.com> wrote:
>>>> On 20.01.19 at 22:18, <christopher.w.clark@gmail.com> wrote:
>> The "no repeated checks" problem also occurs when another separate
>> struct contains a field of a type that has already been checked:
>> whichever CHECK is performed second will break.
>> 
>> eg.
>> typedef struct xen_argo_ring_data_ent
>> {
>>     struct xen_argo_addr ring;
>>     uint16_t flags;
>>     uint16_t pad;
>>     uint32_t space_required;
>>     uint32_t max_message_size;
>> } xen_argo_ring_data_ent_t;
>> 
>> also has a field of type xen_argo_addr, which produces CHECK_argo_addr,
>> which then fails because that was already tested in
>> CHECK_argo_send_addr.
> 
> Hmm, I think the mcinfo example above contradicts this, because
> struct mcinfo_common is used by multiple other structures.

Due to

CHECK_mcinfo_common;
# undef xen_mcinfo_common
# undef CHECK_mcinfo_common
# define CHECK_mcinfo_common         struct mcinfo_common

which I think would be easy enough to use in your case as well
(until we could perhaps get around and address the underlying
issue, albeit it's not really clear to me how that should be done).

Jan




* Re: [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery
  2019-01-22 11:08             ` Jan Beulich
@ 2019-01-23 21:14               ` Christopher Clark
  0 siblings, 0 replies; 104+ messages in thread
From: Christopher Clark @ 2019-01-23 21:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Ross Philipson,
	Jason Andryuk, Daniel Smith, Andrew Cooper,
	Konrad Rzeszutek Wilk, Ian Jackson, Rich Persaud, James McKenzie,
	George Dunlap, Julien Grall, Paul Durrant, xen-devel,
	eric chanudet, Roger Pau Monne

On Mon, Jan 21, 2019 at 4:03 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 20.01.19 at 22:18, <christopher.w.clark@gmail.com> wrote:
> > On Thu, Jan 17, 2019 at 3:25 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 17.01.19 at 08:22, <christopher.w.clark@gmail.com> wrote:
> >
> > 3. A challenge with using the "struct" form, following from the result
> > of point 2, occurs when it's a XEN_GUEST_HANDLE field within the struct.
> > It's not obvious how to declare that field using the "struct" form
> > rather than the "type" form.
> > This affects the argo_iov struct.
>
> Structures containing handles are intentionally not covered
> by the CHECK_* machinery, because handles necessarily
> need translation due to their different widths in 32- and
> 64-bit modes on x86.

ack.

>
> > 4. Macros to perform "struct form" checks cannot be repeated.
> >
> > When using the "struct" form, it's problem when the struct contains two
> > fields of the same compat-translated type.
> >
> > eg. consider the "struct form" version of xen_argo_send_addr, which has
> > two fields of struct xen_argo_addr:
> >
> >     typedef struct xen_argo_send_addr
> >     {
> >         struct xen_argo_addr src;
> >         struct xen_argo_addr dst;
> >     } xen_argo_send_addr_t;
> >
> > which then generates this in the compat header:
> >
> >     #define CHECK_argo_send_addr \
> >         CHECK_SIZE_(struct, argo_send_addr); \
> >         CHECK_argo_addr; \
> >         CHECK_argo_addr
> >
> > and the second macro invocation of CHECK_argo_addr just breaks, with the
> > build failing due to redefinition of a symbol that is already defined.
>
> Hmm, this looks like something that indeed wants fixing.

I have a patch to fix that which, it turns out, I will not need, but I
can post it separately if it is still wanted -- copied here for
illustration. (Apologies in advance if this gets mail-client mangled.)

diff --git a/xen/tools/get-fields.sh b/xen/tools/get-fields.sh
index 45a0e2e..14c6859 100644
--- a/xen/tools/get-fields.sh
+++ b/xen/tools/get-fields.sh
@@ -438,7 +438,7 @@ build_check ()
 {
        echo
        echo "#define CHECK_$1 \\"
-       local level=1 fields= kind= id= arrlvl=1 token
+       local level=1 fields= kind= id= arrlvl=1 token suppress_dups=
        for token in $2
        do
                case "$token" in
@@ -470,8 +470,12 @@ build_check ()
                [\,\;])
                        if [ $level = 2 -a -n "$(echo $id | $SED 's,^_pad[[:digit:]]*,,')" ]
                        then
-                               check_field $kind $1 $id "$fields"
-                               test "$token" != ";" || fields= id=
+                               if [ "${suppress_dups#*|$kind $1|}" = "${suppress_dups}" ]
+                               then
+                                       check_field $kind $1 $id "$fields"
+                                       [ -z "$fields" ] || suppress_dups="${suppress_dups:-|}$kind $1|"
+                                       test "$token" != ";" || fields= id=
+                               fi
                        fi
                        ;;
                esac

On Tue, Jan 22, 2019 at 3:08 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 21.01.19 at 13:03, <JBeulich@suse.com> wrote:
> >>>> On 20.01.19 at 22:18, <christopher.w.clark@gmail.com> wrote:
> >> The "no repeated checks" problem also occurs when another separate
> >> struct contains a field of a type that has already been checked:
> >> whichever CHECK is performed second will break.
> >>
> >> eg.
> >> typedef struct xen_argo_ring_data_ent
> >> {
> >>     struct xen_argo_addr ring;
> >>     uint16_t flags;
> >>     uint16_t pad;
> >>     uint32_t space_required;
> >>     uint32_t max_message_size;
> >> } xen_argo_ring_data_ent_t;
> >>
> >> also has a field of type xen_argo_addr, which produces CHECK_argo_addr,
> >> which then fails because that was already tested in
> >> CHECK_argo_send_addr.
> >
> > Hmm, I think the mcinfo example above contradicts this, because
> > struct mcinfo_common is used by multiple other structures.
>
> Due to
>
> CHECK_mcinfo_common;
> # undef xen_mcinfo_common
> # undef CHECK_mcinfo_common
> # define CHECK_mcinfo_common         struct mcinfo_common
>
> which I think would be easy enough to use in your case as well
> (until we could perhaps get around and address the underlying
> issue, albeit it's not really clear to me how that should be done).

ack, this technique works for the Argo data structures, so I've
applied it, dropped the previous macro overrides, moved the checks
into the common/argo.c file and dropped common/compat/argo.c,
with each check being added at the same time as the structs go in
through the series.
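
By analogy with the quoted mcinfo_common fragment, the pattern applied
to the Argo types looks roughly like this (a sketch; the exact set of
checks and their ordering in common/argo.c follow the structs as they
are introduced through the series):

/* Run the full field checks for the embedded struct once... */
CHECK_argo_addr;
/* ...then make later expansions of CHECK_argo_addr a bare type name,
 * so checks of structs that embed it don't redefine the check symbols. */
#undef CHECK_argo_addr
#define CHECK_argo_addr struct xen_argo_addr

CHECK_argo_send_addr;
CHECK_argo_ring_data_ent;

Unlike the mcinfo case, no xen_* alias needs undefining here, since the
public Argo structs already carry the xen_ prefix in their names.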

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-01-14 14:46   ` Wei Liu
                       ` (2 preceding siblings ...)
  2019-01-14 19:42     ` Roger Pau Monné
@ 2019-02-04 20:56     ` Christopher Clark
  2019-02-05 10:32       ` Wei Liu
  3 siblings, 1 reply; 104+ messages in thread
From: Christopher Clark @ 2019-02-04 20:56 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Ross Philipson, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Jason Andryuk, Ian Jackson,
	Rich Persaud, Tim Deegan, Daniel Smith, Julien Grall,
	Paul Durrant, Jan Beulich, xen-devel, James McKenzie,
	Eric Chanudet, Roger Pau Monne

On Mon, Jan 14, 2019 at 6:47 AM Wei Liu <wei.liu2@citrix.com> wrote:
>
> Hi all
>
> The locking scheme seems to be the remaining sticking point. The rest are
> mostly cosmetic issues (FAOD, they still need to be addressed).  Frankly
> I don't think there is enough time to address all the technical details,
> but let me sum up each side's position and see if we can reach an
> amicable solution.
>
> From maintainers and reviewers' point of view:
>
> 1. Maintainers / reviewers don't like complexity unless absolutely
>    necessary.
> 2. Maintainers / reviewers feel they have a responsibility to understand
>    the code and algorithm.
>
> Yet being the gatekeepers doesn't necessarily mean we understand every
> technical detail and every use case. We would like to, but most of the
> time it is unrealistic.
>
> Down to this specific patch series:
>
> Roger thinks the locking scheme is too complex. Christopher argues
> that's necessary for short-lived channels to be performant.
>
> Both have their point.
>
> I think having a complex locking scheme is inevitable, just like we did
> for the performant grant table work several years ago.  Regardless of the timing
> issue we have at hand, asking Christopher to implement a stripped down
> version creates more work for him.
>
> Yet ignoring Roger's concerns is unfair to him as well, since he put in
> so much time and effort to understand the algorithm and provide
> suggestions. It is in fact unreasonable to ask anyone to fully
> understand the locking mechanism and check the implementation is correct
> in a few days (given the series was posted in Dec and there were major
> holidays in between, plus everyone had other commitments).
>
> To unblock this, how about we make Christopher maintainer of Argo? He
> and OpenXT will be on the hook for further improvement. And I believe it
> would be in their best interest to keep Argo bug-free and eventually
> make it supported.
>
> So:
>
> 1. Make sure Argo is self-contained -- this requires careful review for
>    interaction between Argo and other parts of the hypervisor.
> 2. Argo is going to be experimental and off-by-default -- this is the
>    default status for new feature anyway.
> 3. Make Christopher maintainer of Argo -- this would be a natural thing
>    to do anyway.
>

Wei,

do you have any feedback on the latest argo MAINTAINERS patch?

Christopher


* Re: [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt
  2019-02-04 20:56     ` Christopher Clark
@ 2019-02-05 10:32       ` Wei Liu
  0 siblings, 0 replies; 104+ messages in thread
From: Wei Liu @ 2019-02-05 10:32 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Stefano Stabellini, Wei Liu, Ross Philipson,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Jason Andryuk, Ian Jackson, Rich Persaud, Tim Deegan,
	Daniel Smith, Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	James McKenzie, Eric Chanudet, Roger Pau Monne

On Mon, Feb 04, 2019 at 12:56:13PM -0800, Christopher Clark wrote:
> 
> Wei,
> 
> do you have any feedback on the latest argo MAINTAINERS patch?

It looks fine to me.

Wei.

> 
> Christopher


End of thread; newest message: 2019-02-05 10:32 UTC.

Thread overview: 104+ messages
2019-01-07  7:42 [PATCH v3 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
2019-01-07  7:42 ` [PATCH v3 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
2019-01-08 15:46   ` Jan Beulich
2019-01-07  7:42 ` [PATCH v3 02/15] argo: introduce the argo_op hypercall boilerplate Christopher Clark
2019-01-07  7:42 ` [PATCH v3 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
2019-01-08 15:50   ` Jan Beulich
2019-01-10  9:28   ` Roger Pau Monné
2019-01-07  7:42 ` [PATCH v3 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
2019-01-08 22:08   ` Ross Philipson
2019-01-08 22:23     ` Christopher Clark
2019-01-08 22:54   ` Jason Andryuk
2019-01-09  6:48     ` Christopher Clark
2019-01-09 14:15       ` Jason Andryuk
2019-01-09 23:24         ` Christopher Clark
2019-01-09  9:35     ` Jan Beulich
2019-01-09 14:26       ` Jason Andryuk
2019-01-09 14:38         ` Jan Beulich
2019-01-10 23:29           ` Christopher Clark
2019-01-10 10:19   ` Roger Pau Monné
2019-01-10 11:52     ` Jan Beulich
2019-01-10 12:26       ` Roger Pau Monné
2019-01-10 12:46         ` Jan Beulich
2019-01-11  6:03     ` Christopher Clark
2019-01-11  9:27       ` Roger Pau Monné
2019-01-14  8:32         ` Christopher Clark
2019-01-14 11:32           ` Roger Pau Monné
2019-01-14 14:28             ` Rich Persaud
2019-01-10 16:16   ` Eric Chanudet
2019-01-11  6:05     ` Christopher Clark
2019-01-11 11:54   ` Jan Beulich
2019-01-14  8:33     ` Christopher Clark
2019-01-14 14:46   ` Wei Liu
2019-01-14 15:29     ` Lars Kurth
2019-01-14 18:16     ` Christopher Clark
2019-01-14 19:42     ` Roger Pau Monné
2019-02-04 20:56     ` Christopher Clark
2019-02-05 10:32       ` Wei Liu
2019-01-14 14:58   ` Andrew Cooper
2019-01-14 15:12     ` Jan Beulich
2019-01-15  7:24       ` Christopher Clark
2019-01-15  7:21     ` Christopher Clark
2019-01-15  9:01       ` Jan Beulich
2019-01-15  9:06         ` Andrew Cooper
2019-01-15  9:17           ` Jan Beulich
2019-01-07  7:42 ` [PATCH v3 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
2019-01-07  7:42 ` [PATCH v3 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
2019-01-08 22:03   ` Stefano Stabellini
2019-01-07  7:42 ` [PATCH v3 07/15] argo: implement the register op Christopher Clark
2019-01-09 15:55   ` Wei Liu
2019-01-09 16:00     ` Christopher Clark
2019-01-09 17:02       ` Julien Grall
2019-01-09 17:18         ` Stefano Stabellini
2019-01-09 18:13           ` Julien Grall
2019-01-09 20:33             ` Christopher Clark
2019-01-09 17:54         ` Wei Liu
2019-01-09 18:28           ` Julien Grall
2019-01-09 20:38             ` Christopher Clark
2019-01-10 11:24   ` Roger Pau Monné
2019-01-10 11:57     ` Jan Beulich
2019-01-11  6:30       ` Christopher Clark
2019-01-11  6:29     ` Christopher Clark
2019-01-11  9:38       ` Roger Pau Monné
2019-01-10 20:11   ` Eric Chanudet
2019-01-11  6:09     ` Christopher Clark
2019-01-14 14:19   ` Jan Beulich
2019-01-15  7:56     ` Christopher Clark
2019-01-15  8:36       ` Jan Beulich
2019-01-15  8:46         ` Christopher Clark
2019-01-14 15:31   ` Andrew Cooper
2019-01-15  8:02     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 08/15] argo: implement the unregister op Christopher Clark
2019-01-10 11:40   ` Roger Pau Monné
2019-01-15  8:05     ` Christopher Clark
2019-01-14 15:06   ` Jan Beulich
2019-01-15  8:11     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
2019-01-09 18:05   ` Jason Andryuk
2019-01-10  2:08     ` Christopher Clark
2019-01-09 18:57   ` Roger Pau Monné
2019-01-10  3:09     ` Christopher Clark
2019-01-10 12:01       ` Roger Pau Monné
2019-01-10 12:13         ` Jan Beulich
2019-01-10 12:40           ` Roger Pau Monné
2019-01-10 12:53             ` Jan Beulich
2019-01-11  6:37               ` Christopher Clark
2019-01-10 21:41   ` Eric Chanudet
2019-01-11  7:12     ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 10/15] argo: implement the notify op Christopher Clark
2019-01-10 12:21   ` Roger Pau Monné
2019-01-15  6:53     ` Christopher Clark
2019-01-15  8:06       ` Roger Pau Monné
2019-01-15  8:32         ` Christopher Clark
2019-01-07  7:42 ` [PATCH v3 11/15] xsm, argo: XSM control for argo register Christopher Clark
2019-01-07  7:42 ` [PATCH v3 12/15] xsm, argo: XSM control for argo message send operation Christopher Clark
2019-01-07  7:42 ` [PATCH v3 13/15] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
2019-01-07  7:42 ` [PATCH v3 14/15] xsm, argo: notify: don't describe rings that cannot be sent to Christopher Clark
2019-01-07  7:42 ` [PATCH v3 15/15] argo: validate hypercall arg structures via compat machinery Christopher Clark
2019-01-14 12:57   ` Jan Beulich
2019-01-17  7:22     ` Christopher Clark
2019-01-17 11:25       ` Jan Beulich
2019-01-20 21:18         ` Christopher Clark
2019-01-21 12:03           ` Jan Beulich
2019-01-22 11:08             ` Jan Beulich
2019-01-23 21:14               ` Christopher Clark
