* [PATCH v2 0/8] evtchn: recent XSAs follow-on
@ 2020-10-20 14:06 Jan Beulich
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

These are grouped into a series largely because of their origin,
not so much because there are heavy dependencies among them.
Compared to v1, besides three new patches having been added, some
re-ordering has been done. In particular the last patch isn't ready
yet, but I still wanted to include it to have a chance to discuss
what changes to make. See also the individual patches.

1: avoid race in get_xen_consumer()
2: replace FIFO-specific header by generic private one
3: rename and adjust guest_enabled_event()
4: let evtchn_set_priority() acquire the per-channel lock
5: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
6: convert vIRQ lock to an r/w one
7: convert domain event lock to an r/w one
8: don't call Xen consumer callback with per-channel lock held

Jan



* [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
@ 2020-10-20 14:08 ` Jan Beulich
  2020-10-21 15:46   ` Roger Pau Monné
                     ` (2 more replies)
  2020-10-20 14:08 ` [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one Jan Beulich
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini, Roger Pau Monné

There's no global lock around the updating of this global piece of data.
Make use of cmpxchgptr() to avoid two entities racing with their
updates.

While touching the functionality, mark xen_consumers[] read-mostly (or
else the if() condition could use the result of cmpxchgptr(), writing to
the slot unconditionally).

The use of cmpxchgptr() here points out (by way of clang warning about
it) that its original use of const was slightly wrong. Adjust the
placement, or else undefined behavior of const qualifying a function
type will result.
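
To illustrate the point (a minimal sketch, not part of the patch; the
typedef and function names below are made up rather than Xen's own):
with ptr pointing at a function-pointer slot, __typeof__(**(ptr)) names
the pointed-to function type, and qualifying a function type with const
is not permitted, whereas qualifying the pointer object itself is fine.

    typedef void (*notify_fn_t)(unsigned int port);

    static void demo(notify_fn_t *ptr, notify_fn_t o, notify_fn_t n)
    {
        /* Old placement: "const __typeof__(**(ptr)) *" puts const on the
         * function type itself - undefined behavior, and clang warns. */
        /* New placement: const on the pointer variable - well defined. */
        __typeof__(**(ptr)) *const o_ = o;
        __typeof__(**(ptr)) *n_ = n;

        (void)o_;
        (void)n_;
    }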

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Use (and hence generalize) cmpxchgptr(). Add comment. Expand /
    adjust description.

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -57,7 +57,8 @@
  * with a pointer, we stash them dynamically in a small lookup array which
  * can be indexed by a small integer.
  */
-static xen_event_channel_notification_t xen_consumers[NR_XEN_CONSUMERS];
+static xen_event_channel_notification_t __read_mostly
+    xen_consumers[NR_XEN_CONSUMERS];
 
 /* Default notification action: wake up from wait_on_xen_event_channel(). */
 static void default_xen_notification_fn(struct vcpu *v, unsigned int port)
@@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
 
     for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
     {
+        /* Use cmpxchgptr() in lieu of a global lock. */
         if ( xen_consumers[i] == NULL )
-            xen_consumers[i] = fn;
+            cmpxchgptr(&xen_consumers[i], NULL, fn);
         if ( xen_consumers[i] == fn )
             break;
     }
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -148,13 +148,6 @@ static always_inline unsigned long cmpxc
     return prev;
 }
 
-#define cmpxchgptr(ptr,o,n) ({                                          \
-    const __typeof__(**(ptr)) *__o = (o);                               \
-    __typeof__(**(ptr)) *__n = (n);                                     \
-    ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)__o,            \
-                                   (unsigned long)__n,sizeof(*(ptr)))); \
-})
-
 /*
  * Undefined symbol to cause link failure if a wrong size is used with
  * arch_fetch_and_add().
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -178,6 +178,17 @@ unsigned long long parse_size_and_unit(c
 
 uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c);
 
+/*
+ * A slightly more typesafe variant of cmpxchg() when the entities dealt with
+ * are pointers.
+ */
+#define cmpxchgptr(ptr, o, n) ({                                        \
+    __typeof__(**(ptr)) *const o_ = (o);                                \
+    __typeof__(**(ptr)) *n_ = (n);                                      \
+    ((__typeof__(*(ptr)))__cmpxchg(ptr, (unsigned long)o_,              \
+                                   (unsigned long)n_, sizeof(*(ptr)))); \
+})
+
 #define TAINT_SYNC_CONSOLE              (1u << 0)
 #define TAINT_MACHINE_CHECK             (1u << 1)
 #define TAINT_ERROR_INJECT              (1u << 2)




* [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
@ 2020-10-20 14:08 ` Jan Beulich
  2020-10-21 16:00   ` Roger Pau Monné
  2020-10-30 10:21   ` Julien Grall
  2020-10-20 14:09 ` [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event() Jan Beulich
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

Having a FIFO-specific header is not (or at least no longer) warranted
with just three function declarations left there. Introduce a private
header instead, moving some further items from xen/event.h there as
well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.
---
TBD: If - considering the layering violation that's there anyway - we
     allowed PV shim code to make use of this header, a few more items
     could be moved out of "public sight".

--- a/xen/common/event_2l.c
+++ b/xen/common/event_2l.c
@@ -7,11 +7,12 @@
  * Version 2 or later.  See the file COPYING for more details.
  */
 
+#include "event_channel.h"
+
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
-#include <xen/event.h>
 
 #include <asm/guest_atomics.h>
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -14,17 +14,17 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "event_channel.h"
+
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
-#include <xen/event.h>
 #include <xen/irq.h>
 #include <xen/iocap.h>
 #include <xen/compat.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
-#include <xen/event_fifo.h>
 #include <asm/current.h>
 
 #include <public/xen.h>
--- /dev/null
+++ b/xen/common/event_channel.h
@@ -0,0 +1,63 @@
+/* Event channel handling private header. */
+
+#include <xen/event.h>
+
+static inline unsigned int max_evtchns(const struct domain *d)
+{
+    return d->evtchn_fifo ? EVTCHN_FIFO_NR_CHANNELS
+                          : BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
+}
+
+static inline bool evtchn_is_busy(const struct domain *d,
+                                  const struct evtchn *evtchn)
+{
+    return d->evtchn_port_ops->is_busy &&
+           d->evtchn_port_ops->is_busy(d, evtchn);
+}
+
+static inline void evtchn_port_unmask(struct domain *d,
+                                      struct evtchn *evtchn)
+{
+    if ( evtchn_usable(evtchn) )
+        d->evtchn_port_ops->unmask(d, evtchn);
+}
+
+static inline int evtchn_port_set_priority(struct domain *d,
+                                           struct evtchn *evtchn,
+                                           unsigned int priority)
+{
+    if ( !d->evtchn_port_ops->set_priority )
+        return -ENOSYS;
+    if ( !evtchn_usable(evtchn) )
+        return -EACCES;
+    return d->evtchn_port_ops->set_priority(d, evtchn, priority);
+}
+
+static inline void evtchn_port_print_state(struct domain *d,
+                                           const struct evtchn *evtchn)
+{
+    d->evtchn_port_ops->print_state(d, evtchn);
+}
+
+/* 2-level */
+
+void evtchn_2l_init(struct domain *d);
+
+/* FIFO */
+
+struct evtchn_init_control;
+struct evtchn_expand_array;
+
+int evtchn_fifo_init_control(struct evtchn_init_control *init_control);
+int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array);
+void evtchn_fifo_destroy(struct domain *d);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -7,12 +7,12 @@
  * Version 2 or later.  See the file COPYING for more details.
  */
 
+#include "event_channel.h"
+
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
-#include <xen/event.h>
-#include <xen/event_fifo.h>
 #include <xen/paging.h>
 #include <xen/mm.h>
 #include <xen/domain_page.h>
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -105,12 +105,6 @@ void notify_via_xen_event_channel(struct
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline unsigned int max_evtchns(const struct domain *d)
-{
-    return d->evtchn_fifo ? EVTCHN_FIFO_NR_CHANNELS
-                          : BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
-}
-
 static inline bool_t port_is_valid(struct domain *d, unsigned int p)
 {
     if ( p >= read_atomic(&d->valid_evtchns) )
@@ -176,8 +170,6 @@ static bool evtchn_usable(const struct e
 
 void evtchn_check_pollers(struct domain *d, unsigned int port);
 
-void evtchn_2l_init(struct domain *d);
-
 /* Close all event channels and reset to 2-level ABI. */
 int evtchn_reset(struct domain *d, bool resuming);
 
@@ -227,13 +219,6 @@ static inline void evtchn_port_clear_pen
         d->evtchn_port_ops->clear_pending(d, evtchn);
 }
 
-static inline void evtchn_port_unmask(struct domain *d,
-                                      struct evtchn *evtchn)
-{
-    if ( evtchn_usable(evtchn) )
-        d->evtchn_port_ops->unmask(d, evtchn);
-}
-
 static inline bool evtchn_is_pending(const struct domain *d,
                                      const struct evtchn *evtchn)
 {
@@ -259,13 +244,6 @@ static inline bool evtchn_port_is_masked
     return rc;
 }
 
-static inline bool evtchn_is_busy(const struct domain *d,
-                                  const struct evtchn *evtchn)
-{
-    return d->evtchn_port_ops->is_busy &&
-           d->evtchn_port_ops->is_busy(d, evtchn);
-}
-
 /* Returns negative errno, zero for not pending, or positive for pending. */
 static inline int evtchn_port_poll(struct domain *d, evtchn_port_t port)
 {
@@ -285,21 +263,4 @@ static inline int evtchn_port_poll(struc
     return rc;
 }
 
-static inline int evtchn_port_set_priority(struct domain *d,
-                                           struct evtchn *evtchn,
-                                           unsigned int priority)
-{
-    if ( !d->evtchn_port_ops->set_priority )
-        return -ENOSYS;
-    if ( !evtchn_usable(evtchn) )
-        return -EACCES;
-    return d->evtchn_port_ops->set_priority(d, evtchn, priority);
-}
-
-static inline void evtchn_port_print_state(struct domain *d,
-                                           const struct evtchn *evtchn)
-{
-    d->evtchn_port_ops->print_state(d, evtchn);
-}
-
 #endif /* __XEN_EVENT_H__ */
--- a/xen/include/xen/event_fifo.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/*
- * FIFO-based event channel ABI.
- *
- * Copyright (C) 2013 Citrix Systems R&D Ltd.
- *
- * This source code is licensed under the GNU General Public License,
- * Version 2 or later.  See the file COPYING for more details.
- */
-#ifndef __XEN_EVENT_FIFO_H__
-#define __XEN_EVENT_FIFO_H__
-
-int evtchn_fifo_init_control(struct evtchn_init_control *init_control);
-int evtchn_fifo_expand_array(const struct evtchn_expand_array *expand_array);
-void evtchn_fifo_destroy(struct domain *domain);
-
-#endif /* __XEN_EVENT_FIFO_H__ */
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */




* [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event()
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
  2020-10-20 14:08 ` [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one Jan Beulich
@ 2020-10-20 14:09 ` Jan Beulich
  2020-10-22 10:28   ` Roger Pau Monné
  2020-10-20 14:09 ` [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock Jan Beulich
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini, Roger Pau Monné

The function isn't about an "event" in general, but about a vIRQ. The
function also failed to honor global vIRQ-s, instead assuming the caller
would pass vCPU 0 in such a case.

While at it also adjust the
- types the function uses,
- single user to make use of domain_vcpu().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -5,9 +5,9 @@
 
 int vmce_init(struct cpuinfo_x86 *c);
 
-#define dom0_vmce_enabled() (hardware_domain && hardware_domain->max_vcpus \
-        && hardware_domain->vcpu[0] \
-        && guest_enabled_event(hardware_domain->vcpu[0], VIRQ_MCA))
+#define dom0_vmce_enabled() \
+    (hardware_domain && \
+     evtchn_virq_enabled(domain_vcpu(hardware_domain, 0), VIRQ_MCA))
 
 int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn);
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -778,9 +778,15 @@ out:
     return ret;
 }
 
-int guest_enabled_event(struct vcpu *v, uint32_t virq)
+bool evtchn_virq_enabled(const struct vcpu *v, unsigned int virq)
 {
-    return ((v != NULL) && (v->virq_to_evtchn[virq] != 0));
+    if ( !v )
+        return false;
+
+    if ( virq_is_global(virq) && v->vcpu_id )
+        v = domain_vcpu(v->domain, 0);
+
+    return v->virq_to_evtchn[virq];
 }
 
 void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -85,8 +85,8 @@ int alloc_unbound_xen_event_channel(
     xen_event_channel_notification_t notification_fn);
 void free_xen_event_channel(struct domain *d, int port);
 
-/* Query if event channel is in use by the guest */
-int guest_enabled_event(struct vcpu *v, uint32_t virq);
+/* Query whether a vIRQ is in use by the guest. */
+bool evtchn_virq_enabled(const struct vcpu *v, unsigned int virq);
 
 /* Notify remote end of a Xen-attached event channel.*/
 void notify_via_xen_event_channel(struct domain *ld, int lport);




* [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
                   ` (2 preceding siblings ...)
  2020-10-20 14:09 ` [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event() Jan Beulich
@ 2020-10-20 14:09 ` Jan Beulich
  2020-10-22 11:17   ` Roger Pau Monné
  2020-10-20 14:10 ` [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq() Jan Beulich
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

A lock needs to be held to make sure the port doesn't change state,
but there's no point holding the per-domain event lock here. Switch to
using the finer-grained per-channel lock instead.

FAOD this doesn't guarantee anything in particular towards
evtchn_fifo_set_pending(), as for interdomain channels that function
would be called with the remote side's per-channel lock held.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop acquiring of event lock. Re-write title and description.

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1154,20 +1154,17 @@ static long evtchn_set_priority(const st
 {
     struct domain *d = current->domain;
     unsigned int port = set_priority->port;
+    struct evtchn *chn;
     long ret;
-
-    spin_lock(&d->event_lock);
+    unsigned long flags;
 
     if ( !port_is_valid(d, port) )
-    {
-        spin_unlock(&d->event_lock);
         return -EINVAL;
-    }
-
-    ret = evtchn_port_set_priority(d, evtchn_from_port(d, port),
-                                   set_priority->priority);
 
-    spin_unlock(&d->event_lock);
+    chn = evtchn_from_port(d, port);
+    spin_lock_irqsave(&chn->lock, flags);
+    ret = evtchn_port_set_priority(d, chn, set_priority->priority);
+    spin_unlock_irqrestore(&chn->lock, flags);
 
     return ret;
 }




* [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
                   ` (3 preceding siblings ...)
  2020-10-20 14:09 ` [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock Jan Beulich
@ 2020-10-20 14:10 ` Jan Beulich
  2020-10-22 16:00   ` Roger Pau Monné
  2020-10-20 14:10 ` [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one Jan Beulich
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

The per-vCPU virq_lock, which is being held anyway, together with there
not being any call to evtchn_port_set_pending() when v->virq_to_evtchn[]
is zero, provide sufficient guarantees. Undo the lock addition done for
XSA-343 (commit e045199c7c9c "evtchn: address races with
evtchn_reset()"). Update the description next to struct evtchn_port_ops
accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -794,7 +794,6 @@ void send_guest_vcpu_virq(struct vcpu *v
     unsigned long flags;
     int port;
     struct domain *d;
-    struct evtchn *chn;
 
     ASSERT(!virq_is_global(virq));
 
@@ -805,10 +804,7 @@ void send_guest_vcpu_virq(struct vcpu *v
         goto out;
 
     d = v->domain;
-    chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
-    evtchn_port_set_pending(d, v->vcpu_id, chn);
-    spin_unlock(&chn->lock);
+    evtchn_port_set_pending(d, v->vcpu_id, evtchn_from_port(d, port));
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
@@ -837,9 +833,7 @@ void send_guest_global_virq(struct domai
         goto out;
 
     chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
     evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
-    spin_unlock(&chn->lock);
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
  * Low-level event channel port ops.
  *
  * All hooks have to be called with a lock held which prevents the channel
- * from changing state. This may be the domain event lock, the per-channel
- * lock, or in the case of sending interdomain events also the other side's
- * per-channel lock. Exceptions apply in certain cases for the PV shim.
+ * from changing state. This may be
+ * - the domain event lock,
+ * - the per-channel lock,
+ * - in the case of sending interdomain events the other side's per-channel
+ *   lock,
+ * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
+ *   combination with the ordering enforced through how the vCPU's
+ *   virq_to_evtchn[] gets updated),
+ * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
+ * Exceptions apply in certain cases for the PV shim.
  */
 struct evtchn_port_ops {
     void (*init)(struct domain *d, struct evtchn *evtchn);




* [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
                   ` (4 preceding siblings ...)
  2020-10-20 14:10 ` [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq() Jan Beulich
@ 2020-10-20 14:10 ` Jan Beulich
  2020-10-30 10:57   ` Julien Grall
  2020-10-20 14:11 ` [PATCH v2 7/8] evtchn: convert domain event " Jan Beulich
  2020-10-20 14:13 ` [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held Jan Beulich
  7 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

There's no need to serialize all sending of vIRQ-s; all that's needed
is serialization against the closing of the respective event channels
(so far by means of a barrier). To facilitate the conversion, switch to
an ordinary write-locked region in evtchn_close().
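
As an illustration of the resulting pattern (a userland sketch using
pthreads, not the Xen code; names are made up), senders of a vIRQ only
need to exclude the closing of that channel, not one another, so they
take the lock for reading while the closer takes it for writing:

    #include <pthread.h>

    static pthread_rwlock_t virq_lock = PTHREAD_RWLOCK_INITIALIZER;
    static unsigned int virq_to_evtchn[8];

    static void send_virq(unsigned int virq)
    {
        pthread_rwlock_rdlock(&virq_lock);   /* senders may run in parallel */
        if ( virq_to_evtchn[virq] )
        {
            /* mark the bound event channel pending ... */
        }
        pthread_rwlock_unlock(&virq_lock);
    }

    static void close_virq(unsigned int virq)
    {
        pthread_rwlock_wrlock(&virq_lock);   /* excludes all senders */
        virq_to_evtchn[virq] = 0;            /* no sender can see a stale port */
        pthread_rwlock_unlock(&virq_lock);
    }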

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Don't introduce/use rw_barrier() here. Add comment to
    evtchn_bind_virq(). Re-base.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -160,7 +160,7 @@ struct vcpu *vcpu_create(struct domain *
     v->vcpu_id = vcpu_id;
     v->dirty_cpu = VCPU_CPU_CLEAN;
 
-    spin_lock_init(&v->virq_lock);
+    rwlock_init(&v->virq_lock);
 
     tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
 
     spin_unlock_irqrestore(&chn->lock, flags);
 
+    /*
+     * If anything, the update of virq_to_evtchn[] would need guarding by
+     * virq_lock, but since this is the last action here, there's no strict
+     * need to acquire the lock. Hence holding event_lock isn't helpful
+     * anymore at this point, but utilize that its unlocking acts as the
+     * otherwise necessary smp_wmb() here.
+     */
     v->virq_to_evtchn[virq] = bind->port = port;
 
  out:
@@ -638,10 +645,10 @@ int evtchn_close(struct domain *d1, int
     case ECS_VIRQ:
         for_each_vcpu ( d1, v )
         {
-            if ( v->virq_to_evtchn[chn1->u.virq] != port1 )
-                continue;
-            v->virq_to_evtchn[chn1->u.virq] = 0;
-            spin_barrier(&v->virq_lock);
+            write_lock_irqsave(&v->virq_lock, flags);
+            if ( v->virq_to_evtchn[chn1->u.virq] == port1 )
+                v->virq_to_evtchn[chn1->u.virq] = 0;
+            write_unlock_irqrestore(&v->virq_lock, flags);
         }
         break;
 
@@ -797,7 +804,7 @@ void send_guest_vcpu_virq(struct vcpu *v
 
     ASSERT(!virq_is_global(virq));
 
-    spin_lock_irqsave(&v->virq_lock, flags);
+    read_lock_irqsave(&v->virq_lock, flags);
 
     port = v->virq_to_evtchn[virq];
     if ( unlikely(port == 0) )
@@ -807,7 +814,7 @@ void send_guest_vcpu_virq(struct vcpu *v
     evtchn_port_set_pending(d, v->vcpu_id, evtchn_from_port(d, port));
 
  out:
-    spin_unlock_irqrestore(&v->virq_lock, flags);
+    read_unlock_irqrestore(&v->virq_lock, flags);
 }
 
 void send_guest_global_virq(struct domain *d, uint32_t virq)
@@ -826,7 +833,7 @@ void send_guest_global_virq(struct domai
     if ( unlikely(v == NULL) )
         return;
 
-    spin_lock_irqsave(&v->virq_lock, flags);
+    read_lock_irqsave(&v->virq_lock, flags);
 
     port = v->virq_to_evtchn[virq];
     if ( unlikely(port == 0) )
@@ -836,7 +843,7 @@ void send_guest_global_virq(struct domai
     evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
 
  out:
-    spin_unlock_irqrestore(&v->virq_lock, flags);
+    read_unlock_irqrestore(&v->virq_lock, flags);
 }
 
 void send_guest_pirq(struct domain *d, const struct pirq *pirq)
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -235,7 +235,7 @@ struct vcpu
 
     /* IRQ-safe virq_lock protects against delivering VIRQ to stale evtchn. */
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
-    spinlock_t       virq_lock;
+    rwlock_t         virq_lock;
 
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;




* [PATCH v2 7/8] evtchn: convert domain event lock to an r/w one
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
                   ` (5 preceding siblings ...)
  2020-10-20 14:10 ` [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one Jan Beulich
@ 2020-10-20 14:11 ` Jan Beulich
  2020-10-20 14:13 ` [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held Jan Beulich
  7 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Paul Durrant, Daniel de Graaf

Especially for the use in evtchn_move_pirqs() (called when moving a
vCPU across pCPU-s) and the ones in EOI handling in PCI pass-through
code, serializing perhaps an entire domain isn't helpful when no state
changes (at least none that isn't e.g. further protected by the
per-channel lock).

Unfortunately this implies dropping lock profiling for this lock,
until r/w locks gain support for such functionality.

While ->notify_vcpu_id is now meant to be consistently updated with the
per-channel lock held, an extension applies to ECS_PIRQ: The field is
also guaranteed to not change with the per-domain event lock held for
writing. Therefore the unlink_pirq_port() call from evtchn_bind_vcpu()
as well as the link_pirq_port() one from evtchn_bind_pirq() could in
principle be moved out of the per-channel locked regions, but this
further code churn didn't seem worth it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Consistently lock for writing in evtchn_reset(). Fix error path in
    pci_clean_dpci_irqs(). Lock for writing in pt_irq_time_out(),
    hvm_dirq_assist(), hvm_dpci_eoi(), and hvm_dpci_isairq_eoi(). Move
    rw_barrier() introduction here. Re-base over changes earlier in the
    series.
---
RFC:
* In evtchn_bind_vcpu() the question is whether limiting the use of
  write_lock() to just the ECS_PIRQ case is really worth it.
* In flask_get_peer_sid() the question is whether we wouldn't better
  switch to using the per-channel lock.
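
Regarding the first RFC point, below is an illustrative userland sketch
(pthreads, made-up names, not the Xen code) of the retry pattern
evtchn_bind_vcpu() now uses: take only the fine-grained channel lock at
first, and if the ECS_PIRQ case turns out to also need the domain-wide
lock, drop the channel lock, acquire the event lock for writing, and
start over.

    #include <pthread.h>
    #include <stdbool.h>

    struct chan {
        pthread_mutex_t lock;
        bool is_pirq;
    };

    static pthread_rwlock_t event_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void bind_vcpu(struct chan *chn)
    {
        bool write_locked = false;

     again:
        pthread_mutex_lock(&chn->lock);

        if ( chn->is_pirq && !write_locked )
        {
            /* The coarse lock must be taken before the fine one, so drop
             * the channel lock, take the event lock, and re-evaluate. */
            pthread_mutex_unlock(&chn->lock);
            pthread_rwlock_wrlock(&event_lock);
            write_locked = true;
            goto again;
        }

        /* ... perform the re-bind under whichever locks are now held ... */

        pthread_mutex_unlock(&chn->lock);
        if ( write_locked )
            pthread_rwlock_unlock(&event_lock);
    }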

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -917,7 +917,7 @@ int arch_domain_soft_reset(struct domain
     if ( !is_hvm_domain(d) )
         return -EINVAL;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     for ( i = 0; i < d->nr_pirqs ; i++ )
     {
         if ( domain_pirq_to_emuirq(d, i) != IRQ_UNBOUND )
@@ -927,7 +927,7 @@ int arch_domain_soft_reset(struct domain
                 break;
         }
     }
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     if ( ret )
         return ret;
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -528,9 +528,9 @@ void hvm_migrate_pirqs(struct vcpu *v)
     if ( !is_iommu_enabled(d) || !hvm_domain_irq(d)->dpci )
        return;
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
     pt_pirq_iterate(d, migrate_pirq, v);
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
 }
 
 static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -404,9 +404,9 @@ int hvm_inject_msi(struct domain *d, uin
             {
                 int rc;
 
-                spin_lock(&d->event_lock);
+                write_lock(&d->event_lock);
                 rc = map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
                 if ( rc )
                     return rc;
                 info = pirq_info(d, pirq);
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -203,9 +203,9 @@ static int vioapic_hwdom_map_gsi(unsigne
     {
         gprintk(XENLOG_WARNING, "vioapic: error binding GSI %u: %d\n",
                 gsi, ret);
-        spin_lock(&currd->event_lock);
+        write_lock(&currd->event_lock);
         unmap_domain_pirq(currd, pirq);
-        spin_unlock(&currd->event_lock);
+        write_unlock(&currd->event_lock);
     }
     pcidevs_unlock();
 
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -465,7 +465,7 @@ int msixtbl_pt_register(struct domain *d
     int r = -EINVAL;
 
     ASSERT(pcidevs_locked());
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !msixtbl_initialised(d) )
         return -ENODEV;
@@ -535,7 +535,7 @@ void msixtbl_pt_unregister(struct domain
     struct msixtbl_entry *entry;
 
     ASSERT(pcidevs_locked());
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !msixtbl_initialised(d) )
         return;
@@ -589,13 +589,13 @@ void msixtbl_pt_cleanup(struct domain *d
     if ( !msixtbl_initialised(d) )
         return;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     list_for_each_entry_safe( entry, temp,
                               &d->arch.hvm.msixtbl_list, list )
         del_msixtbl_entry(entry);
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 }
 
 void msix_write_completion(struct vcpu *v)
@@ -719,9 +719,9 @@ int vpci_msi_arch_update(struct vpci_msi
                          msi->arch.pirq, msi->mask);
     if ( rc )
     {
-        spin_lock(&pdev->domain->event_lock);
+        write_lock(&pdev->domain->event_lock);
         unmap_domain_pirq(pdev->domain, msi->arch.pirq);
-        spin_unlock(&pdev->domain->event_lock);
+        write_unlock(&pdev->domain->event_lock);
         pcidevs_unlock();
         msi->arch.pirq = INVALID_PIRQ;
         return rc;
@@ -760,9 +760,9 @@ static int vpci_msi_enable(const struct
     rc = vpci_msi_update(pdev, data, address, vectors, pirq, mask);
     if ( rc )
     {
-        spin_lock(&pdev->domain->event_lock);
+        write_lock(&pdev->domain->event_lock);
         unmap_domain_pirq(pdev->domain, pirq);
-        spin_unlock(&pdev->domain->event_lock);
+        write_unlock(&pdev->domain->event_lock);
         pcidevs_unlock();
         return rc;
     }
@@ -807,9 +807,9 @@ static void vpci_msi_disable(const struc
         ASSERT(!rc);
     }
 
-    spin_lock(&pdev->domain->event_lock);
+    write_lock(&pdev->domain->event_lock);
     unmap_domain_pirq(pdev->domain, pirq);
-    spin_unlock(&pdev->domain->event_lock);
+    write_unlock(&pdev->domain->event_lock);
     pcidevs_unlock();
 }
 
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2413,10 +2413,10 @@ int ioapic_guest_write(unsigned long phy
     }
     if ( pirq >= 0 )
     {
-        spin_lock(&hardware_domain->event_lock);
+        write_lock(&hardware_domain->event_lock);
         ret = map_domain_pirq(hardware_domain, pirq, irq,
                               MAP_PIRQ_TYPE_GSI, NULL);
-        spin_unlock(&hardware_domain->event_lock);
+        write_unlock(&hardware_domain->event_lock);
         if ( ret < 0 )
             return ret;
     }
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1536,7 +1536,7 @@ int pirq_guest_bind(struct vcpu *v, stru
     irq_guest_action_t *action, *newaction = NULL;
     int                 rc = 0;
 
-    WARN_ON(!spin_is_locked(&v->domain->event_lock));
+    WARN_ON(!rw_is_write_locked(&v->domain->event_lock));
     BUG_ON(!local_irq_is_enabled());
 
  retry:
@@ -1756,7 +1756,7 @@ void pirq_guest_unbind(struct domain *d,
     struct irq_desc *desc;
     int irq = 0;
 
-    WARN_ON(!spin_is_locked(&d->event_lock));
+    WARN_ON(!rw_is_write_locked(&d->event_lock));
 
     BUG_ON(!local_irq_is_enabled());
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -1793,7 +1793,7 @@ static bool pirq_guest_force_unbind(stru
     unsigned int i;
     bool bound = false;
 
-    WARN_ON(!spin_is_locked(&d->event_lock));
+    WARN_ON(!rw_is_write_locked(&d->event_lock));
 
     BUG_ON(!local_irq_is_enabled());
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -2037,7 +2037,7 @@ int get_free_pirq(struct domain *d, int
 {
     int i;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( type == MAP_PIRQ_TYPE_GSI )
     {
@@ -2062,7 +2062,7 @@ int get_free_pirqs(struct domain *d, uns
 {
     unsigned int i, found = 0;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     for ( i = d->nr_pirqs - 1; i >= nr_irqs_gsi; --i )
         if ( is_free_pirq(d, pirq_info(d, i)) )
@@ -2090,7 +2090,7 @@ int map_domain_pirq(
     DECLARE_BITMAP(prepared, MAX_MSI_IRQS) = {};
     DECLARE_BITMAP(granted, MAX_MSI_IRQS) = {};
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !irq_access_permitted(current->domain, irq))
         return -EPERM;
@@ -2309,7 +2309,7 @@ int unmap_domain_pirq(struct domain *d,
         return -EINVAL;
 
     ASSERT(pcidevs_locked());
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     info = pirq_info(d, pirq);
     if ( !info || (irq = info->arch.irq) <= 0 )
@@ -2436,13 +2436,13 @@ void free_domain_pirqs(struct domain *d)
     int i;
 
     pcidevs_lock();
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     for ( i = 0; i < d->nr_pirqs; i++ )
         if ( domain_pirq_to_irq(d, i) > 0 )
             unmap_domain_pirq(d, i);
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     pcidevs_unlock();
 }
 
@@ -2685,7 +2685,7 @@ int map_domain_emuirq_pirq(struct domain
     int old_emuirq = IRQ_UNBOUND, old_pirq = IRQ_UNBOUND;
     struct pirq *info;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !is_hvm_domain(d) )
         return -EINVAL;
@@ -2751,7 +2751,7 @@ int unmap_domain_pirq_emuirq(struct doma
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     emuirq = domain_pirq_to_emuirq(d, pirq);
     if ( emuirq == IRQ_UNBOUND )
@@ -2799,7 +2799,7 @@ static int allocate_pirq(struct domain *
 {
     int current_pirq;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
     current_pirq = domain_irq_to_pirq(d, irq);
     if ( pirq < 0 )
     {
@@ -2871,7 +2871,7 @@ int allocate_and_map_gsi_pirq(struct dom
     }
 
     /* Verify or get pirq. */
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     pirq = allocate_pirq(d, index, *pirq_p, irq, MAP_PIRQ_TYPE_GSI, NULL);
     if ( pirq < 0 )
     {
@@ -2884,7 +2884,7 @@ int allocate_and_map_gsi_pirq(struct dom
         *pirq_p = pirq;
 
  done:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return ret;
 }
@@ -2925,7 +2925,7 @@ int allocate_and_map_msi_pirq(struct dom
 
     pcidevs_lock();
     /* Verify or get pirq. */
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     pirq = allocate_pirq(d, index, *pirq_p, irq, type, &msi->entry_nr);
     if ( pirq < 0 )
     {
@@ -2938,7 +2938,7 @@ int allocate_and_map_msi_pirq(struct dom
         *pirq_p = pirq;
 
  done:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     pcidevs_unlock();
     if ( ret )
     {
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -34,7 +34,7 @@ static int physdev_hvm_map_pirq(
 
     ASSERT(!is_hardware_domain(d));
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     switch ( type )
     {
     case MAP_PIRQ_TYPE_GSI: {
@@ -84,7 +84,7 @@ static int physdev_hvm_map_pirq(
         break;
     }
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     return ret;
 }
 
@@ -154,18 +154,18 @@ int physdev_unmap_pirq(domid_t domid, in
 
     if ( is_hvm_domain(d) && has_pirq(d) )
     {
-        spin_lock(&d->event_lock);
+        write_lock(&d->event_lock);
         if ( domain_pirq_to_emuirq(d, pirq) != IRQ_UNBOUND )
             ret = unmap_domain_pirq_emuirq(d, pirq);
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
         if ( domid == DOMID_SELF || ret )
             goto free_domain;
     }
 
     pcidevs_lock();
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     ret = unmap_domain_pirq(d, pirq);
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     pcidevs_unlock();
 
  free_domain:
@@ -192,10 +192,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         ret = -EINVAL;
         if ( eoi.irq >= currd->nr_pirqs )
             break;
-        spin_lock(&currd->event_lock);
+        read_lock(&currd->event_lock);
         pirq = pirq_info(currd, eoi.irq);
         if ( !pirq ) {
-            spin_unlock(&currd->event_lock);
+            read_unlock(&currd->event_lock);
             break;
         }
         if ( currd->arch.auto_unmask )
@@ -214,7 +214,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
                     && hvm_irq->gsi_assert_count[gsi] )
                 send_guest_pirq(currd, pirq);
         }
-        spin_unlock(&currd->event_lock);
+        read_unlock(&currd->event_lock);
         ret = 0;
         break;
     }
@@ -626,7 +626,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( copy_from_guest(&out, arg, 1) != 0 )
             break;
 
-        spin_lock(&currd->event_lock);
+        write_lock(&currd->event_lock);
 
         ret = get_free_pirq(currd, out.type);
         if ( ret >= 0 )
@@ -639,7 +639,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
                 ret = -ENOMEM;
         }
 
-        spin_unlock(&currd->event_lock);
+        write_unlock(&currd->event_lock);
 
         if ( ret >= 0 )
         {
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -448,7 +448,7 @@ static long pv_shim_event_channel_op(int
         if ( rc )                                                           \
             break;                                                          \
                                                                             \
-        spin_lock(&d->event_lock);                                          \
+        write_lock(&d->event_lock);                                         \
         rc = evtchn_allocate_port(d, op.port_field);                        \
         if ( rc )                                                           \
         {                                                                   \
@@ -457,7 +457,7 @@ static long pv_shim_event_channel_op(int
         }                                                                   \
         else                                                                \
             evtchn_reserve(d, op.port_field);                               \
-        spin_unlock(&d->event_lock);                                        \
+        write_unlock(&d->event_lock);                                       \
                                                                             \
         if ( !rc && __copy_to_guest(arg, &op, 1) )                          \
             rc = -EFAULT;                                                   \
@@ -585,11 +585,11 @@ static long pv_shim_event_channel_op(int
         if ( rc )
             break;
 
-        spin_lock(&d->event_lock);
+        write_lock(&d->event_lock);
         rc = evtchn_allocate_port(d, ipi.port);
         if ( rc )
         {
-            spin_unlock(&d->event_lock);
+            write_unlock(&d->event_lock);
 
             close.port = ipi.port;
             BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close));
@@ -598,7 +598,7 @@ static long pv_shim_event_channel_op(int
 
         evtchn_assign_vcpu(d, ipi.port, ipi.vcpu);
         evtchn_reserve(d, ipi.port);
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
 
         if ( __copy_to_guest(arg, &ipi, 1) )
             rc = -EFAULT;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -261,7 +261,7 @@ static long evtchn_alloc_unbound(evtchn_
     if ( d == NULL )
         return -ESRCH;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     if ( (port = get_free_port(d)) < 0 )
         ERROR_EXIT_DOM(port, d);
@@ -284,7 +284,7 @@ static long evtchn_alloc_unbound(evtchn_
 
  out:
     check_free_port(d, port);
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     rcu_unlock_domain(d);
 
     return rc;
@@ -337,14 +337,14 @@ static long evtchn_bind_interdomain(evtc
     /* Avoid deadlock by first acquiring lock of domain with smaller id. */
     if ( ld < rd )
     {
-        spin_lock(&ld->event_lock);
-        spin_lock(&rd->event_lock);
+        write_lock(&ld->event_lock);
+        read_lock(&rd->event_lock);
     }
     else
     {
         if ( ld != rd )
-            spin_lock(&rd->event_lock);
-        spin_lock(&ld->event_lock);
+            read_lock(&rd->event_lock);
+        write_lock(&ld->event_lock);
     }
 
     if ( (lport = get_free_port(ld)) < 0 )
@@ -385,9 +385,9 @@ static long evtchn_bind_interdomain(evtc
 
  out:
     check_free_port(ld, lport);
-    spin_unlock(&ld->event_lock);
+    write_unlock(&ld->event_lock);
     if ( ld != rd )
-        spin_unlock(&rd->event_lock);
+        read_unlock(&rd->event_lock);
     
     rcu_unlock_domain(rd);
 
@@ -419,7 +419,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t
     if ( (v = domain_vcpu(d, vcpu)) == NULL )
         return -ENOENT;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     if ( v->virq_to_evtchn[virq] != 0 )
         ERROR_EXIT(-EEXIST);
@@ -459,7 +459,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t
     v->virq_to_evtchn[virq] = bind->port = port;
 
  out:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 }
@@ -476,7 +476,7 @@ static long evtchn_bind_ipi(evtchn_bind_
     if ( domain_vcpu(d, vcpu) == NULL )
         return -ENOENT;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     if ( (port = get_free_port(d)) < 0 )
         ERROR_EXIT(port);
@@ -494,7 +494,7 @@ static long evtchn_bind_ipi(evtchn_bind_
     bind->port = port;
 
  out:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 }
@@ -541,7 +541,7 @@ static long evtchn_bind_pirq(evtchn_bind
     if ( !is_hvm_domain(d) && !pirq_access_permitted(d, pirq) )
         return -EPERM;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     if ( pirq_to_evtchn(d, pirq) != 0 )
         ERROR_EXIT(-EEXIST);
@@ -581,7 +581,7 @@ static long evtchn_bind_pirq(evtchn_bind
 
  out:
     check_free_port(d, port);
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 }
@@ -597,7 +597,7 @@ int evtchn_close(struct domain *d1, int
     unsigned long  flags;
 
  again:
-    spin_lock(&d1->event_lock);
+    write_lock(&d1->event_lock);
 
     if ( !port_is_valid(d1, port1) )
     {
@@ -665,13 +665,11 @@ int evtchn_close(struct domain *d1, int
                 BUG();
 
             if ( d1 < d2 )
-            {
-                spin_lock(&d2->event_lock);
-            }
+                read_lock(&d2->event_lock);
             else if ( d1 != d2 )
             {
-                spin_unlock(&d1->event_lock);
-                spin_lock(&d2->event_lock);
+                write_unlock(&d1->event_lock);
+                read_lock(&d2->event_lock);
                 goto again;
             }
         }
@@ -718,11 +716,11 @@ int evtchn_close(struct domain *d1, int
     if ( d2 != NULL )
     {
         if ( d1 != d2 )
-            spin_unlock(&d2->event_lock);
+            read_unlock(&d2->event_lock);
         put_domain(d2);
     }
 
-    spin_unlock(&d1->event_lock);
+    write_unlock(&d1->event_lock);
 
     return rc;
 }
@@ -944,7 +942,7 @@ int evtchn_status(evtchn_status_t *statu
     if ( d == NULL )
         return -ESRCH;
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
 
     if ( !port_is_valid(d, port) )
     {
@@ -997,7 +995,7 @@ int evtchn_status(evtchn_status_t *statu
     status->vcpu = chn->notify_vcpu_id;
 
  out:
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
     rcu_unlock_domain(d);
 
     return rc;
@@ -1010,20 +1008,19 @@ long evtchn_bind_vcpu(unsigned int port,
     struct evtchn *chn;
     long           rc = 0;
     struct vcpu   *v;
+    bool           write_locked = false;
+    unsigned long  flags;
 
     /* Use the vcpu info to prevent speculative out-of-bound accesses */
     if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
         return -ENOENT;
 
-    spin_lock(&d->event_lock);
-
     if ( !port_is_valid(d, port) )
-    {
-        rc = -EINVAL;
-        goto out;
-    }
+        return -EINVAL;
 
     chn = evtchn_from_port(d, port);
+ again:
+    spin_lock_irqsave(&chn->lock, flags);
 
     /* Guest cannot re-bind a Xen-attached event channel. */
     if ( unlikely(consumer_is_xen(chn)) )
@@ -1047,19 +1044,32 @@ long evtchn_bind_vcpu(unsigned int port,
     case ECS_PIRQ:
         if ( chn->notify_vcpu_id == v->vcpu_id )
             break;
+        if ( !write_locked )
+        {
+            spin_unlock_irqrestore(&chn->lock, flags);
+            write_lock(&d->event_lock);
+            write_locked = true;
+            goto again;
+        }
+
         unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]);
         chn->notify_vcpu_id = v->vcpu_id;
+        spin_unlock_irqrestore(&chn->lock, flags);
         pirq_set_affinity(d, chn->u.pirq.irq,
                           cpumask_of(v->processor));
         link_pirq_port(port, chn, v);
-        break;
+
+        write_unlock(&d->event_lock);
+        return 0;
     default:
         rc = -EINVAL;
         break;
     }
 
  out:
-    spin_unlock(&d->event_lock);
+    spin_unlock_irqrestore(&chn->lock, flags);
+    if ( write_locked )
+        write_unlock(&d->event_lock);
 
     return rc;
 }
@@ -1103,7 +1113,7 @@ int evtchn_reset(struct domain *d, bool
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     /*
      * If we are resuming, then start where we stopped. Otherwise, check
@@ -1114,7 +1124,7 @@ int evtchn_reset(struct domain *d, bool
     if ( i > d->next_evtchn )
         d->next_evtchn = i;
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     if ( !i )
         return -EBUSY;
@@ -1126,14 +1136,14 @@ int evtchn_reset(struct domain *d, bool
         /* NB: Choice of frequency is arbitrary. */
         if ( !(i & 0x3f) && hypercall_preempt_check() )
         {
-            spin_lock(&d->event_lock);
+            write_lock(&d->event_lock);
             d->next_evtchn = i;
-            spin_unlock(&d->event_lock);
+            write_unlock(&d->event_lock);
             return -ERESTART;
         }
     }
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     d->next_evtchn = 0;
 
@@ -1146,7 +1156,7 @@ int evtchn_reset(struct domain *d, bool
         evtchn_2l_init(d);
     }
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 }
@@ -1335,7 +1345,7 @@ int alloc_unbound_xen_event_channel(
     int            port, rc;
     unsigned long  flags;
 
-    spin_lock(&ld->event_lock);
+    write_lock(&ld->event_lock);
 
     port = rc = get_free_port(ld);
     if ( rc < 0 )
@@ -1363,7 +1373,7 @@ int alloc_unbound_xen_event_channel(
 
  out:
     check_free_port(ld, port);
-    spin_unlock(&ld->event_lock);
+    write_unlock(&ld->event_lock);
 
     return rc < 0 ? rc : port;
 }
@@ -1451,7 +1461,8 @@ int evtchn_init(struct domain *d, unsign
         return -ENOMEM;
     d->valid_evtchns = EVTCHNS_PER_BUCKET;
 
-    spin_lock_init_prof(d, event_lock);
+    rwlock_init(&d->event_lock);
+
     if ( get_free_port(d) != 0 )
     {
         free_evtchn_bucket(d, d->evtchn);
@@ -1478,7 +1489,7 @@ int evtchn_destroy(struct domain *d)
 
     /* After this barrier no new event-channel allocations can occur. */
     BUG_ON(!d->is_dying);
-    spin_barrier(&d->event_lock);
+    rw_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
     for ( i = d->valid_evtchns; --i; )
@@ -1536,13 +1547,13 @@ void evtchn_move_pirqs(struct vcpu *v)
     unsigned int port;
     struct evtchn *chn;
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
     for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
     {
         chn = evtchn_from_port(d, port);
         pirq_set_affinity(d, chn->u.pirq.irq, mask);
     }
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
 }
 
 
@@ -1555,7 +1566,7 @@ static void domain_dump_evtchn_info(stru
            "Polling vCPUs: {%*pbl}\n"
            "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
 
     for ( port = 1; port_is_valid(d, port); ++port )
     {
@@ -1602,7 +1613,7 @@ static void domain_dump_evtchn_info(stru
         }
     }
 
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
 }
 
 static void dump_evtchn_info(unsigned char key)
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -561,7 +561,7 @@ int evtchn_fifo_init_control(struct evtc
     if ( offset & (8 - 1) )
         return -EINVAL;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     /*
      * If this is the first control block, setup an empty event array
@@ -593,13 +593,13 @@ int evtchn_fifo_init_control(struct evtc
     else
         rc = map_control_block(v, gfn, offset);
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 
  error:
     evtchn_fifo_destroy(d);
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     return rc;
 }
 
@@ -652,9 +652,9 @@ int evtchn_fifo_expand_array(const struc
     if ( !d->evtchn_fifo )
         return -EOPNOTSUPP;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     rc = add_page_to_event_array(d, expand_array->array_gfn);
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     return rc;
 }
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -105,7 +105,7 @@ static void pt_pirq_softirq_reset(struct
 {
     struct domain *d = pirq_dpci->dom;
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_write_locked(&d->event_lock));
 
     switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
     {
@@ -162,7 +162,7 @@ static void pt_irq_time_out(void *data)
     const struct hvm_irq_dpci *dpci;
     const struct dev_intx_gsi_link *digl;
 
-    spin_lock(&irq_map->dom->event_lock);
+    write_lock(&irq_map->dom->event_lock);
 
     if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI )
     {
@@ -177,7 +177,7 @@ static void pt_irq_time_out(void *data)
         hvm_gsi_deassert(irq_map->dom, dpci_pirq(irq_map)->pirq);
         irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
         pt_irq_guest_eoi(irq_map->dom, irq_map, NULL);
-        spin_unlock(&irq_map->dom->event_lock);
+        write_unlock(&irq_map->dom->event_lock);
         return;
     }
 
@@ -185,7 +185,7 @@ static void pt_irq_time_out(void *data)
     if ( unlikely(!dpci) )
     {
         ASSERT_UNREACHABLE();
-        spin_unlock(&irq_map->dom->event_lock);
+        write_unlock(&irq_map->dom->event_lock);
         return;
     }
     list_for_each_entry ( digl, &irq_map->digl_list, list )
@@ -204,7 +204,7 @@ static void pt_irq_time_out(void *data)
 
     pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
 
-    spin_unlock(&irq_map->dom->event_lock);
+    write_unlock(&irq_map->dom->event_lock);
 }
 
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *d)
@@ -288,7 +288,7 @@ int pt_irq_create_bind(
         return -EINVAL;
 
  restart:
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     hvm_irq_dpci = domain_get_irq_dpci(d);
     if ( !hvm_irq_dpci && !is_hardware_domain(d) )
@@ -304,7 +304,7 @@ int pt_irq_create_bind(
         hvm_irq_dpci = xzalloc(struct hvm_irq_dpci);
         if ( hvm_irq_dpci == NULL )
         {
-            spin_unlock(&d->event_lock);
+            write_unlock(&d->event_lock);
             return -ENOMEM;
         }
         for ( i = 0; i < NR_HVM_DOMU_IRQS; i++ )
@@ -316,7 +316,7 @@ int pt_irq_create_bind(
     info = pirq_get_info(d, pirq);
     if ( !info )
     {
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
         return -ENOMEM;
     }
     pirq_dpci = pirq_dpci(info);
@@ -331,7 +331,7 @@ int pt_irq_create_bind(
      */
     if ( pt_pirq_softirq_active(pirq_dpci) )
     {
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
         cpu_relax();
         goto restart;
     }
@@ -389,7 +389,7 @@ int pt_irq_create_bind(
                 pirq_dpci->dom = NULL;
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
                 return rc;
             }
         }
@@ -399,7 +399,7 @@ int pt_irq_create_bind(
 
             if ( (pirq_dpci->flags & mask) != mask )
             {
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
                 return -EBUSY;
             }
 
@@ -423,7 +423,7 @@ int pt_irq_create_bind(
 
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
 
         pirq_dpci->gmsi.posted = false;
         vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
@@ -483,7 +483,7 @@ int pt_irq_create_bind(
 
             if ( !digl || !girq )
             {
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
                 xfree(girq);
                 xfree(digl);
                 return -ENOMEM;
@@ -510,7 +510,7 @@ int pt_irq_create_bind(
             if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
                  pirq >= hvm_domain_irq(d)->nr_gsis )
             {
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
 
                 return -EINVAL;
             }
@@ -546,7 +546,7 @@ int pt_irq_create_bind(
 
                     if ( mask < 0 || trigger_mode < 0 )
                     {
-                        spin_unlock(&d->event_lock);
+                        write_unlock(&d->event_lock);
 
                         ASSERT_UNREACHABLE();
                         return -EINVAL;
@@ -594,14 +594,14 @@ int pt_irq_create_bind(
                 }
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
-                spin_unlock(&d->event_lock);
+                write_unlock(&d->event_lock);
                 xfree(girq);
                 xfree(digl);
                 return rc;
             }
         }
 
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
 
         if ( iommu_verbose )
         {
@@ -619,7 +619,7 @@ int pt_irq_create_bind(
     }
 
     default:
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
         return -EOPNOTSUPP;
     }
 
@@ -672,13 +672,13 @@ int pt_irq_destroy_bind(
         return -EOPNOTSUPP;
     }
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     hvm_irq_dpci = domain_get_irq_dpci(d);
 
     if ( !hvm_irq_dpci && !is_hardware_domain(d) )
     {
-        spin_unlock(&d->event_lock);
+        write_unlock(&d->event_lock);
         return -EINVAL;
     }
 
@@ -711,7 +711,7 @@ int pt_irq_destroy_bind(
 
         if ( girq )
         {
-            spin_unlock(&d->event_lock);
+            write_unlock(&d->event_lock);
             return -EINVAL;
         }
 
@@ -755,7 +755,7 @@ int pt_irq_destroy_bind(
         pirq_cleanup_check(pirq, d);
     }
 
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 
     if ( what && iommu_verbose )
     {
@@ -799,7 +799,7 @@ int pt_pirq_iterate(struct domain *d,
     unsigned int pirq = 0, n, i;
     struct pirq *pirqs[8];
 
-    ASSERT(spin_is_locked(&d->event_lock));
+    ASSERT(rw_is_locked(&d->event_lock));
 
     do {
         n = radix_tree_gang_lookup(&d->pirq_tree, (void **)pirqs, pirq,
@@ -880,9 +880,9 @@ void hvm_dpci_msi_eoi(struct domain *d,
          (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
        return;
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
     pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
 }
 
 static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
@@ -893,7 +893,7 @@ static void hvm_dirq_assist(struct domai
         return;
     }
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
@@ -947,7 +947,7 @@ static void hvm_dirq_assist(struct domai
     }
 
  out:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 }
 
 static void hvm_pirq_eoi(struct pirq *pirq,
@@ -1012,7 +1012,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
 
     if ( is_hardware_domain(d) )
     {
-        spin_lock(&d->event_lock);
+        write_lock(&d->event_lock);
         hvm_gsi_eoi(d, guest_gsi, ent);
         goto unlock;
     }
@@ -1023,7 +1023,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
         return;
     }
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     hvm_irq_dpci = domain_get_irq_dpci(d);
 
     if ( !hvm_irq_dpci )
@@ -1033,7 +1033,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
         __hvm_dpci_eoi(d, girq, ent);
 
 unlock:
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 }
 
 /*
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -2,6 +2,7 @@
 #include <xen/irq.h>
 #include <xen/smp.h>
 #include <xen/time.h>
+#include <xen/rwlock.h>
 #include <xen/spinlock.h>
 #include <xen/guest_access.h>
 #include <xen/preempt.h>
@@ -334,6 +335,15 @@ void _spin_unlock_recursive(spinlock_t *
     }
 }
 
+void _rw_barrier(rwlock_t *lock)
+{
+    check_barrier(&lock->lock.debug);
+    smp_mb();
+    while ( _rw_is_locked(lock) )
+        arch_lock_relax();
+    smp_mb();
+}
+
 #ifdef CONFIG_DEBUG_LOCK_PROFILE
 
 struct lock_profile_anc {
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -883,7 +883,7 @@ static int pci_clean_dpci_irqs(struct do
     if ( !is_hvm_domain(d) )
         return 0;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
     hvm_irq_dpci = domain_get_irq_dpci(d);
     if ( hvm_irq_dpci != NULL )
     {
@@ -901,14 +901,14 @@ static int pci_clean_dpci_irqs(struct do
             ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
         if ( ret )
         {
-            spin_unlock(&d->event_lock);
+            write_unlock(&d->event_lock);
             return ret;
         }
 
         hvm_domain_irq(d)->dpci = NULL;
         free_hvm_irq_dpci(hvm_irq_dpci);
     }
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
     return 0;
 }
 
--- a/xen/drivers/passthrough/vtd/x86/hvm.c
+++ b/xen/drivers/passthrough/vtd/x86/hvm.c
@@ -54,7 +54,7 @@ void hvm_dpci_isairq_eoi(struct domain *
     if ( !is_iommu_enabled(d) )
         return;
 
-    spin_lock(&d->event_lock);
+    write_lock(&d->event_lock);
 
     dpci = domain_get_irq_dpci(d);
 
@@ -63,5 +63,5 @@ void hvm_dpci_isairq_eoi(struct domain *
         /* Multiple mirq may be mapped to one isa irq */
         pt_pirq_iterate(d, _hvm_dpci_isairq_eoi, (void *)(long)isairq);
     }
-    spin_unlock(&d->event_lock);
+    write_unlock(&d->event_lock);
 }
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -237,6 +237,8 @@ static inline int _rw_is_write_locked(rw
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
+void _rw_barrier(rwlock_t *lock);
+
 #define read_lock(l)                  _read_lock(l)
 #define read_lock_irq(l)              _read_lock_irq(l)
 #define read_lock_irqsave(l, f)                                 \
@@ -266,6 +268,7 @@ static inline int _rw_is_write_locked(rw
 #define rw_is_locked(l)               _rw_is_locked(l)
 #define rw_is_write_locked(l)         _rw_is_write_locked(l)
 
+#define rw_barrier(l)                 _rw_barrier(l)
 
 typedef struct percpu_rwlock percpu_rwlock_t;
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -373,7 +373,7 @@ struct domain
     unsigned int     xen_evtchns;
     /* Port to resume from in evtchn_reset(), when in a continuation. */
     unsigned int     next_evtchn;
-    spinlock_t       event_lock;
+    rwlock_t         event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
 
--- a/xen/xsm/flask/flask_op.c
+++ b/xen/xsm/flask/flask_op.c
@@ -555,7 +555,7 @@ static int flask_get_peer_sid(struct xen
     struct evtchn *chn;
     struct domain_security_struct *dsec;
 
-    spin_lock(&d->event_lock);
+    read_lock(&d->event_lock);
 
     if ( !port_is_valid(d, arg->evtchn) )
         goto out;
@@ -573,7 +573,7 @@ static int flask_get_peer_sid(struct xen
     rv = 0;
 
  out:
-    spin_unlock(&d->event_lock);
+    read_unlock(&d->event_lock);
     return rv;
 }
 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held
  2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
                   ` (6 preceding siblings ...)
  2020-10-20 14:11 ` [PATCH v2 7/8] evtchn: convert domain event " Jan Beulich
@ 2020-10-20 14:13 ` Jan Beulich
  2020-11-03 10:17   ` Isaila Alexandru
  7 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-20 14:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini, Tamas K Lengyel, Alexandru Isaila,
	Petre Pircalabu

While there don't look to be any problems with this right now, the lock
order implications from holding the lock can be very difficult to follow
(and may be easy to violate unknowingly). The present callbacks don't
(and no such callback should) have any need for the lock to be held.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TODO: vm_event_disable() frees the structures used by respective
      callbacks - need to either use call_rcu() for freeing, or maintain
      a count of in-progress calls, for evtchn_close() to wait to drop
      to zero before dropping the lock / returning.
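
For illustration only, the counting alternative might look along these
lines (the field name is invented here, not part of this patch):

    /* Hypothetical new struct evtchn member: */
    atomic_t active_calls;

    /* evtchn_send(), around the callback invocation: */
    atomic_inc(&rchn->active_calls);
    spin_unlock_irqrestore(&lchn->lock, flags);
    fn(rv, rport);
    atomic_dec(&rchn->active_calls);

    /* evtchn_close(), waiting for in-flight callbacks to drain: */
    while ( atomic_read(&chn->active_calls) )
        cpu_relax();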

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -763,9 +763,18 @@ int evtchn_send(struct domain *ld, unsig
         rport = lchn->u.interdomain.remote_port;
         rchn  = evtchn_from_port(rd, rport);
         if ( consumer_is_xen(rchn) )
-            xen_notification_fn(rchn)(rd->vcpu[rchn->notify_vcpu_id], rport);
-        else
-            evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
+        {
+            /* Don't keep holding the lock for the call below. */
+            xen_event_channel_notification_t fn = xen_notification_fn(rchn);
+            struct vcpu *rv = rd->vcpu[rchn->notify_vcpu_id];
+
+            rcu_lock_domain(rd);
+            spin_unlock_irqrestore(&lchn->lock, flags);
+            fn(rv, rport);
+            rcu_unlock_domain(rd);
+            return 0;
+        }
+        evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
         break;
     case ECS_IPI:
         evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
@ 2020-10-21 15:46   ` Roger Pau Monné
  2020-10-22  7:33     ` Jan Beulich
  2020-10-22  9:21   ` Roger Pau Monné
  2020-10-30 10:15   ` Julien Grall
  2 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-21 15:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
> There's no global lock around the updating of this global piece of data.
> Make use of cmpxchgptr() to avoid two entities racing with their
> updates.
> 
> While touching the functionality, mark xen_consumers[] read-mostly (or
> else the if() condition could use the result of cmpxchgptr(), writing to
> the slot unconditionally).

I'm not sure I get this, likely related to the comment I have below.

> The use of cmpxchgptr() here points out (by way of clang warning about
> it) that its original use of const was slightly wrong. Adjust the
> placement, or else undefined behavior of const qualifying a function
> type will result.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> ---
> v2: Use (and hence generalize) cmpxchgptr(). Add comment. Expand /
>     adjust description.
> 
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -57,7 +57,8 @@
>   * with a pointer, we stash them dynamically in a small lookup array which
>   * can be indexed by a small integer.
>   */
> -static xen_event_channel_notification_t xen_consumers[NR_XEN_CONSUMERS];
> +static xen_event_channel_notification_t __read_mostly
> +    xen_consumers[NR_XEN_CONSUMERS];
>  
>  /* Default notification action: wake up from wait_on_xen_event_channel(). */
>  static void default_xen_notification_fn(struct vcpu *v, unsigned int port)
> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
>  
>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
>      {
> +        /* Use cmpxchgptr() in lieu of a global lock. */
>          if ( xen_consumers[i] == NULL )
> -            xen_consumers[i] = fn;
> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
>          if ( xen_consumers[i] == fn )
>              break;

I think you could join it as:

if ( !xen_consumers[i] &&
     !cmpxchgptr(&xen_consumers[i], NULL, fn) )
    break;

As cmpxchgptr will return the previous value of &xen_consumers[i]?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one
  2020-10-20 14:08 ` [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one Jan Beulich
@ 2020-10-21 16:00   ` Roger Pau Monné
  2020-10-30 10:21   ` Julien Grall
  1 sibling, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-21 16:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:08:33PM +0200, Jan Beulich wrote:
> Having a FIFO specific header is not (or at least no longer) warranted
> with just three function declarations left there. Introduce a private
> header instead, moving there some further items from xen/event.h.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v2: New.
> ---
> TBD: If - considering the layering violation that's there anyway - we
>      allowed PV shim code to make use of this header, a few more items
>      could be moved out of "public sight".
> 
> --- a/xen/common/event_2l.c
> +++ b/xen/common/event_2l.c
> @@ -7,11 +7,12 @@
>   * Version 2 or later.  See the file COPYING for more details.
>   */
>  
> +#include "event_channel.h"
> +
>  #include <xen/init.h>
>  #include <xen/lib.h>
>  #include <xen/errno.h>
>  #include <xen/sched.h>
> -#include <xen/event.h>
>  
>  #include <asm/guest_atomics.h>
>  
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -14,17 +14,17 @@
>   * along with this program; If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include "event_channel.h"
> +
>  #include <xen/init.h>
>  #include <xen/lib.h>
>  #include <xen/errno.h>
>  #include <xen/sched.h>
> -#include <xen/event.h>
>  #include <xen/irq.h>
>  #include <xen/iocap.h>
>  #include <xen/compat.h>
>  #include <xen/guest_access.h>
>  #include <xen/keyhandler.h>
> -#include <xen/event_fifo.h>
>  #include <asm/current.h>
>  
>  #include <public/xen.h>
> --- /dev/null
> +++ b/xen/common/event_channel.h
> @@ -0,0 +1,63 @@
> +/* Event channel handling private header. */

I've got no idea whether it matters or not, but the code moved here
from xen/events.h had an explicit copyright banner with Keir's name;
should this be kept?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-21 15:46   ` Roger Pau Monné
@ 2020-10-22  7:33     ` Jan Beulich
  2020-10-22  8:11       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-22  7:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On 21.10.2020 17:46, Roger Pau Monné wrote:
> On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
>> There's no global lock around the updating of this global piece of data.
>> Make use of cmpxchgptr() to avoid two entities racing with their
>> updates.
>>
>> While touching the functionality, mark xen_consumers[] read-mostly (or
>> else the if() condition could use the result of cmpxchgptr(), writing to
>> the slot unconditionally).
> 
> I'm not sure I get this, likely related to the comment I have below.

This is about the alternative case of invoking cmpxchgptr()
without the if() around it. On x86 this would mean always
writing the field, even if the designated value is already in
place.
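
For illustration, that alternative would look roughly like this (a
sketch only, not what the patch does):

    for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
    {
        /*
         * Without the if() guard the locked CMPXCHG is issued for every
         * probed slot, and on x86 that means a write to the field even
         * when the compare fails - at odds with __read_mostly placement.
         */
        if ( cmpxchgptr(&xen_consumers[i], NULL, fn) == NULL )
            break;
        if ( xen_consumers[i] == fn )
            break;
    }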

>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -57,7 +57,8 @@
>>   * with a pointer, we stash them dynamically in a small lookup array which
>>   * can be indexed by a small integer.
>>   */
>> -static xen_event_channel_notification_t xen_consumers[NR_XEN_CONSUMERS];
>> +static xen_event_channel_notification_t __read_mostly
>> +    xen_consumers[NR_XEN_CONSUMERS];
>>  
>>  /* Default notification action: wake up from wait_on_xen_event_channel(). */
>>  static void default_xen_notification_fn(struct vcpu *v, unsigned int port)
>> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
>>  
>>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
>>      {
>> +        /* Use cmpxchgptr() in lieu of a global lock. */
>>          if ( xen_consumers[i] == NULL )
>> -            xen_consumers[i] = fn;
>> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
>>          if ( xen_consumers[i] == fn )
>>              break;
> 
> I think you could join it as:
> 
> if ( !xen_consumers[i] &&
>      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
>     break;
> 
> As cmpxchgptr will return the previous value of &xen_consumers[i]?

But then you also have to check whether the returned value is
fn (or retain the 2nd if()).

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-22  7:33     ` Jan Beulich
@ 2020-10-22  8:11       ` Roger Pau Monné
  2020-10-22  8:15         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22  8:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Thu, Oct 22, 2020 at 09:33:27AM +0200, Jan Beulich wrote:
> On 21.10.2020 17:46, Roger Pau Monné wrote:
> > On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
> >> There's no global lock around the updating of this global piece of data.
> >> Make use of cmpxchgptr() to avoid two entities racing with their
> >> updates.
> >>
> >> While touching the functionality, mark xen_consumers[] read-mostly (or
> >> else the if() condition could use the result of cmpxchgptr(), writing to
> >> the slot unconditionally).
> > 
> > I'm not sure I get this, likely related to the comment I have below.
> 
> This is about the alternative case of invoking cmpxchgptr()
> without the if() around it. On x86 this would mean always
> writing the field, even if the designated value is already in
> place.
> 
> >> --- a/xen/common/event_channel.c
> >> +++ b/xen/common/event_channel.c
> >> @@ -57,7 +57,8 @@
> >>   * with a pointer, we stash them dynamically in a small lookup array which
> >>   * can be indexed by a small integer.
> >>   */
> >> -static xen_event_channel_notification_t xen_consumers[NR_XEN_CONSUMERS];
> >> +static xen_event_channel_notification_t __read_mostly
> >> +    xen_consumers[NR_XEN_CONSUMERS];
> >>  
> >>  /* Default notification action: wake up from wait_on_xen_event_channel(). */
> >>  static void default_xen_notification_fn(struct vcpu *v, unsigned int port)
> >> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
> >>  
> >>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
> >>      {
> >> +        /* Use cmpxchgptr() in lieu of a global lock. */
> >>          if ( xen_consumers[i] == NULL )
> >> -            xen_consumers[i] = fn;
> >> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
> >>          if ( xen_consumers[i] == fn )
> >>              break;
> > 
> > I think you could join it as:
> > 
> > if ( !xen_consumers[i] &&
> >      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
> >     break;
> > 
> > As cmpxchgptr will return the previous value of &xen_consumers[i]?
> 
> But then you also have to check whether the returned value is
> fn (or retain the 2nd if()).

__cmpxchg comment says that success of the operation is indicated when
the returned value equals the old value, so it's my understanding that
cmpxchgptr returning NULL would mean the exchange has succeeded and that
xen_consumers[i] == fn?

Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-22  8:11       ` Roger Pau Monné
@ 2020-10-22  8:15         ` Jan Beulich
  2020-10-22  8:29           ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-22  8:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On 22.10.2020 10:11, Roger Pau Monné wrote:
> On Thu, Oct 22, 2020 at 09:33:27AM +0200, Jan Beulich wrote:
>> On 21.10.2020 17:46, Roger Pau Monné wrote:
>>> On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
>>>> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
>>>>  
>>>>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
>>>>      {
>>>> +        /* Use cmpxchgptr() in lieu of a global lock. */
>>>>          if ( xen_consumers[i] == NULL )
>>>> -            xen_consumers[i] = fn;
>>>> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
>>>>          if ( xen_consumers[i] == fn )
>>>>              break;
>>>
>>> I think you could join it as:
>>>
>>> if ( !xen_consumers[i] &&
>>>      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
>>>     break;
>>>
>>> As cmpxchgptr will return the previous value of &xen_consumers[i]?
>>
>> But then you also have to check whether the returned value is
>> fn (or retain the 2nd if()).
> 
> __cmpxchg comment says that success of the operation is indicated when
> the returned value equals the old value, so it's my understanding that
> cmpxchgptr returning NULL would mean the exchange has succeeded and that
> xen_consumers[i] == fn?

Correct. But if xen_consumers[i] == fn before the call, the return
value will be fn. The cmpxchg() wasn't "successful" in this case
(it didn't update anything), but the state of the array slot is what
we want.
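
Spelled out (purely for illustration), a joined form would therefore
need to accept both outcomes of the exchange:

    xen_event_channel_notification_t old =
        cmpxchgptr(&xen_consumers[i], NULL, fn);

    /* NULL: we just installed fn; fn: it was already in this slot. */
    if ( !old || old == fn )
        break;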

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-22  8:15         ` Jan Beulich
@ 2020-10-22  8:29           ` Roger Pau Monné
  2020-10-22  8:56             ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22  8:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Thu, Oct 22, 2020 at 10:15:45AM +0200, Jan Beulich wrote:
> On 22.10.2020 10:11, Roger Pau Monné wrote:
> > On Thu, Oct 22, 2020 at 09:33:27AM +0200, Jan Beulich wrote:
> >> On 21.10.2020 17:46, Roger Pau Monné wrote:
> >>> On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
> >>>> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
> >>>>  
> >>>>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
> >>>>      {
> >>>> +        /* Use cmpxchgptr() in lieu of a global lock. */
> >>>>          if ( xen_consumers[i] == NULL )
> >>>> -            xen_consumers[i] = fn;
> >>>> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
> >>>>          if ( xen_consumers[i] == fn )
> >>>>              break;
> >>>
> >>> I think you could join it as:
> >>>
> >>> if ( !xen_consumers[i] &&
> >>>      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
> >>>     break;
> >>>
> >>> As cmpxchgptr will return the previous value of &xen_consumers[i]?
> >>
> >> But then you also have to check whether the returned value is
> >> fn (or retain the 2nd if()).
> > 
> > __cmpxchg comment says that success of the operation is indicated when
> > the returned value equals the old value, so it's my understanding that
> > cmpxchgptr returning NULL would mean the exchange has succeed and that
> > xen_consumers[i] == fn?
> 
> Correct. But if xen_consumers[i] == fn before the call, the return
> value will be fn. The cmpxchg() wasn't "successful" in this case
> (it didn't update anything), but the state of the array slot is what
> we want.

Oh, I get it now. You don't want the same fn populating more than one
slot.

I assume the reads of xen_consumers are not using ACCESS_ONCE or
read_atomic because we rely on the compiler performing such reads as
single instructions?

Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-22  8:29           ` Roger Pau Monné
@ 2020-10-22  8:56             ` Jan Beulich
  2020-10-22  9:25               ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-22  8:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On 22.10.2020 10:29, Roger Pau Monné wrote:
> On Thu, Oct 22, 2020 at 10:15:45AM +0200, Jan Beulich wrote:
>> On 22.10.2020 10:11, Roger Pau Monné wrote:
>>> On Thu, Oct 22, 2020 at 09:33:27AM +0200, Jan Beulich wrote:
>>>> On 21.10.2020 17:46, Roger Pau Monné wrote:
>>>>> On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
>>>>>> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
>>>>>>  
>>>>>>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
>>>>>>      {
>>>>>> +        /* Use cmpxchgptr() in lieu of a global lock. */
>>>>>>          if ( xen_consumers[i] == NULL )
>>>>>> -            xen_consumers[i] = fn;
>>>>>> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
>>>>>>          if ( xen_consumers[i] == fn )
>>>>>>              break;
>>>>>
>>>>> I think you could join it as:
>>>>>
>>>>> if ( !xen_consumers[i] &&
>>>>>      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
>>>>>     break;
>>>>>
>>>>> As cmpxchgptr will return the previous value of &xen_consumers[i]?
>>>>
>>>> But then you also have to check whether the returned value is
>>>> fn (or retain the 2nd if()).
>>>
>>> __cmpxchg comment says that success of the operation is indicated when
>>> the returned value equals the old value, so it's my understanding that
> >> cmpxchgptr returning NULL would mean the exchange has succeeded and that
>>> xen_consumers[i] == fn?
>>
>> Correct. But if xen_consumers[i] == fn before the call, the return
>> value will be fn. The cmpxchg() wasn't "successful" in this case
>> (it didn't update anything), but the state of the array slot is what
>> we want.
> 
> Oh, I get it now. You don't want the same fn populating more than one
> slot.

FAOD it's not just "want", it's a strict requirement.

> I assume the reads of xen_consumers are not using ACCESS_ONCE or
> read_atomic because we rely on the compiler performing such reads as
> single instructions?

Yes, as in so many other places in the code base.
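
For reference, the more defensive form being asked about would simply
read along these lines (not what the patch uses):

    if ( ACCESS_ONCE(xen_consumers[i]) == fn )
        break;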

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
  2020-10-21 15:46   ` Roger Pau Monné
@ 2020-10-22  9:21   ` Roger Pau Monné
  2020-10-30 10:15   ` Julien Grall
  2 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22  9:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
> There's no global lock around the updating of this global piece of data.
> Make use of cmpxchgptr() to avoid two entities racing with their
> updates.
> 
> While touching the functionality, mark xen_consumers[] read-mostly (or
> else the if() condition could use the result of cmpxchgptr(), writing to
> the slot unconditionally).
> 
> The use of cmpxchgptr() here points out (by way of clang warning about
> it) that its original use of const was slightly wrong. Adjust the
> placement, or else undefined behavior of const qualifying a function
> type will result.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-22  8:56             ` Jan Beulich
@ 2020-10-22  9:25               ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22  9:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Thu, Oct 22, 2020 at 10:56:15AM +0200, Jan Beulich wrote:
> On 22.10.2020 10:29, Roger Pau Monné wrote:
> > On Thu, Oct 22, 2020 at 10:15:45AM +0200, Jan Beulich wrote:
> >> On 22.10.2020 10:11, Roger Pau Monné wrote:
> >>> On Thu, Oct 22, 2020 at 09:33:27AM +0200, Jan Beulich wrote:
> >>>> On 21.10.2020 17:46, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 20, 2020 at 04:08:13PM +0200, Jan Beulich wrote:
> >>>>>> @@ -80,8 +81,9 @@ static uint8_t get_xen_consumer(xen_even
> >>>>>>  
> >>>>>>      for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ )
> >>>>>>      {
> >>>>>> +        /* Use cmpxchgptr() in lieu of a global lock. */
> >>>>>>          if ( xen_consumers[i] == NULL )
> >>>>>> -            xen_consumers[i] = fn;
> >>>>>> +            cmpxchgptr(&xen_consumers[i], NULL, fn);
> >>>>>>          if ( xen_consumers[i] == fn )
> >>>>>>              break;
> >>>>>
> >>>>> I think you could join it as:
> >>>>>
> >>>>> if ( !xen_consumers[i] &&
> >>>>>      !cmpxchgptr(&xen_consumers[i], NULL, fn) )
> >>>>>     break;
> >>>>>
> >>>>> As cmpxchgptr will return the previous value of &xen_consumers[i]?
> >>>>
> >>>> But then you also have to check whether the returned value is
> >>>> fn (or retain the 2nd if()).
> >>>
> >>> __cmpxchg comment says that success of the operation is indicated when
> >>> the returned value equals the old value, so it's my understanding that
> >>> cmpxchgptr returning NULL would mean the exchange has succeeded and that
> >>> xen_consumers[i] == fn?
> >>
> >> Correct. But if xen_consumers[i] == fn before the call, the return
> >> value will be fn. The cmpxchg() wasn't "successful" in this case
> >> (it didn't update anything), but the state of the array slot is what
> >> we want.
> > 
> > Oh, I get it now. You don't want the same fn populating more than one
> > slot.
> 
> FAOD it's not just "want", it's a strict requirement.

I wouldn't mind having a comment to that effect in the function, but I
won't insist.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event()
  2020-10-20 14:09 ` [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event() Jan Beulich
@ 2020-10-22 10:28   ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22 10:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:09:16PM +0200, Jan Beulich wrote:
> The function isn't about an "event" in general, but about a vIRQ. The
> function also failed to honor global vIRQ-s, instead assuming the caller
> would pass vCPU 0 in such a case.
> 
> While at it also adjust the
> - types the function uses,
> - single user to make use of domain_vcpu().
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v2: New.
> 
> --- a/xen/arch/x86/cpu/mcheck/vmce.h
> +++ b/xen/arch/x86/cpu/mcheck/vmce.h
> @@ -5,9 +5,9 @@
>  
>  int vmce_init(struct cpuinfo_x86 *c);
>  
> -#define dom0_vmce_enabled() (hardware_domain && hardware_domain->max_vcpus \
> -        && hardware_domain->vcpu[0] \
> -        && guest_enabled_event(hardware_domain->vcpu[0], VIRQ_MCA))
> +#define dom0_vmce_enabled() \
> +    (hardware_domain && \
> +     evtchn_virq_enabled(domain_vcpu(hardware_domain, 0), VIRQ_MCA))
>  
>  int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn);
>  
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -778,9 +778,15 @@ out:
>      return ret;
>  }
>  
> -int guest_enabled_event(struct vcpu *v, uint32_t virq)
> +bool evtchn_virq_enabled(const struct vcpu *v, unsigned int virq)
>  {
> -    return ((v != NULL) && (v->virq_to_evtchn[virq] != 0));
> +    if ( !v )

Not sure it's worth adding a check that virq < NR_VIRQS here just to
be on the safe side...

> +        return false;
> +
> +    if ( virq_is_global(virq) && v->vcpu_id )

...as virq_is_global has an ASSERT to that extent (but that would be a
nop on release builds).
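
For illustration, the extra check being discussed might sit right after
the NULL check at the top of evtchn_virq_enabled() (a sketch, not part
of the patch as posted):

    if ( !v )
        return false;

    /* Hypothetical hardening: bound the array index explicitly. */
    if ( virq >= NR_VIRQS )
    {
        ASSERT_UNREACHABLE();
        return false;
    }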

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock
  2020-10-20 14:09 ` [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock Jan Beulich
@ 2020-10-22 11:17   ` Roger Pau Monné
  2020-10-22 13:34     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22 11:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:09:41PM +0200, Jan Beulich wrote:
> Some lock wants to be held to make sure the port doesn't change state,
> but there's no point holding the per-domain event lock here. Switch to
> using the finer grained per-channel lock instead.

While true that's a fine grained lock, it also disables interrupts,
which the global event_lock didn't.

> FAOD this doesn't guarantee anything towards in particular
> evtchn_fifo_set_pending(), as for interdomain channels that function
> would be called with the remote side's per-channel lock held.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock
  2020-10-22 11:17   ` Roger Pau Monné
@ 2020-10-22 13:34     ` Jan Beulich
  0 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2020-10-22 13:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On 22.10.2020 13:17, Roger Pau Monné wrote:
> On Tue, Oct 20, 2020 at 04:09:41PM +0200, Jan Beulich wrote:
>> Some lock wants to be held to make sure the port doesn't change state,
>> but there's no point holding the per-domain event lock here. Switch to
>> using the finer grained per-channel lock instead.
> 
> While true that's a fine grained lock, it also disables interrupts,
> which the global event_lock didn't.

True, yet we're aiming at dropping this aspect again. Hence I've
added "(albeit as a downside for the time being this requires
disabling interrupts for a short period of time)".

>> FAOD this doesn't guarantee anything towards in particular
>> evtchn_fifo_set_pending(), as for interdomain channels that function
>> would be called with the remote side's per-channel lock held.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-20 14:10 ` [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq() Jan Beulich
@ 2020-10-22 16:00   ` Roger Pau Monné
  2020-10-22 16:17     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2020-10-22 16:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
> The per-vCPU virq_lock, which is being held anyway, together with there
> not being any call to evtchn_port_set_pending() when v->virq_to_evtchn[]
> is zero, provide sufficient guarantees.

This is also fine because closing the event channel will be done with
the VIRQ lock held also AFAICT.

> Undo the lock addition done for
> XSA-343 (commit e045199c7c9c "evtchn: address races with
> evtchn_reset()"). Update the description next to struct evtchn_port_ops
> accordingly.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: New.
> 
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -794,7 +794,6 @@ void send_guest_vcpu_virq(struct vcpu *v
>      unsigned long flags;
>      int port;
>      struct domain *d;
> -    struct evtchn *chn;
>  
>      ASSERT(!virq_is_global(virq));
>  
> @@ -805,10 +804,7 @@ void send_guest_vcpu_virq(struct vcpu *v
>          goto out;
>  
>      d = v->domain;
> -    chn = evtchn_from_port(d, port);
> -    spin_lock(&chn->lock);
> -    evtchn_port_set_pending(d, v->vcpu_id, chn);
> -    spin_unlock(&chn->lock);
> +    evtchn_port_set_pending(d, v->vcpu_id, evtchn_from_port(d, port));
>  
>   out:
>      spin_unlock_irqrestore(&v->virq_lock, flags);
> @@ -837,9 +833,7 @@ void send_guest_global_virq(struct domai
>          goto out;
>  
>      chn = evtchn_from_port(d, port);
> -    spin_lock(&chn->lock);
>      evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
> -    spin_unlock(&chn->lock);
>  
>   out:
>      spin_unlock_irqrestore(&v->virq_lock, flags);
> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>   * Low-level event channel port ops.
>   *
>   * All hooks have to be called with a lock held which prevents the channel
> - * from changing state. This may be the domain event lock, the per-channel
> - * lock, or in the case of sending interdomain events also the other side's
> - * per-channel lock. Exceptions apply in certain cases for the PV shim.
> + * from changing state. This may be
> + * - the domain event lock,
> + * - the per-channel lock,
> + * - in the case of sending interdomain events the other side's per-channel
> + *   lock,
> + * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
> + *   combination with the ordering enforced through how the vCPU's
> + *   virq_to_evtchn[] gets updated),
> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
> + * Exceptions apply in certain cases for the PV shim.

Having such a wide locking discipline looks dangerous to me, it's easy
to get things wrong without notice IMO.

Maybe we could add an assert to that effect in
evtchn_port_set_pending in order to make sure callers follow the
discipline?

Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-22 16:00   ` Roger Pau Monné
@ 2020-10-22 16:17     ` Jan Beulich
  2020-10-30 10:38       ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-22 16:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Wei Liu, Stefano Stabellini

On 22.10.2020 18:00, Roger Pau Monné wrote:
> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>> The per-vCPU virq_lock, which is being held anyway, together with there
>> not being any call to evtchn_port_set_pending() when v->virq_to_evtchn[]
>> is zero, provide sufficient guarantees.
> 
> This is also fine because closing the event channel will be done with
> the VIRQ lock held also AFAICT.

Right, I'm not going into these details (or else binding would also
need mentioning) here, as the code comment update should sufficiently
cover it. Hence just saying "sufficient guarantees".

>> --- a/xen/include/xen/event.h
>> +++ b/xen/include/xen/event.h
>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>   * Low-level event channel port ops.
>>   *
>>   * All hooks have to be called with a lock held which prevents the channel
>> - * from changing state. This may be the domain event lock, the per-channel
>> - * lock, or in the case of sending interdomain events also the other side's
>> - * per-channel lock. Exceptions apply in certain cases for the PV shim.
>> + * from changing state. This may be
>> + * - the domain event lock,
>> + * - the per-channel lock,
>> + * - in the case of sending interdomain events the other side's per-channel
>> + *   lock,
>> + * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
>> + *   combination with the ordering enforced through how the vCPU's
>> + *   virq_to_evtchn[] gets updated),
>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>> + * Exceptions apply in certain cases for the PV shim.
> 
> Having such a wide locking discipline looks dangerous to me, it's easy
> to get things wrong without notice IMO.

It is effectively only describing how things are (or were before
XSA-343, getting restored here).

> Maybe we could add an assert to that effect in
> evtchn_port_set_pending in order to make sure callers follow the
> discipline?

Would be nice, but (a) see the last sentence of the comment still
in context above and (b) it shouldn't be just set_pending(). The
comment starts with "All hooks ..." after all.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer()
  2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
  2020-10-21 15:46   ` Roger Pau Monné
  2020-10-22  9:21   ` Roger Pau Monné
@ 2020-10-30 10:15   ` Julien Grall
  2 siblings, 0 replies; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:15 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné

Hi Jan,

On 20/10/2020 15:08, Jan Beulich wrote:
> There's no global lock around the updating of this global piece of data.
> Make use of cmpxchgptr() to avoid two entities racing with their
> updates.
> 
> While touching the functionality, mark xen_consumers[] read-mostly (or
> else the if() condition could use the result of cmpxchgptr(), writing to
> the slot unconditionally).
> 
> The use of cmpxchgptr() here points out (by way of clang warning about
> it) that its original use of const was slightly wrong. Adjust the
> placement, or else undefined behavior of const qualifying a function
> type will result.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one
  2020-10-20 14:08 ` [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one Jan Beulich
  2020-10-21 16:00   ` Roger Pau Monné
@ 2020-10-30 10:21   ` Julien Grall
  2020-10-30 10:42     ` Jan Beulich
  1 sibling, 1 reply; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:21 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu, Stefano Stabellini

Hi Jan,

On 20/10/2020 15:08, Jan Beulich wrote:
> Having a FIFO specific header is not (or at least no longer) warranted
> with just three function declarations left there. Introduce a private
> header instead, moving there some further items from xen/event.h.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Julien Grall <jgrall@amazon.com>

> ---
> v2: New.
> ---
> TBD: If - considering the layering violation that's there anyway - we
>       allowed PV shim code to make use of this header, a few more items
>       could be moved out of "public sight".

Are you referring to PV shim calling functions such as evtchn_bind_vcpu()?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-22 16:17     ` Jan Beulich
@ 2020-10-30 10:38       ` Julien Grall
  2020-10-30 10:49         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:38 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini

Hi Jan,

On 22/10/2020 17:17, Jan Beulich wrote:
> On 22.10.2020 18:00, Roger Pau Monné wrote:
>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>> The per-vCPU virq_lock, which is being held anyway, together with there
>>> not being any call to evtchn_port_set_pending() when v->virq_to_evtchn[]
>>> is zero, provide sufficient guarantees.
>>
>> This is also fine because closing the event channel will be done with
>> the VIRQ lock held also AFAICT.
> 
> Right, I'm not going into these details (or else binding would also
> need mentioning) here, as the code comment update should sufficiently
> cover it. Hence just saying "sufficient guarantees".
> 
>>> --- a/xen/include/xen/event.h
>>> +++ b/xen/include/xen/event.h
>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>    * Low-level event channel port ops.
>>>    *
>>>    * All hooks have to be called with a lock held which prevents the channel
>>> - * from changing state. This may be the domain event lock, the per-channel
>>> - * lock, or in the case of sending interdomain events also the other side's
>>> - * per-channel lock. Exceptions apply in certain cases for the PV shim.
>>> + * from changing state. This may be
>>> + * - the domain event lock,
>>> + * - the per-channel lock,
>>> + * - in the case of sending interdomain events the other side's per-channel
>>> + *   lock,
>>> + * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
>>> + *   combination with the ordering enforced through how the vCPU's
>>> + *   virq_to_evtchn[] gets updated),
>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>> + * Exceptions apply in certain cases for the PV shim.
>>
>> Having such a wide locking discipline looks dangerous to me, it's easy
>> to get things wrong without notice IMO.
> 
> It is effectively only describing how things are (or were before
> XSA-343, getting restored here).

I agree with Roger here, the new/old locking discipline is dangerous and 
it is only a matter of time before it will bite us again.

I think we should consider Juergen's series because the locking for the 
event channel is easier to understand.

With his series in place, this patch will become unnecessary.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one
  2020-10-30 10:21   ` Julien Grall
@ 2020-10-30 10:42     ` Jan Beulich
  2020-10-30 10:44       ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 10:42 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel

On 30.10.2020 11:21, Julien Grall wrote:
> On 20/10/2020 15:08, Jan Beulich wrote:
>> Having a FIFO specific header is not (or at least no longer) warranted
>> with just three function declarations left there. Introduce a private
>> header instead, moving there some further items from xen/event.h.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Julien Grall <jgrall@amazon.com>

Thanks; perhaps you didn't notice this went in already?

>> ---
>> v2: New.
>> ---
>> TBD: If - considering the layering violation that's there anyway - we
>>       allowed PV shim code to make use of this header, a few more items
>>       could be moved out of "public sight".
> 
> Are you referring to PV shim calling functions such as evtchn_bind_vcpu()?

Yes.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one
  2020-10-30 10:42     ` Jan Beulich
@ 2020-10-30 10:44       ` Julien Grall
  0 siblings, 0 replies; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel



On 30/10/2020 10:42, Jan Beulich wrote:
> On 30.10.2020 11:21, Julien Grall wrote:
>> On 20/10/2020 15:08, Jan Beulich wrote:
>>> Having a FIFO specific header is not (or at least no longer) warranted
>>> with just three function declarations left there. Introduce a private
>>> header instead, moving there some further items from xen/event.h.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> Acked-by: Julien Grall <jgrall@amazon.com>
> 
> Thanks; perhaps you didn't notice this went in already?

I only noticed it afterwards. I find it useful when committers send a quick
message mentioning that part of the series has been committed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 10:38       ` Julien Grall
@ 2020-10-30 10:49         ` Jan Beulich
  2020-10-30 10:57           ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 10:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné

On 30.10.2020 11:38, Julien Grall wrote:
> On 22/10/2020 17:17, Jan Beulich wrote:
>> On 22.10.2020 18:00, Roger Pau Monné wrote:
>>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>>> --- a/xen/include/xen/event.h
>>>> +++ b/xen/include/xen/event.h
>>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>>    * Low-level event channel port ops.
>>>>    *
>>>>    * All hooks have to be called with a lock held which prevents the channel
>>>> - * from changing state. This may be the domain event lock, the per-channel
>>>> - * lock, or in the case of sending interdomain events also the other side's
>>>> - * per-channel lock. Exceptions apply in certain cases for the PV shim.
>>>> + * from changing state. This may be
>>>> + * - the domain event lock,
>>>> + * - the per-channel lock,
>>>> + * - in the case of sending interdomain events the other side's per-channel
>>>> + *   lock,
>>>> + * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
>>>> + *   combination with the ordering enforced through how the vCPU's
>>>> + *   virq_to_evtchn[] gets updated),
>>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>>> + * Exceptions apply in certain cases for the PV shim.
>>>
>>> Having such a wide locking discipline looks dangerous to me, it's easy
>>> to get things wrong without notice IMO.
>>
>> It is effectively only describing how things are (or were before
>> XSA-343, getting restored here).
> 
> I agree with Roger here, the new/old locking discipline is dangerous and 
> it is only a matter of time before it will bite us again.
> 
> I think we should consider Juergen's series because the locking for the 
> event channel is easier to understand.

We should, yes. The one thing I'm a little uneasy with is the
new lock "variant" that gets introduced. Custom locking methods
also are a common source of problems (which isn't to say I see
any here).

> With his series in place, this patch will become unnecessary.

It'll become less important, but not pointless - any unnecessary
locking would better be removed imo.

I'd also like to note that the non-straightforward locking rules
wouldn't really change with his series; the benefit there really
is the dropping of the need for IRQ-safe locking.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 10:49         ` Jan Beulich
@ 2020-10-30 10:57           ` Julien Grall
  2020-10-30 11:15             ` Jürgen Groß
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné



On 30/10/2020 10:49, Jan Beulich wrote:
> On 30.10.2020 11:38, Julien Grall wrote:
>> On 22/10/2020 17:17, Jan Beulich wrote:
>>> On 22.10.2020 18:00, Roger Pau Monné wrote:
>>>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>>>> --- a/xen/include/xen/event.h
>>>>> +++ b/xen/include/xen/event.h
>>>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>>>     * Low-level event channel port ops.
>>>>>     *
>>>>>     * All hooks have to be called with a lock held which prevents the channel
>>>>> - * from changing state. This may be the domain event lock, the per-channel
>>>>> - * lock, or in the case of sending interdomain events also the other side's
>>>>> - * per-channel lock. Exceptions apply in certain cases for the PV shim.
>>>>> + * from changing state. This may be
>>>>> + * - the domain event lock,
>>>>> + * - the per-channel lock,
>>>>> + * - in the case of sending interdomain events the other side's per-channel
>>>>> + *   lock,
>>>>> + * - in the case of sending non-global vIRQ-s the per-vCPU virq_lock (in
>>>>> + *   combination with the ordering enforced through how the vCPU's
>>>>> + *   virq_to_evtchn[] gets updated),
>>>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>>>> + * Exceptions apply in certain cases for the PV shim.
>>>>
>>>> Having such a wide locking discipline looks dangerous to me, it's easy
>>>> to get things wrong without notice IMO.
>>>
>>> It is effectively only describing how things are (or were before
>>> XSA-343, getting restored here).
>>
>> I agree with Roger here, the new/old locking discipline is dangerous and
>> it is only a matter of time before it will bite us again.
>>
>> I think we should consider Juergen's series because the locking for the
>> event channel is easier to understand.
> 
> We should, yes. The one thing I'm a little uneasy with is the
> new lock "variant" that gets introduced. Custom locking methods
> also are a common source of problems (which isn't to say I see
> any here).

I am also uneasy with a new lock "variant". However, this is the best
proposal I have seen so far to unblock the issue.

I am open to other suggestion with simple locking discipline.

> 
>> With his series in place, this patch will become unecessary.
> 
> It'll become less important, but not pointless - any unnecessary
> locking would better be removed imo.

They may be unnecessary today but if tomorrow someone decides to rework
the other lock, then you are just re-opening a security hole.

IMHO, having a sane locking system is far more important than removing
locking that looks "unnecessary".

> 
> I'd also like to note that the non-straightforward locking rules
> wouldn't really change with his series; the benefit there really
> is the dropping of the need for IRQ-safe locking.

Well, it is at least going towards that...

Cheers,

> 
> Jan
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-20 14:10 ` [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one Jan Beulich
@ 2020-10-30 10:57   ` Julien Grall
  2020-10-30 12:00     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2020-10-30 10:57 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu, Stefano Stabellini

Hi Jan,

On 20/10/2020 15:10, Jan Beulich wrote:
> There's no need to serialize all sending of vIRQ-s; all that's needed
> is serialization against the closing of the respective event channels
> (so far by means of a barrier). To facilitate the conversion, switch to
> an ordinary write locked region in evtchn_close().
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Don't introduce/use rw_barrier() here. Add comment to
>      evtchn_bind_virq(). Re-base.
> 
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -160,7 +160,7 @@ struct vcpu *vcpu_create(struct domain *
>       v->vcpu_id = vcpu_id;
>       v->dirty_cpu = VCPU_CPU_CLEAN;
>   
> -    spin_lock_init(&v->virq_lock);
> +    rwlock_init(&v->virq_lock);
>   
>       tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
>   
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
>   
>       spin_unlock_irqrestore(&chn->lock, flags);
>   
> +    /*
> +     * If by any, the update of virq_to_evtchn[] would need guarding by
> +     * virq_lock, but since this is the last action here, there's no strict
> +     * need to acquire the lock. Hnece holding event_lock isn't helpful

s/Hnece/Hence/

> +     * anymore at this point, but utilize that its unlocking acts as the
> +     * otherwise necessary smp_wmb() here.
> +     */
>       v->virq_to_evtchn[virq] = bind->port = port;

I think all access to v->virq_to_evtchn[virq] should use ACCESS_ONCE() 
or {read, write}_atomic() to avoid any store/load tearing.
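
As a sketch of that suggestion (using the existing helpers; not what
the patch currently does):

    /* Writer side, e.g. in evtchn_bind_virq(): */
    bind->port = port;
    write_atomic(&v->virq_to_evtchn[virq], port);

    /* Reader side, e.g. in send_guest_vcpu_virq(): */
    port = read_atomic(&v->virq_to_evtchn[virq]);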

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 10:57           ` Julien Grall
@ 2020-10-30 11:15             ` Jürgen Groß
  2020-10-30 11:55               ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Jürgen Groß @ 2020-10-30 11:15 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné

On 30.10.20 11:57, Julien Grall wrote:
> 
> 
> On 30/10/2020 10:49, Jan Beulich wrote:
>> On 30.10.2020 11:38, Julien Grall wrote:
>>> On 22/10/2020 17:17, Jan Beulich wrote:
>>>> On 22.10.2020 18:00, Roger Pau Monné wrote:
>>>>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>>>>> --- a/xen/include/xen/event.h
>>>>>> +++ b/xen/include/xen/event.h
>>>>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>>>>     * Low-level event channel port ops.
>>>>>>     *
>>>>>>     * All hooks have to be called with a lock held which prevents 
>>>>>> the channel
>>>>>> - * from changing state. This may be the domain event lock, the 
>>>>>> per-channel
>>>>>> - * lock, or in the case of sending interdomain events also the 
>>>>>> other side's
>>>>>> - * per-channel lock. Exceptions apply in certain cases for the PV 
>>>>>> shim.
>>>>>> + * from changing state. This may be
>>>>>> + * - the domain event lock,
>>>>>> + * - the per-channel lock,
>>>>>> + * - in the case of sending interdomain events the other side's 
>>>>>> per-channel
>>>>>> + *   lock,
>>>>>> + * - in the case of sending non-global vIRQ-s the per-vCPU 
>>>>>> virq_lock (in
>>>>>> + *   combination with the ordering enforced through how the vCPU's
>>>>>> + *   virq_to_evtchn[] gets updated),
>>>>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>>>>> + * Exceptions apply in certain cases for the PV shim.
>>>>>
>>>>> Having such a wide locking discipline looks dangerous to me, it's easy
>>>>> to get things wrong without notice IMO.
>>>>
>>>> It is effectively only describing how things are (or were before
>>>> XSA-343, getting restored here).
>>>
>>> I agree with Roger here, the new/old locking discipline is dangerous and
>>> it is only a matter of time before it will bite us again.
>>>
>>> I think we should consider Juergen's series because the locking for the
>>> event channel is easier to understand.
>>
>> We should, yes. The one thing I'm a little uneasy with is the
>> new lock "variant" that gets introduced. Custom locking methods
>> also are a common source of problems (which isn't to say I see
>> any here).
> 
> I am also unease with a new lock "variant". However, this is the best 
> proposal I have seen so far to unblock the issue.
> 
> I am open to other suggestion with simple locking discipline.

In theory my new lock variant could easily be replaced by a rwlock and
using the try-variant for the readers only. The disadvantage of that
approach would be a growth of struct evtchn.
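
Roughly, such a replacement could look like this (sketch only; whether the
rwlock would live in struct evtchn, and under which name, is an assumption
here):

    /* Reader, e.g. the event sending path (must not block): */
    if ( !read_trylock(&chn->lock) )
        return;             /* channel is changing state - drop the event */
    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
    read_unlock(&chn->lock);

    /* Writer, e.g. evtchn_close() changing the channel's state: */
    write_lock(&chn->lock);
    /* ... ECS_* state transition ... */
    write_unlock(&chn->lock);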


Juergen


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 11:15             ` Jürgen Groß
@ 2020-10-30 11:55               ` Jan Beulich
  2020-10-30 12:27                 ` Jürgen Groß
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 11:55 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.2020 12:15, Jürgen Groß wrote:
> On 30.10.20 11:57, Julien Grall wrote:
>>
>>
>> On 30/10/2020 10:49, Jan Beulich wrote:
>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>> On 22/10/2020 17:17, Jan Beulich wrote:
>>>>> On 22.10.2020 18:00, Roger Pau Monné wrote:
>>>>>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>>>>>> --- a/xen/include/xen/event.h
>>>>>>> +++ b/xen/include/xen/event.h
>>>>>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>>>>>     * Low-level event channel port ops.
>>>>>>>     *
>>>>>>>     * All hooks have to be called with a lock held which prevents 
>>>>>>> the channel
>>>>>>> - * from changing state. This may be the domain event lock, the 
>>>>>>> per-channel
>>>>>>> - * lock, or in the case of sending interdomain events also the 
>>>>>>> other side's
>>>>>>> - * per-channel lock. Exceptions apply in certain cases for the PV 
>>>>>>> shim.
>>>>>>> + * from changing state. This may be
>>>>>>> + * - the domain event lock,
>>>>>>> + * - the per-channel lock,
>>>>>>> + * - in the case of sending interdomain events the other side's 
>>>>>>> per-channel
>>>>>>> + *   lock,
>>>>>>> + * - in the case of sending non-global vIRQ-s the per-vCPU 
>>>>>>> virq_lock (in
>>>>>>> + *   combination with the ordering enforced through how the vCPU's
>>>>>>> + *   virq_to_evtchn[] gets updated),
>>>>>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>>>>>> + * Exceptions apply in certain cases for the PV shim.
>>>>>>
>>>>>> Having such a wide locking discipline looks dangerous to me, it's easy
>>>>>> to get things wrong without notice IMO.
>>>>>
>>>>> It is effectively only describing how things are (or were before
>>>>> XSA-343, getting restored here).
>>>>
>>>> I agree with Roger here, the new/old locking discipline is dangerous and
>>>> it is only a matter of time before it will bite us again.
>>>>
>>>> I think we should consider Juergen's series because the locking for the
>>>> event channel is easier to understand.
>>>
>>> We should, yes. The one thing I'm a little uneasy with is the
>>> new lock "variant" that gets introduced. Custom locking methods
>>> also are a common source of problems (which isn't to say I see
>>> any here).
>>
>> I am also unease with a new lock "variant". However, this is the best 
>> proposal I have seen so far to unblock the issue.
>>
>> I am open to other suggestion with simple locking discipline.
> 
> In theory my new lock variant could easily be replaced by a rwlock and
> using the try-variant for the readers only.

Well, only until we would add check_lock() there, which I think
we should really have (not just on the slow paths, thanks to
the use of spin_lock() there). The read-vs-write properties
you're utilizing aren't applicable in the general case afaict,
and hence such checking would get in the way.

> The disadvantage of that approach would be a growth of struct evtchn.

Wasn't it you who had pointed out to me the aligned(64) attribute
on the struct (in a different context), which afaict would subsume
any possible growth?

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-30 10:57   ` Julien Grall
@ 2020-10-30 12:00     ` Jan Beulich
  2020-10-30 12:08       ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 12:00 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel

On 30.10.2020 11:57, Julien Grall wrote:
> On 20/10/2020 15:10, Jan Beulich wrote:
>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
>>   
>>       spin_unlock_irqrestore(&chn->lock, flags);
>>   
>> +    /*
>> +     * If by any, the update of virq_to_evtchn[] would need guarding by
>> +     * virq_lock, but since this is the last action here, there's no strict
>> +     * need to acquire the lock. Hnece holding event_lock isn't helpful
> 
> s/Hnece/Hence/
> 
>> +     * anymore at this point, but utilize that its unlocking acts as the
>> +     * otherwise necessary smp_wmb() here.
>> +     */
>>       v->virq_to_evtchn[virq] = bind->port = port;
> 
> I think all access to v->virq_to_evtchn[virq] should use ACCESS_ONCE() 
> or {read, write}_atomic() to avoid any store/load tearing.

IOW you're suggesting this to be the subject of a separate patch?
I don't think such a conversion belongs here (nor even in this
series, seeing the much wider applicability of such a change
throughout the code base). Or are you seeing anything here which
would require such a conversion to be done as a prereq?

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-30 12:00     ` Jan Beulich
@ 2020-10-30 12:08       ` Julien Grall
  2020-10-30 12:25         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2020-10-30 12:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel



On 30/10/2020 12:00, Jan Beulich wrote:
> On 30.10.2020 11:57, Julien Grall wrote:
>> On 20/10/2020 15:10, Jan Beulich wrote:
>>> --- a/xen/common/event_channel.c
>>> +++ b/xen/common/event_channel.c
>>> @@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
>>>    
>>>        spin_unlock_irqrestore(&chn->lock, flags);
>>>    
>>> +    /*
>>> +     * If by any, the update of virq_to_evtchn[] would need guarding by
>>> +     * virq_lock, but since this is the last action here, there's no strict
>>> +     * need to acquire the lock. Hnece holding event_lock isn't helpful
>>
>> s/Hnece/Hence/
>>
>>> +     * anymore at this point, but utilize that its unlocking acts as the
>>> +     * otherwise necessary smp_wmb() here.
>>> +     */
>>>        v->virq_to_evtchn[virq] = bind->port = port;
>>
>> I think all access to v->virq_to_evtchn[virq] should use ACCESS_ONCE()
>> or {read, write}_atomic() to avoid any store/load tearing.
> 
> IOW you're suggesting this to be the subject of a separate patch?
> I don't think such a conversion belongs here (nor even in this
> series, seeing the much wider applicability of such a change
> throughout the code base).
> Or are you seeing anything here which
> would require such a conversion to be done as a prereq?

Yes, your comment implies that it is fine to write to virq_to_evtchn[] 
without the lock taken. However, this is *only* valid if the compiler 
doesn't tear the load/store.

So this is a prerequisite for your comment to be valid.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-30 12:08       ` Julien Grall
@ 2020-10-30 12:25         ` Jan Beulich
  2020-10-30 12:46           ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 12:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel

On 30.10.2020 13:08, Julien Grall wrote:
> On 30/10/2020 12:00, Jan Beulich wrote:
>> On 30.10.2020 11:57, Julien Grall wrote:
>>> On 20/10/2020 15:10, Jan Beulich wrote:
>>>> --- a/xen/common/event_channel.c
>>>> +++ b/xen/common/event_channel.c
>>>> @@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
>>>>    
>>>>        spin_unlock_irqrestore(&chn->lock, flags);
>>>>    
>>>> +    /*
>>>> +     * If by any, the update of virq_to_evtchn[] would need guarding by
>>>> +     * virq_lock, but since this is the last action here, there's no strict
>>>> +     * need to acquire the lock. Hnece holding event_lock isn't helpful
>>>
>>> s/Hnece/Hence/
>>>
>>>> +     * anymore at this point, but utilize that its unlocking acts as the
>>>> +     * otherwise necessary smp_wmb() here.
>>>> +     */
>>>>        v->virq_to_evtchn[virq] = bind->port = port;
>>>
>>> I think all access to v->virq_to_evtchn[virq] should use ACCESS_ONCE()
>>> or {read, write}_atomic() to avoid any store/load tearing.
>>
>> IOW you're suggesting this to be the subject of a separate patch?
>> I don't think such a conversion belongs here (nor even in this
>> series, seeing the much wider applicability of such a change
>> throughout the code base).
>> Or are you seeing anything here which
>> would require such a conversion to be done as a prereq?
> 
> Yes, your comment implies that it is fine to write to virq_to_evtchn[] 
> without the lock taken. However, this is *only* valid if the compiler 
> doesn't tear the load/store.
> 
> So this is a prerequisite for your comment to be valid.

That's an interesting way to view it. I'm only adding the comment,
without changing the code here. Iirc it was you who asked me to
add the comment. And now this is leading to you wanting me to do
entirely unrelated changes, when the code base is full of similar
assumptions towards no torn accesses? (Yes, I certainly can add
such a patch, but no, I don't think this should be a requirement
here.)

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 11:55               ` Jan Beulich
@ 2020-10-30 12:27                 ` Jürgen Groß
  2020-10-30 12:52                   ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Jürgen Groß @ 2020-10-30 12:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.20 12:55, Jan Beulich wrote:
> On 30.10.2020 12:15, Jürgen Groß wrote:
>> On 30.10.20 11:57, Julien Grall wrote:
>>>
>>>
>>> On 30/10/2020 10:49, Jan Beulich wrote:
>>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>>> On 22/10/2020 17:17, Jan Beulich wrote:
>>>>>> On 22.10.2020 18:00, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 20, 2020 at 04:10:09PM +0200, Jan Beulich wrote:
>>>>>>>> --- a/xen/include/xen/event.h
>>>>>>>> +++ b/xen/include/xen/event.h
>>>>>>>> @@ -177,9 +177,16 @@ int evtchn_reset(struct domain *d, bool
>>>>>>>>      * Low-level event channel port ops.
>>>>>>>>      *
>>>>>>>>      * All hooks have to be called with a lock held which prevents
>>>>>>>> the channel
>>>>>>>> - * from changing state. This may be the domain event lock, the
>>>>>>>> per-channel
>>>>>>>> - * lock, or in the case of sending interdomain events also the
>>>>>>>> other side's
>>>>>>>> - * per-channel lock. Exceptions apply in certain cases for the PV
>>>>>>>> shim.
>>>>>>>> + * from changing state. This may be
>>>>>>>> + * - the domain event lock,
>>>>>>>> + * - the per-channel lock,
>>>>>>>> + * - in the case of sending interdomain events the other side's
>>>>>>>> per-channel
>>>>>>>> + *   lock,
>>>>>>>> + * - in the case of sending non-global vIRQ-s the per-vCPU
>>>>>>>> virq_lock (in
>>>>>>>> + *   combination with the ordering enforced through how the vCPU's
>>>>>>>> + *   virq_to_evtchn[] gets updated),
>>>>>>>> + * - in the case of sending global vIRQ-s vCPU 0's virq_lock.
>>>>>>>> + * Exceptions apply in certain cases for the PV shim.
>>>>>>>
>>>>>>> Having such a wide locking discipline looks dangerous to me, it's easy
>>>>>>> to get things wrong without notice IMO.
>>>>>>
>>>>>> It is effectively only describing how things are (or were before
>>>>>> XSA-343, getting restored here).
>>>>>
>>>>> I agree with Roger here, the new/old locking discipline is dangerous and
>>>>> it is only a matter of time before it will bite us again.
>>>>>
>>>>> I think we should consider Juergen's series because the locking for the
>>>>> event channel is easier to understand.
>>>>
>>>> We should, yes. The one thing I'm a little uneasy with is the
>>>> new lock "variant" that gets introduced. Custom locking methods
>>>> also are a common source of problems (which isn't to say I see
>>>> any here).
>>>
>>> I am also unease with a new lock "variant". However, this is the best
>>> proposal I have seen so far to unblock the issue.
>>>
>>> I am open to other suggestion with simple locking discipline.
>>
>> In theory my new lock variant could easily be replaced by a rwlock and
>> using the try-variant for the readers only.
> 
> Well, only until we would add check_lock() there, which I think
> we should really have (not just on the slow paths, thanks to
> the use of spin_lock() there). The read-vs-write properties
> you're utilizing aren't applicable in the general case afaict,
> and hence such checking would get in the way.

No, I don't think so.

As long as there is no read_lock() user with interrupts off we should be
fine. read_trylock() is no problem as it can't block.

> 
>> The disadvantage of that approach would be a growth of struct evtchn.
> 
> Wasn't it you who had pointed out to me the aligned(64) attribute
> on the struct (in a different context), which afaict would subsume
> any possible growth?

Oh, indeed.

The growth would be 8 bytes, leading to a max of 56 bytes then.
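
For illustration (the 48/56-byte figures follow from this discussion, the
field breakdown does not):

    /* With 64-byte alignment forced on the struct, growing its fields
     * from 48 to 56 bytes doesn't change the per-channel footprint. */
    struct example {
        unsigned char fields[56]; /* was 48 before adding an 8-byte rwlock */
    } __attribute__((__aligned__(64)));

    /* sizeof(struct example) == 64 either way. */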


Juergen


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one
  2020-10-30 12:25         ` Jan Beulich
@ 2020-10-30 12:46           ` Julien Grall
  0 siblings, 0 replies; 44+ messages in thread
From: Julien Grall @ 2020-10-30 12:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, xen-devel



On 30/10/2020 12:25, Jan Beulich wrote:
> On 30.10.2020 13:08, Julien Grall wrote:
>> On 30/10/2020 12:00, Jan Beulich wrote:
>>> On 30.10.2020 11:57, Julien Grall wrote:
>>>> On 20/10/2020 15:10, Jan Beulich wrote:
>>>>> --- a/xen/common/event_channel.c
>>>>> +++ b/xen/common/event_channel.c
>>>>> @@ -449,6 +449,13 @@ int evtchn_bind_virq(evtchn_bind_virq_t
>>>>>     
>>>>>         spin_unlock_irqrestore(&chn->lock, flags);
>>>>>     
>>>>> +    /*
>>>>> +     * If by any, the update of virq_to_evtchn[] would need guarding by
>>>>> +     * virq_lock, but since this is the last action here, there's no strict
>>>>> +     * need to acquire the lock. Hnece holding event_lock isn't helpful
>>>>
>>>> s/Hnece/Hence/
>>>>
>>>>> +     * anymore at this point, but utilize that its unlocking acts as the
>>>>> +     * otherwise necessary smp_wmb() here.
>>>>> +     */
>>>>>         v->virq_to_evtchn[virq] = bind->port = port;
>>>>
>>>> I think all access to v->virq_to_evtchn[virq] should use ACCESS_ONCE()
>>>> or {read, write}_atomic() to avoid any store/load tearing.
>>>
>>> IOW you're suggesting this to be the subject of a separate patch?
>>> I don't think such a conversion belongs here (nor even in this
>>> series, seeing the much wider applicability of such a change
>>> throughout the code base).
>>> Or are you seeing anything here which
>>> would require such a conversion to be done as a prereq?
>>
>> Yes, your comment implies that it is fine to write to virq_to_evtchn[]
>> without the lock taken. However, this is *only* valid if the compiler
>> doesn't tear the load/store.
>>
>> So this is a prerequisite for your comment to be valid.
> 
> That's an interesting way to view it. I'm only adding the comment,
> without changing the code here. Iirc it was you who asked me to
> add the comment. 

Yes, I asked for this comment because it makes it easier for the reader 
to figure out how your code works. But this doesn't mean I wanted to 
have a misleading comment.

If you don't plan to handle the ACCESS_ONCE(), then it is best to say 
so openly in the comment.

> And now this is leading to you wanting me to do
> entirely unrelated changes, when the code base is full of similar
> assumptions towards no torn accesses? 

I was expecting you to write that :). Well, small steps are always easier 
when trying to achieve a consistent code base.

To me, this is similar to reviewers pushing contributors to update the 
coding style of the area touched. The difference here is that, without 
following a memory model, you are at the mercy of the compiler and XSAs.

> (Yes, I certainly can add
> such a patch, but no, I don't think this should be a requirement
> here.)

That's ok. But please update the comment to avoid misleading the reader.

An alternative would be to use the write_lock() here. This shouldn't 
impact the performance much and would nicely fix the issue.
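
Such an alternative might look roughly like this at the site quoted earlier
in this thread (sketch only):

    spin_unlock_irqrestore(&chn->lock, flags);

    /*
     * Publish the new port under the (now r/w) per-vCPU virq_lock, so a
     * reader holding read_lock(&v->virq_lock) can never observe a torn
     * or stale value, and no separate barrier argument is needed.
     */
    write_lock(&v->virq_lock);
    v->virq_to_evtchn[virq] = bind->port = port;
    write_unlock(&v->virq_lock);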

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 12:27                 ` Jürgen Groß
@ 2020-10-30 12:52                   ` Jan Beulich
  2020-10-30 13:02                     ` Jürgen Groß
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 12:52 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.2020 13:27, Jürgen Groß wrote:
> On 30.10.20 12:55, Jan Beulich wrote:
>> On 30.10.2020 12:15, Jürgen Groß wrote:
>>> On 30.10.20 11:57, Julien Grall wrote:
>>>> On 30/10/2020 10:49, Jan Beulich wrote:
>>>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>>>> I think we should consider Juergen's series because the locking for the
>>>>>> event channel is easier to understand.
>>>>>
>>>>> We should, yes. The one thing I'm a little uneasy with is the
>>>>> new lock "variant" that gets introduced. Custom locking methods
>>>>> also are a common source of problems (which isn't to say I see
>>>>> any here).
>>>>
>>>> I am also unease with a new lock "variant". However, this is the best
>>>> proposal I have seen so far to unblock the issue.
>>>>
>>>> I am open to other suggestion with simple locking discipline.
>>>
>>> In theory my new lock variant could easily be replaced by a rwlock and
>>> using the try-variant for the readers only.
>>
>> Well, only until we would add check_lock() there, which I think
>> we should really have (not just on the slow paths, thanks to
>> the use of spin_lock() there). The read-vs-write properties
>> you're utilizing aren't applicable in the general case afaict,
>> and hence such checking would get in the way.
> 
> No, I don't think so.
> 
> As long as there is no read_lock() user with interrupts off we should be
> fine. read_trylock() is no problem as it can't block.

How would check_lock() notice the difference? It would be all the
same for read and write acquires of the lock, I think, and hence
it would still get unhappy about uses from IRQ paths afaict.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 12:52                   ` Jan Beulich
@ 2020-10-30 13:02                     ` Jürgen Groß
  2020-10-30 13:38                       ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Jürgen Groß @ 2020-10-30 13:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.20 13:52, Jan Beulich wrote:
> On 30.10.2020 13:27, Jürgen Groß wrote:
>> On 30.10.20 12:55, Jan Beulich wrote:
>>> On 30.10.2020 12:15, Jürgen Groß wrote:
>>>> On 30.10.20 11:57, Julien Grall wrote:
>>>>> On 30/10/2020 10:49, Jan Beulich wrote:
>>>>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>>>>> I think we should consider Juergen's series because the locking for the
>>>>>>> event channel is easier to understand.
>>>>>>
>>>>>> We should, yes. The one thing I'm a little uneasy with is the
>>>>>> new lock "variant" that gets introduced. Custom locking methods
>>>>>> also are a common source of problems (which isn't to say I see
>>>>>> any here).
>>>>>
>>>>> I am also unease with a new lock "variant". However, this is the best
>>>>> proposal I have seen so far to unblock the issue.
>>>>>
>>>>> I am open to other suggestion with simple locking discipline.
>>>>
>>>> In theory my new lock variant could easily be replaced by a rwlock and
>>>> using the try-variant for the readers only.
>>>
>>> Well, only until we would add check_lock() there, which I think
>>> we should really have (not just on the slow paths, thanks to
>>> the use of spin_lock() there). The read-vs-write properties
>>> you're utilizing aren't applicable in the general case afaict,
>>> and hence such checking would get in the way.
>>
>> No, I don't think so.
>>
>> As long as there is no read_lock() user with interrupts off we should be
>> fine. read_trylock() is no problem as it can't block.
> 
> How would check_lock() notice the difference? It would be all the
> same for read and write acquires of the lock, I think, and hence
> it would still get unhappy about uses from IRQ paths afaict.

check_lock() isn't applicable here, at least not without modification.

I think our spinlock implementation is wrong in this regard in case a
lock is entered via spin_trylock(), BTW. Using spin_trylock() with
interrupts off for a lock normally taken with interrupts enabled is
perfectly fine IMO.
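
To illustrate the idea (sketch only - check_lock()'s real signature and
internals differ, and the extra parameter is an assumption):

    static void check_lock(union lock_debug *debug, bool try)
    {
        /*
         * A trylock can never block, so attempting it with interrupts
         * off on a lock otherwise taken with interrupts on is harmless;
         * skip the IRQ-safety consistency check for that case.
         */
        if ( try && !local_irq_is_enabled() )
            return;

        /* ... existing IRQ-safety consistency checking ... */
    }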


Juergen



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 13:02                     ` Jürgen Groß
@ 2020-10-30 13:38                       ` Jan Beulich
  2020-10-30 13:43                         ` Jürgen Groß
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2020-10-30 13:38 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.2020 14:02, Jürgen Groß wrote:
> On 30.10.20 13:52, Jan Beulich wrote:
>> On 30.10.2020 13:27, Jürgen Groß wrote:
>>> On 30.10.20 12:55, Jan Beulich wrote:
>>>> On 30.10.2020 12:15, Jürgen Groß wrote:
>>>>> On 30.10.20 11:57, Julien Grall wrote:
>>>>>> On 30/10/2020 10:49, Jan Beulich wrote:
>>>>>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>>>>>> I think we should consider Juergen's series because the locking for the
>>>>>>>> event channel is easier to understand.
>>>>>>>
>>>>>>> We should, yes. The one thing I'm a little uneasy with is the
>>>>>>> new lock "variant" that gets introduced. Custom locking methods
>>>>>>> also are a common source of problems (which isn't to say I see
>>>>>>> any here).
>>>>>>
>>>>>> I am also unease with a new lock "variant". However, this is the best
>>>>>> proposal I have seen so far to unblock the issue.
>>>>>>
>>>>>> I am open to other suggestion with simple locking discipline.
>>>>>
>>>>> In theory my new lock variant could easily be replaced by a rwlock and
>>>>> using the try-variant for the readers only.
>>>>
>>>> Well, only until we would add check_lock() there, which I think
>>>> we should really have (not just on the slow paths, thanks to
>>>> the use of spin_lock() there). The read-vs-write properties
>>>> you're utilizing aren't applicable in the general case afaict,
>>>> and hence such checking would get in the way.
>>>
>>> No, I don't think so.
>>>
>>> As long as there is no read_lock() user with interrupts off we should be
>>> fine. read_trylock() is no problem as it can't block.
>>
>> How would check_lock() notice the difference? It would be all the
>> same for read and write acquires of the lock, I think, and hence
>> it would still get unhappy about uses from IRQ paths afaict.
> 
> check_lock() isn't applicable here, at least not without modification.
> 
> I think our spinlock implementation is wrong in this regard in case a
> lock is entered via spin_trylock(), BTW. Using spin_trylock() with
> interrupts off for a lock normally taken with interrupts enabled is
> perfectly fine IMO.

Hmm, I think you're right, in which case I guess we ought to relax
this.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()
  2020-10-30 13:38                       ` Jan Beulich
@ 2020-10-30 13:43                         ` Jürgen Groß
  0 siblings, 0 replies; 44+ messages in thread
From: Jürgen Groß @ 2020-10-30 13:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Stefano Stabellini, Roger Pau Monné,
	Julien Grall

On 30.10.20 14:38, Jan Beulich wrote:
> On 30.10.2020 14:02, Jürgen Groß wrote:
>> On 30.10.20 13:52, Jan Beulich wrote:
>>> On 30.10.2020 13:27, Jürgen Groß wrote:
>>>> On 30.10.20 12:55, Jan Beulich wrote:
>>>>> On 30.10.2020 12:15, Jürgen Groß wrote:
>>>>>> On 30.10.20 11:57, Julien Grall wrote:
>>>>>>> On 30/10/2020 10:49, Jan Beulich wrote:
>>>>>>>> On 30.10.2020 11:38, Julien Grall wrote:
>>>>>>>>> I think we should consider Juergen's series because the locking for the
>>>>>>>>> event channel is easier to understand.
>>>>>>>>
>>>>>>>> We should, yes. The one thing I'm a little uneasy with is the
>>>>>>>> new lock "variant" that gets introduced. Custom locking methods
>>>>>>>> also are a common source of problems (which isn't to say I see
>>>>>>>> any here).
>>>>>>>
>>>>>>> I am also unease with a new lock "variant". However, this is the best
>>>>>>> proposal I have seen so far to unblock the issue.
>>>>>>>
>>>>>>> I am open to other suggestion with simple locking discipline.
>>>>>>
>>>>>> In theory my new lock variant could easily be replaced by a rwlock and
>>>>>> using the try-variant for the readers only.
>>>>>
>>>>> Well, only until we would add check_lock() there, which I think
>>>>> we should really have (not just on the slow paths, thanks to
>>>>> the use of spin_lock() there). The read-vs-write properties
>>>>> you're utilizing aren't applicable in the general case afaict,
>>>>> and hence such checking would get in the way.
>>>>
>>>> No, I don't think so.
>>>>
>>>> As long as there is no read_lock() user with interrupts off we should be
>>>> fine. read_trylock() is no problem as it can't block.
>>>
>>> How would check_lock() notice the difference? It would be all the
>>> same for read and write acquires of the lock, I think, and hence
>>> it would still get unhappy about uses from IRQ paths afaict.
>>
>> check_lock() isn't applicable here, at least not without modification.
>>
>> I think our spinlock implementation is wrong in this regard in case a
>> lock is entered via spin_trylock(), BTW. Using spin_trylock() with
>> interrupts off for a lock normally taken with interrupts enabled is
>> perfectly fine IMO.
> 
> Hmm, I think you're right, in which I case guess we ought to relax
> this.

Just writing a patch. :-)

And one for adding check_lock() to rwlocks, too.


Juergen


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held
  2020-10-20 14:13 ` [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held Jan Beulich
@ 2020-11-03 10:17   ` Isaila Alexandru
  2020-11-03 14:54     ` Tamas K Lengyel
  0 siblings, 1 reply; 44+ messages in thread
From: Isaila Alexandru @ 2020-11-03 10:17 UTC (permalink / raw)
  To: Jan Beulich, xen-devel, Tamas K Lengyel, Petre Pircalabu
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini


Hi Jan and sorry for the late reply,

On 20.10.2020 17:13, Jan Beulich wrote:
> While there don't look to be any problems with this right now, the lock
> order implications from holding the lock can be very difficult to follow
> (and may be easy to violate unknowingly). The present callbacks don't
> (and no such callback should) have any need for the lock to be held.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TODO: vm_event_disable() frees the structures used by respective
>        callbacks - need to either use call_rcu() for freeing, or maintain
>        a count of in-progress calls, for evtchn_close() to wait to drop
>        to zero before dropping the lock / returning.

I would go with the second solution and maintain a count of in-progress 
calls.
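
A very rough sketch of the counting approach (all names are made up for 
illustration; the real hook points would be evtchn_send() and 
evtchn_close()):

    /* Hypothetical per-channel count of consumer callbacks in flight. */
    atomic_t cb_in_progress;

    /* In evtchn_send(), after dropping the per-channel lock: */
    atomic_inc(&rchn->cb_in_progress);
    fn(rv, rport);
    atomic_dec(&rchn->cb_in_progress);

    /* In evtchn_close(), before the consumer may free its structures: */
    while ( atomic_read(&chn->cb_in_progress) )
        cpu_relax();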

Tamas, Petre how does this sound?

Alex

> 
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -763,9 +763,18 @@ int evtchn_send(struct domain *ld, unsig
>           rport = lchn->u.interdomain.remote_port;
>           rchn  = evtchn_from_port(rd, rport);
>           if ( consumer_is_xen(rchn) )
> -            xen_notification_fn(rchn)(rd->vcpu[rchn->notify_vcpu_id], rport);
> -        else
> -            evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
> +        {
> +            /* Don't keep holding the lock for the call below. */
> +            xen_event_channel_notification_t fn = xen_notification_fn(rchn);
> +            struct vcpu *rv = rd->vcpu[rchn->notify_vcpu_id];
> +
> +            rcu_lock_domain(rd);
> +            spin_unlock_irqrestore(&lchn->lock, flags);
> +            fn(rv, rport);
> +            rcu_unlock_domain(rd);
> +            return 0;
> +        }
> +        evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
>           break;
>       case ECS_IPI:
>           evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held
  2020-11-03 10:17   ` Isaila Alexandru
@ 2020-11-03 14:54     ` Tamas K Lengyel
  0 siblings, 0 replies; 44+ messages in thread
From: Tamas K Lengyel @ 2020-11-03 14:54 UTC (permalink / raw)
  To: Isaila Alexandru
  Cc: Jan Beulich, xen-devel, Petre Pircalabu, Andrew Cooper,
	George Dunlap, Ian Jackson, Julien Grall, Wei Liu,
	Stefano Stabellini

On Tue, Nov 3, 2020 at 5:17 AM Isaila Alexandru <aisaila@bitdefender.com> wrote:
>
>
> Hi Jan and sorry for the late reply,
>
> On 20.10.2020 17:13, Jan Beulich wrote:
> > While there don't look to be any problems with this right now, the lock
> > order implications from holding the lock can be very difficult to follow
> > (and may be easy to violate unknowingly). The present callbacks don't
> > (and no such callback should) have any need for the lock to be held.
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > ---
> > TODO: vm_event_disable() frees the structures used by respective
> >        callbacks - need to either use call_rcu() for freeing, or maintain
> >        a count of in-progress calls, for evtchn_close() to wait to drop
> >        to zero before dropping the lock / returning.
>
> I would go with the second solution and maintain a count of in-progress
> calls.
>
> Tamas, Petre how does this sound?

Agree, doing a reference count before freeing is preferred.

Thanks,
Tamas


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2020-11-03 14:55 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-20 14:06 [PATCH v2 0/8] evtchn: recent XSAs follow-on Jan Beulich
2020-10-20 14:08 ` [PATCH v2 1/8] evtchn: avoid race in get_xen_consumer() Jan Beulich
2020-10-21 15:46   ` Roger Pau Monné
2020-10-22  7:33     ` Jan Beulich
2020-10-22  8:11       ` Roger Pau Monné
2020-10-22  8:15         ` Jan Beulich
2020-10-22  8:29           ` Roger Pau Monné
2020-10-22  8:56             ` Jan Beulich
2020-10-22  9:25               ` Roger Pau Monné
2020-10-22  9:21   ` Roger Pau Monné
2020-10-30 10:15   ` Julien Grall
2020-10-20 14:08 ` [PATCH v2 2/8] evtchn: replace FIFO-specific header by generic private one Jan Beulich
2020-10-21 16:00   ` Roger Pau Monné
2020-10-30 10:21   ` Julien Grall
2020-10-30 10:42     ` Jan Beulich
2020-10-30 10:44       ` Julien Grall
2020-10-20 14:09 ` [PATCH v2 3/8] evtchn: rename and adjust guest_enabled_event() Jan Beulich
2020-10-22 10:28   ` Roger Pau Monné
2020-10-20 14:09 ` [PATCH v2 4/8] evtchn: let evtchn_set_priority() acquire the per-channel lock Jan Beulich
2020-10-22 11:17   ` Roger Pau Monné
2020-10-22 13:34     ` Jan Beulich
2020-10-20 14:10 ` [PATCH v2 5/8] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq() Jan Beulich
2020-10-22 16:00   ` Roger Pau Monné
2020-10-22 16:17     ` Jan Beulich
2020-10-30 10:38       ` Julien Grall
2020-10-30 10:49         ` Jan Beulich
2020-10-30 10:57           ` Julien Grall
2020-10-30 11:15             ` Jürgen Groß
2020-10-30 11:55               ` Jan Beulich
2020-10-30 12:27                 ` Jürgen Groß
2020-10-30 12:52                   ` Jan Beulich
2020-10-30 13:02                     ` Jürgen Groß
2020-10-30 13:38                       ` Jan Beulich
2020-10-30 13:43                         ` Jürgen Groß
2020-10-20 14:10 ` [PATCH v2 6/8] evtchn: convert vIRQ lock to an r/w one Jan Beulich
2020-10-30 10:57   ` Julien Grall
2020-10-30 12:00     ` Jan Beulich
2020-10-30 12:08       ` Julien Grall
2020-10-30 12:25         ` Jan Beulich
2020-10-30 12:46           ` Julien Grall
2020-10-20 14:11 ` [PATCH v2 7/8] evtchn: convert domain event " Jan Beulich
2020-10-20 14:13 ` [PATCH RFC v2 8/8] evtchn: don't call Xen consumer callback with per-channel lock held Jan Beulich
2020-11-03 10:17   ` Isaila Alexandru
2020-11-03 14:54     ` Tamas K Lengyel
