* [PATCH v6 0/3] XSA-343 followup patches
@ 2020-11-09 16:38 Juergen Gross
  2020-11-09 16:38 ` [PATCH v6 1/3] xen/events: access last_priority and last_vcpu_id together Juergen Gross
                   ` (3 more replies)
From: Juergen Gross @ 2020-11-09 16:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Daniel De Graaf

The patches for XSA-343 produced some fallout; especially the event
channel locking has proven to be problematic.

Patch 1 targets fifo event channels, avoiding any races for the case
that the fifo queue of a specific event channel has been changed.

The second patch modifies the per event channel locking scheme in
order to avoid deadlocks and problems due to the event channel lock
having been changed by the XSA-343 patches to require IRQs off.

Changes in V6:
- added patch 3 (Jan Beulich)
- switched some more read_trylock() cases to read_lock() (Jan Beulich)

Changes in V5:
- moved evtchn_write_[un]lock() to event_channel.c (Jan Beulich)
- used normal read_lock() in some cases (Jan Beulich)

Changes in V4:
- switched to real rwlock

Changes in V3:
- addressed comments

Juergen Gross (3):
  xen/events: access last_priority and last_vcpu_id together
  xen/evtchn: rework per event channel lock
  xen/evtchn: revert 52e1fc47abc3a0123

 xen/arch/x86/irq.c          |   6 +-
 xen/arch/x86/pv/shim.c      |   9 ++-
 xen/common/event_channel.c  | 144 +++++++++++++++++++-----------------
 xen/common/event_fifo.c     |  25 +++++--
 xen/include/xen/event.h     |  29 ++++++--
 xen/include/xen/sched.h     |   8 +-
 xen/include/xsm/xsm.h       |   1 -
 xen/xsm/flask/avc.c         |  78 ++-----------------
 xen/xsm/flask/hooks.c       |  10 ---
 xen/xsm/flask/include/avc.h |   2 -
 10 files changed, 139 insertions(+), 173 deletions(-)

-- 
2.26.2




* [PATCH v6 1/3] xen/events: access last_priority and last_vcpu_id together
  2020-11-09 16:38 [PATCH v6 0/3] XSA-343 followup patches Juergen Gross
@ 2020-11-09 16:38 ` Juergen Gross
  2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
                   ` (2 subsequent siblings)
From: Juergen Gross @ 2020-11-09 16:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Julien Grall

The queue for a fifo event depends on the vcpu_id and the priority of
the event. When sending an event it might happen that the event needs
to change queues, and the old queue needs to be kept intact in order
to preserve the links between queue elements. For this purpose the
event channel contains last_priority and last_vcpu_id values for being
able to identify the old queue.

In order to avoid races, always access last_priority and last_vcpu_id
together, via a single atomic operation, avoiding any inconsistencies.
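
The idea, as a standalone sketch (using C11 atomics purely for
illustration; the patch below uses Xen's read_atomic() and
write_atomic() helpers, and all names in the sketch are made up):

#include <stdatomic.h>
#include <stdint.h>

union lastq {
    uint32_t raw;
    struct {
        uint8_t  last_priority;
        uint16_t last_vcpu_id;
    };
};

static union lastq lastq_read(_Atomic uint32_t *p)
{
    /* One atomic load yields a mutually consistent pair of fields. */
    return (union lastq){ .raw = atomic_load(p) };
}

static void lastq_write(_Atomic uint32_t *p, uint8_t prio, uint16_t vcpu)
{
    /*
     * One atomic store updates both fields together, so a concurrent
     * reader can never see the priority of one update paired with the
     * vcpu_id of another.
     */
    union lastq q = { .raw = 0 };

    q.last_priority = prio;
    q.last_vcpu_id = vcpu;
    atomic_store(p, q.raw);
}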

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
 xen/common/event_fifo.c | 25 +++++++++++++++++++------
 xen/include/xen/sched.h |  3 +--
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index c6e58d2a1a..79090c04ca 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -42,6 +42,14 @@ struct evtchn_fifo_domain {
     unsigned int num_evtchns;
 };
 
+union evtchn_fifo_lastq {
+    uint32_t raw;
+    struct {
+        uint8_t last_priority;
+        uint16_t last_vcpu_id;
+    };
+};
+
 static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
                                                        unsigned int port)
 {
@@ -86,16 +94,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
     struct vcpu *v;
     struct evtchn_fifo_queue *q, *old_q;
     unsigned int try;
+    union evtchn_fifo_lastq lastq;
 
     for ( try = 0; try < 3; try++ )
     {
-        v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        v = d->vcpu[lastq.last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
 
         spin_lock_irqsave(&old_q->lock, *flags);
 
-        v = d->vcpu[evtchn->last_vcpu_id];
-        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+        v = d->vcpu[lastq.last_vcpu_id];
+        q = &v->evtchn_fifo->queue[lastq.last_priority];
 
         if ( old_q == q )
             return old_q;
@@ -246,8 +256,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         /* Moved to a different queue? */
         if ( old_q != q )
         {
-            evtchn->last_vcpu_id = v->vcpu_id;
-            evtchn->last_priority = q->priority;
+            union evtchn_fifo_lastq lastq = { };
+
+            lastq.last_vcpu_id = v->vcpu_id;
+            lastq.last_priority = q->priority;
+            write_atomic(&evtchn->fifo_lastq, lastq.raw);
 
             spin_unlock_irqrestore(&old_q->lock, flags);
             spin_lock_irqsave(&q->lock, flags);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d8ed83f869..a298ff4df8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -114,8 +114,7 @@ struct evtchn
         u16 virq;      /* state == ECS_VIRQ */
     } u;
     u8 priority;
-    u8 last_priority;
-    u16 last_vcpu_id;
+    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
 #ifdef CONFIG_XSM
     union {
 #ifdef XSM_NEED_GENERIC_EVTCHN_SSID
-- 
2.26.2




* [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 16:38 [PATCH v6 0/3] XSA-343 followup patches Juergen Gross
  2020-11-09 16:38 ` [PATCH v6 1/3] xen/events: access last_priority and last_vcpu_id together Juergen Gross
@ 2020-11-09 16:38 ` Juergen Gross
  2020-11-09 16:48   ` Jan Beulich
                     ` (2 more replies)
  2020-11-09 16:38 ` [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123 Juergen Gross
  2020-11-17 18:13 ` [PATCH v6 0/3] XSA-343 followup patches Julien Grall
From: Juergen Gross @ 2020-11-09 16:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

Currently the lock for a single event channel needs to be taken with
interrupts off, which causes deadlocks in some cases.

Rework the per event channel lock to be non-blocking for the case of
sending an event, removing the need for disabling interrupts for
taking the lock.

The lock is needed for avoiding races between event channel state
changes (creation, closing, binding) and normal operations (set
pending, [un]masking, priority changes).

Use a rwlock, but with some restrictions:

- Changing the state of an event channel (creation, closing, binding)
  needs to use write_lock(), with ASSERT()ing that the lock is taken as
  writer only when the state of the event channel is appropriate either
  before or after the locked region (i.e. free or unbound).

- Sending an event needs to use read_trylock() mostly; if the lock
  cannot be obtained, the operation is omitted. This is needed as
  sending an event can happen with interrupts off (at least in some
  cases).

- Dumping the event channel state for debug purposes uses
  read_trylock(), too, in order to avoid blocking in case the lock is
  taken as writer for a long time.

- All other cases can use read_lock().
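
Expressed as a self-contained toy model (using pthread rwlocks in
place of Xen's rwlock_t; all names here are illustrative, the real
helpers and call sites are in the diff below), the resulting
discipline looks roughly like this:

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum state { FREE, UNBOUND, BOUND };

struct channel {
    pthread_rwlock_t lock;   /* stands in for Xen's rwlock_t */
    enum state state, old_state;
};

/* State changes take the lock as writer and record the entry state. */
static void chan_write_lock(struct channel *c)
{
    pthread_rwlock_wrlock(&c->lock);
    c->old_state = c->state;
}

/*
 * The channel has to be free or unbound on entry or on exit: only then
 * may a sender that fails the trylock safely treat the event as
 * droppable (see chan_send() below).
 */
static void chan_write_unlock(struct channel *c)
{
    assert(c->old_state == FREE || c->old_state == UNBOUND ||
           c->state == FREE || c->state == UNBOUND);
    pthread_rwlock_unlock(&c->lock);
}

/*
 * The send path can run with IRQs off in Xen, so it must never block:
 * a failed trylock means the channel is currently being re-bound, and
 * the event is simply dropped.
 */
static bool chan_send(struct channel *c)
{
    if ( pthread_rwlock_tryrdlock(&c->lock) != 0 )
        return false;
    /* ... mark the event pending here ... */
    pthread_rwlock_unlock(&c->lock);
    return true;
}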

Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4:
- switch to rwlock
- add ASSERT() to verify correct write_lock() usage

V3:
- corrected a copy-and-paste error (Jan Beulich)
- corrected unlocking in two cases (Jan Beulich)
- renamed evtchn_read_trylock() (Jan Beulich)
- added some comments and an ASSERT() for evtchn_write_lock()
- set EVENT_WRITE_LOCK_INC to INT_MIN

V2:
- added needed barriers

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/irq.c         |   6 +-
 xen/arch/x86/pv/shim.c     |   9 +--
 xen/common/event_channel.c | 138 +++++++++++++++++++++----------------
 xen/include/xen/event.h    |  29 ++++++--
 xen/include/xen/sched.h    |   5 +-
 5 files changed, 112 insertions(+), 75 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 93c4fb9a79..8d1f9a9fc6 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2495,14 +2495,12 @@ static void dump_irqs(unsigned char key)
                 pirq = domain_irq_to_pirq(d, irq);
                 info = pirq_info(d, pirq);
                 evtchn = evtchn_from_port(d, info->evtchn);
-                local_irq_disable();
-                if ( spin_trylock(&evtchn->lock) )
+                if ( evtchn_read_trylock(evtchn) )
                 {
                     pending = evtchn_is_pending(d, evtchn);
                     masked = evtchn_is_masked(d, evtchn);
-                    spin_unlock(&evtchn->lock);
+                    evtchn_read_unlock(evtchn);
                 }
-                local_irq_enable();
                 printk("d%d:%3d(%c%c%c)%c",
                        d->domain_id, pirq, "-P?"[pending],
                        "-M?"[masked], info->masked ? 'M' : '-',
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 9aef7a860a..b4e83e0778 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -660,11 +660,12 @@ void pv_shim_inject_evtchn(unsigned int port)
     if ( port_is_valid(guest, port) )
     {
         struct evtchn *chn = evtchn_from_port(guest, port);
-        unsigned long flags;
 
-        spin_lock_irqsave(&chn->lock, flags);
-        evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn);
-        spin_unlock_irqrestore(&chn->lock, flags);
+        if ( evtchn_read_trylock(chn) )
+        {
+            evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn);
+            evtchn_read_unlock(chn);
+        }
     }
 }
 
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index cd4a2c0501..43e3520df6 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -50,6 +50,29 @@
 
 #define consumer_is_xen(e) (!!(e)->xen_consumer)
 
+/*
+ * Lock an event channel exclusively. This is allowed only when the channel is
+ * free or unbound either when taking or when releasing the lock, as any
+ * concurrent operation on the event channel using evtchn_read_trylock() will
+ * just assume the event channel is free or unbound at the moment when the
+ * evtchn_read_trylock() returns false.
+ */
+static inline void evtchn_write_lock(struct evtchn *evtchn)
+{
+    write_lock(&evtchn->lock);
+
+    evtchn->old_state = evtchn->state;
+}
+
+static inline void evtchn_write_unlock(struct evtchn *evtchn)
+{
+    /* Enforce lock discipline. */
+    ASSERT(evtchn->old_state == ECS_FREE || evtchn->old_state == ECS_UNBOUND ||
+           evtchn->state == ECS_FREE || evtchn->state == ECS_UNBOUND);
+
+    write_unlock(&evtchn->lock);
+}
+
 /*
  * The function alloc_unbound_xen_event_channel() allows an arbitrary
  * notifier function to be specified. However, very few unique functions
@@ -133,7 +156,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
             return NULL;
         }
         chn[i].port = port + i;
-        spin_lock_init(&chn[i].lock);
+        rwlock_init(&chn[i].lock);
     }
     return chn;
 }
@@ -255,7 +278,6 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     int            port;
     domid_t        dom = alloc->dom;
     long           rc;
-    unsigned long  flags;
 
     d = rcu_lock_domain_by_any_id(dom);
     if ( d == NULL )
@@ -271,14 +293,14 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     if ( rc )
         goto out;
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state = ECS_UNBOUND;
     if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF )
         chn->u.unbound.remote_domid = current->domain->domain_id;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     alloc->port = port;
 
@@ -291,32 +313,26 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 }
 
 
-static unsigned long double_evtchn_lock(struct evtchn *lchn,
-                                        struct evtchn *rchn)
+static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
 {
-    unsigned long flags;
-
     if ( lchn <= rchn )
     {
-        spin_lock_irqsave(&lchn->lock, flags);
+        evtchn_write_lock(lchn);
         if ( lchn != rchn )
-            spin_lock(&rchn->lock);
+            evtchn_write_lock(rchn);
     }
     else
     {
-        spin_lock_irqsave(&rchn->lock, flags);
-        spin_lock(&lchn->lock);
+        evtchn_write_lock(rchn);
+        evtchn_write_lock(lchn);
     }
-
-    return flags;
 }
 
-static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn,
-                                 unsigned long flags)
+static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn)
 {
     if ( lchn != rchn )
-        spin_unlock(&lchn->lock);
-    spin_unlock_irqrestore(&rchn->lock, flags);
+        evtchn_write_unlock(lchn);
+    evtchn_write_unlock(rchn);
 }
 
 static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
@@ -326,7 +342,6 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     int            lport, rport = bind->remote_port;
     domid_t        rdom = bind->remote_dom;
     long           rc;
-    unsigned long  flags;
 
     if ( rdom == DOMID_SELF )
         rdom = current->domain->domain_id;
@@ -362,7 +377,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     if ( rc )
         goto out;
 
-    flags = double_evtchn_lock(lchn, rchn);
+    double_evtchn_lock(lchn, rchn);
 
     lchn->u.interdomain.remote_dom  = rd;
     lchn->u.interdomain.remote_port = rport;
@@ -379,7 +394,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
      */
     evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);
 
-    double_evtchn_unlock(lchn, rchn, flags);
+    double_evtchn_unlock(lchn, rchn);
 
     bind->local_port = lport;
 
@@ -402,7 +417,6 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
     struct domain *d = current->domain;
     int            virq = bind->virq, vcpu = bind->vcpu;
     int            rc = 0;
-    unsigned long  flags;
 
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;
@@ -440,14 +454,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
 
     chn = evtchn_from_port(d, port);
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state          = ECS_VIRQ;
     chn->notify_vcpu_id = vcpu;
     chn->u.virq         = virq;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     v->virq_to_evtchn[virq] = bind->port = port;
 
@@ -464,7 +478,6 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
     struct domain *d = current->domain;
     int            port, vcpu = bind->vcpu;
     long           rc = 0;
-    unsigned long  flags;
 
     if ( domain_vcpu(d, vcpu) == NULL )
         return -ENOENT;
@@ -476,13 +489,13 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
 
     chn = evtchn_from_port(d, port);
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state          = ECS_IPI;
     chn->notify_vcpu_id = vcpu;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     bind->port = port;
 
@@ -526,7 +539,6 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
     struct pirq   *info;
     int            port = 0, pirq = bind->pirq;
     long           rc;
-    unsigned long  flags;
 
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
@@ -559,14 +571,14 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
         goto out;
     }
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state  = ECS_PIRQ;
     chn->u.pirq.irq = pirq;
     link_pirq_port(port, chn, v);
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     bind->port = port;
 
@@ -587,7 +599,6 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
     struct evtchn *chn1, *chn2;
     int            port2;
     long           rc = 0;
-    unsigned long  flags;
 
  again:
     spin_lock(&d1->event_lock);
@@ -688,14 +699,14 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
         BUG_ON(chn2->state != ECS_INTERDOMAIN);
         BUG_ON(chn2->u.interdomain.remote_dom != d1);
 
-        flags = double_evtchn_lock(chn1, chn2);
+        double_evtchn_lock(chn1, chn2);
 
         evtchn_free(d1, chn1);
 
         chn2->state = ECS_UNBOUND;
         chn2->u.unbound.remote_domid = d1->domain_id;
 
-        double_evtchn_unlock(chn1, chn2, flags);
+        double_evtchn_unlock(chn1, chn2);
 
         goto out;
 
@@ -703,9 +714,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
         BUG();
     }
 
-    spin_lock_irqsave(&chn1->lock, flags);
+    evtchn_write_lock(chn1);
     evtchn_free(d1, chn1);
-    spin_unlock_irqrestore(&chn1->lock, flags);
+    evtchn_write_unlock(chn1);
 
  out:
     if ( d2 != NULL )
@@ -725,7 +736,6 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     struct evtchn *lchn, *rchn;
     struct domain *rd;
     int            rport, ret = 0;
-    unsigned long  flags;
 
     if ( !port_is_valid(ld, lport) )
         return -EINVAL;
@@ -738,7 +748,7 @@ int evtchn_send(struct domain *ld, unsigned int lport)
 
     lchn = evtchn_from_port(ld, lport);
 
-    spin_lock_irqsave(&lchn->lock, flags);
+    evtchn_read_lock(lchn);
 
     /* Guest cannot send via a Xen-attached event channel. */
     if ( unlikely(consumer_is_xen(lchn)) )
@@ -773,7 +783,7 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     }
 
 out:
-    spin_unlock_irqrestore(&lchn->lock, flags);
+    evtchn_read_unlock(lchn);
 
     return ret;
 }
@@ -806,9 +816,11 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
 
     d = v->domain;
     chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
-    evtchn_port_set_pending(d, v->vcpu_id, chn);
-    spin_unlock(&chn->lock);
+    if ( evtchn_read_trylock(chn) )
+    {
+        evtchn_port_set_pending(d, v->vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
@@ -837,9 +849,11 @@ void send_guest_global_virq(struct domain *d, uint32_t virq)
         goto out;
 
     chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
-    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
-    spin_unlock(&chn->lock);
+    if ( evtchn_read_trylock(chn) )
+    {
+        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
@@ -849,7 +863,6 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq)
 {
     int port;
     struct evtchn *chn;
-    unsigned long flags;
 
     /*
      * PV guests: It should not be possible to race with __evtchn_close(). The
@@ -864,9 +877,11 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq)
     }
 
     chn = evtchn_from_port(d, port);
-    spin_lock_irqsave(&chn->lock, flags);
-    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
-    spin_unlock_irqrestore(&chn->lock, flags);
+    if ( evtchn_read_trylock(chn) )
+    {
+        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 }
 
 static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly;
@@ -1068,15 +1083,17 @@ int evtchn_unmask(unsigned int port)
 {
     struct domain *d = current->domain;
     struct evtchn *evtchn;
-    unsigned long flags;
 
     if ( unlikely(!port_is_valid(d, port)) )
         return -EINVAL;
 
     evtchn = evtchn_from_port(d, port);
-    spin_lock_irqsave(&evtchn->lock, flags);
+
+    evtchn_read_lock(evtchn);
+
     evtchn_port_unmask(d, evtchn);
-    spin_unlock_irqrestore(&evtchn->lock, flags);
+
+    evtchn_read_unlock(evtchn);
 
     return 0;
 }
@@ -1156,15 +1173,17 @@ static long evtchn_set_priority(const struct evtchn_set_priority *set_priority)
     unsigned int port = set_priority->port;
     struct evtchn *chn;
     long ret;
-    unsigned long flags;
 
     if ( !port_is_valid(d, port) )
         return -EINVAL;
 
     chn = evtchn_from_port(d, port);
-    spin_lock_irqsave(&chn->lock, flags);
+
+    evtchn_read_lock(chn);
+
     ret = evtchn_port_set_priority(d, chn, set_priority->priority);
-    spin_unlock_irqrestore(&chn->lock, flags);
+
+    evtchn_read_unlock(chn);
 
     return ret;
 }
@@ -1332,7 +1351,6 @@ int alloc_unbound_xen_event_channel(
 {
     struct evtchn *chn;
     int            port, rc;
-    unsigned long  flags;
 
     spin_lock(&ld->event_lock);
 
@@ -1345,14 +1363,14 @@ int alloc_unbound_xen_event_channel(
     if ( rc )
         goto out;
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state = ECS_UNBOUND;
     chn->xen_consumer = get_xen_consumer(notification_fn);
     chn->notify_vcpu_id = lvcpu;
     chn->u.unbound.remote_domid = remote_domid;
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     /*
      * Increment ->xen_evtchns /after/ ->active_evtchns. No explicit
@@ -1388,7 +1406,6 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 {
     struct evtchn *lchn, *rchn;
     struct domain *rd;
-    unsigned long flags;
 
     if ( !port_is_valid(ld, lport) )
     {
@@ -1403,7 +1420,8 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 
     lchn = evtchn_from_port(ld, lport);
 
-    spin_lock_irqsave(&lchn->lock, flags);
+    if ( !evtchn_read_trylock(lchn) )
+        return;
 
     if ( likely(lchn->state == ECS_INTERDOMAIN) )
     {
@@ -1413,7 +1431,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
         evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
     }
 
-    spin_unlock_irqrestore(&lchn->lock, flags);
+    evtchn_read_unlock(lchn);
 }
 
 void evtchn_check_pollers(struct domain *d, unsigned int port)
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 2ed4be78f6..a0a85cdda8 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -105,6 +105,21 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
+static inline void evtchn_read_lock(struct evtchn *evtchn)
+{
+    read_lock(&evtchn->lock);
+}
+
+static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+{
+    return read_trylock(&evtchn->lock);
+}
+
+static inline void evtchn_read_unlock(struct evtchn *evtchn)
+{
+    read_unlock(&evtchn->lock);
+}
+
 static inline bool_t port_is_valid(struct domain *d, unsigned int p)
 {
     if ( p >= read_atomic(&d->valid_evtchns) )
@@ -235,11 +250,12 @@ static inline bool evtchn_port_is_masked(struct domain *d, evtchn_port_t port)
 {
     struct evtchn *evtchn = evtchn_from_port(d, port);
     bool rc;
-    unsigned long flags;
 
-    spin_lock_irqsave(&evtchn->lock, flags);
+    evtchn_read_lock(evtchn);
+
     rc = evtchn_is_masked(d, evtchn);
-    spin_unlock_irqrestore(&evtchn->lock, flags);
+
+    evtchn_read_unlock(evtchn);
 
     return rc;
 }
@@ -252,12 +268,13 @@ static inline int evtchn_port_poll(struct domain *d, evtchn_port_t port)
     if ( port_is_valid(d, port) )
     {
         struct evtchn *evtchn = evtchn_from_port(d, port);
-        unsigned long flags;
 
-        spin_lock_irqsave(&evtchn->lock, flags);
+        evtchn_read_lock(evtchn);
+
         if ( evtchn_usable(evtchn) )
             rc = evtchn_is_pending(d, evtchn);
-        spin_unlock_irqrestore(&evtchn->lock, flags);
+
+        evtchn_read_unlock(evtchn);
     }
 
     return rc;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a298ff4df8..66d8f058be 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -85,7 +85,7 @@ extern domid_t hardware_domid;
 
 struct evtchn
 {
-    spinlock_t lock;
+    rwlock_t lock;
 #define ECS_FREE         0 /* Channel is available for use.                  */
 #define ECS_RESERVED     1 /* Channel is reserved.                           */
 #define ECS_UNBOUND      2 /* Channel is waiting to bind to a remote domain. */
@@ -114,6 +114,9 @@ struct evtchn
         u16 virq;      /* state == ECS_VIRQ */
     } u;
     u8 priority;
+#ifndef NDEBUG
+    u8 old_state;      /* State when taking lock in write mode. */
+#endif
     u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
 #ifdef CONFIG_XSM
     union {
-- 
2.26.2




* [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123
  2020-11-09 16:38 [PATCH v6 0/3] XSA-343 followup patches Juergen Gross
  2020-11-09 16:38 ` [PATCH v6 1/3] xen/events: access last_priority and last_vcpu_id together Juergen Gross
  2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
@ 2020-11-09 16:38 ` Juergen Gross
  2020-11-09 16:40   ` Jan Beulich
  2020-11-17 18:13 ` [PATCH v6 0/3] XSA-343 followup patches Julien Grall
From: Juergen Gross @ 2020-11-09 16:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Daniel De Graaf

With the event channel lock no longer disabling interrupts, commit
52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on send path") can
be reverted again.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/event_channel.c  |  6 ---
 xen/include/xsm/xsm.h       |  1 -
 xen/xsm/flask/avc.c         | 78 ++++---------------------------------
 xen/xsm/flask/hooks.c       | 10 -----
 xen/xsm/flask/include/avc.h |  2 -
 5 files changed, 7 insertions(+), 90 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 43e3520df6..eacd96b92f 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -740,12 +740,6 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     if ( !port_is_valid(ld, lport) )
         return -EINVAL;
 
-    /*
-     * As the call further down needs to avoid allocations (due to running
-     * with IRQs off), give XSM a chance to pre-allocate if needed.
-     */
-    xsm_evtchn_send(XSM_HOOK, ld, NULL);
-
     lchn = evtchn_from_port(ld, lport);
 
     evtchn_read_lock(lchn);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 358ec13ba8..7bd03d8817 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -59,7 +59,6 @@ struct xsm_operations {
     int (*evtchn_interdomain) (struct domain *d1, struct evtchn *chn1,
                                         struct domain *d2, struct evtchn *chn2);
     void (*evtchn_close_post) (struct evtchn *chn);
-    /* Note: Next hook may be called with 'chn' set to NULL. See call site. */
     int (*evtchn_send) (struct domain *d, struct evtchn *chn);
     int (*evtchn_status) (struct domain *d, struct evtchn *chn);
     int (*evtchn_reset) (struct domain *d1, struct domain *d2);
diff --git a/xen/xsm/flask/avc.c b/xen/xsm/flask/avc.c
index 2dfa1f4295..87ea38b7a0 100644
--- a/xen/xsm/flask/avc.c
+++ b/xen/xsm/flask/avc.c
@@ -24,9 +24,7 @@
 #include <xen/prefetch.h>
 #include <xen/kernel.h>
 #include <xen/sched.h>
-#include <xen/cpu.h>
 #include <xen/init.h>
-#include <xen/percpu.h>
 #include <xen/rcupdate.h>
 #include <asm/atomic.h>
 #include <asm/current.h>
@@ -343,79 +341,17 @@ static inline int avc_reclaim_node(void)
     return ecx;
 }
 
-static struct avc_node *new_node(void)
-{
-    struct avc_node *node = xzalloc(struct avc_node);
-
-    if ( node )
-    {
-        INIT_RCU_HEAD(&node->rhead);
-        INIT_HLIST_NODE(&node->list);
-        avc_cache_stats_incr(allocations);
-    }
-
-    return node;
-}
-
-/*
- * avc_has_perm_noaudit() may consume up to two nodes, which we may not be
- * able to obtain from the allocator at that point. Since the is merely
- * about caching earlier decisions, allow for (just) one pre-allocated node.
- */
-static DEFINE_PER_CPU(struct avc_node *, prealloc_node);
-
-void avc_prealloc(void)
-{
-    struct avc_node **prealloc = &this_cpu(prealloc_node);
-
-    if ( !*prealloc )
-        *prealloc = new_node();
-}
-
-static int cpu_callback(struct notifier_block *nfb, unsigned long action,
-                        void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    struct avc_node **prealloc = &per_cpu(prealloc_node, cpu);
-
-    if ( action == CPU_DEAD && *prealloc )
-    {
-        xfree(*prealloc);
-        *prealloc = NULL;
-        avc_cache_stats_incr(frees);
-    }
-
-    return NOTIFY_DONE;
-}
-
-static struct notifier_block cpu_nfb = {
-    .notifier_call = cpu_callback,
-    .priority = 99
-};
-
-static int __init cpu_nfb_init(void)
-{
-    register_cpu_notifier(&cpu_nfb);
-    return 0;
-}
-__initcall(cpu_nfb_init);
-
 static struct avc_node *avc_alloc_node(void)
 {
-    struct avc_node *node, **prealloc = &this_cpu(prealloc_node);
+    struct avc_node *node;
 
-    node = *prealloc;
-    *prealloc = NULL;
+    node = xzalloc(struct avc_node);
+    if (!node)
+        goto out;
 
-    if ( !node )
-    {
-        /* Must not call xmalloc() & Co with IRQs off. */
-        if ( !local_irq_is_enabled() )
-            goto out;
-        node = new_node();
-        if ( !node )
-            goto out;
-    }
+    INIT_RCU_HEAD(&node->rhead);
+    INIT_HLIST_NODE(&node->list);
+    avc_cache_stats_incr(allocations);
 
     atomic_inc(&avc_cache.active_nodes);
     if ( atomic_read(&avc_cache.active_nodes) > avc_cache_threshold )
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index de050cc9fe..19b0d9e3eb 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -281,16 +281,6 @@ static int flask_evtchn_send(struct domain *d, struct evtchn *chn)
 {
     int rc;
 
-    /*
-     * When called with non-NULL chn, memory allocation may not be permitted.
-     * Allow AVC to preallocate nodes as necessary upon early notification.
-     */
-    if ( !chn )
-    {
-        avc_prealloc();
-        return 0;
-    }
-
     switch ( chn->state )
     {
     case ECS_INTERDOMAIN:
diff --git a/xen/xsm/flask/include/avc.h b/xen/xsm/flask/include/avc.h
index 722919b762..c14bd07a2b 100644
--- a/xen/xsm/flask/include/avc.h
+++ b/xen/xsm/flask/include/avc.h
@@ -91,8 +91,6 @@ int avc_has_perm_noaudit(u32 ssid, u32 tsid, u16 tclass, u32 requested,
 int avc_has_perm(u32 ssid, u32 tsid, u16 tclass, u32 requested,
                                              struct avc_audit_data *auditdata);
 
-void avc_prealloc(void);
-
 /* Exported to selinuxfs */
 struct xen_flask_hash_stats;
 int avc_get_hash_stats(struct xen_flask_hash_stats *arg);
-- 
2.26.2




* Re: [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123
  2020-11-09 16:38 ` [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123 Juergen Gross
@ 2020-11-09 16:40   ` Jan Beulich
From: Jan Beulich @ 2020-11-09 16:40 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Wei Liu, Daniel De Graaf, xen-devel

On 09.11.2020 17:38, Juergen Gross wrote:
> With the event channel lock no longer disabling interrupts, commit
> 52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on send path") can
> be reverted again.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
@ 2020-11-09 16:48   ` Jan Beulich
  2020-11-09 17:46     ` Julien Grall
  2020-11-09 17:44   ` Julien Grall
  2020-11-18 13:19   ` Jan Beulich
From: Jan Beulich @ 2020-11-09 16:48 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, xen-devel

On 09.11.2020 17:38, Juergen Gross wrote:
> Currently the lock for a single event channel needs to be taken with
> interrupts off, which causes deadlocks in some cases.
> 
> Rework the per event channel lock to be non-blocking for the case of
> sending an event, removing the need for disabling interrupts for
> taking the lock.
> 
> The lock is needed for avoiding races between event channel state
> changes (creation, closing, binding) and normal operations (set
> pending, [un]masking, priority changes).
> 
> Use a rwlock, but with some restrictions:
> 
> - Changing the state of an event channel (creation, closing, binding)
>   needs to use write_lock(), with ASSERT()ing that the lock is taken as
>   writer only when the state of the event channel is appropriate either
>   before or after the locked region (i.e. free or unbound).
> 
> - Sending an event needs to use read_trylock() mostly; if the lock
>   cannot be obtained, the operation is omitted. This is needed as
>   sending an event can happen with interrupts off (at least in some
>   cases).
> 
> - Dumping the event channel state for debug purposes uses
>   read_trylock(), too, in order to avoid blocking in case the lock is
>   taken as writer for a long time.
> 
> - All other cases can use read_lock().
> 
> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V4:
> - switch to rwlock
> - add ASSERT() to verify correct write_lock() usage
> 
> V3:
> - corrected a copy-and-paste error (Jan Beulich)
> - corrected unlocking in two cases (Jan Beulich)
> - renamed evtchn_read_trylock() (Jan Beulich)
> - added some comments and an ASSERT() for evtchn_write_lock()
> - set EVENT_WRITE_LOCK_INC to INT_MIN
> 
> V2:
> - added needed barriers
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I'll give it overnight for others to possibly comment, but I'd
like to get this in tomorrow if at all possible.

I also think this will want backporting beyond just the fully
maintained branches.

Jan



* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
  2020-11-09 16:48   ` Jan Beulich
@ 2020-11-09 17:44   ` Julien Grall
  2020-11-18 13:19   ` Jan Beulich
From: Julien Grall @ 2020-11-09 17:44 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini

Hi Juergen,

On 09/11/2020 16:38, Juergen Gross wrote:
> Currently the lock for a single event channel needs to be taken with
> interrupts off, which causes deadlocks in some cases.
> 
> Rework the per event channel lock to be non-blocking for the case of
> sending an event, removing the need for disabling interrupts for
> taking the lock.
> 
> The lock is needed for avoiding races between event channel state
> changes (creation, closing, binding) and normal operations (set
> pending, [un]masking, priority changes).
> 
> Use a rwlock, but with some restrictions:
> 
> - Changing the state of an event channel (creation, closing, binding)
>    needs to use write_lock(), with ASSERT()ing that the lock is taken as
>    writer only when the state of the event channel is appropriate either
>    before or after the locked region (i.e. free or unbound).
> 
> - Sending an event needs to use read_trylock() mostly; if the lock
>    cannot be obtained, the operation is omitted. This is needed as
>    sending an event can happen with interrupts off (at least in some
>    cases).
> 
> - Dumping the event channel state for debug purposes uses
>    read_trylock(), too, in order to avoid blocking in case the lock is
>    taken as writer for a long time.
> 
> - All other cases can use read_lock().
> 
> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall



* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 16:48   ` Jan Beulich
@ 2020-11-09 17:46     ` Julien Grall
  2020-11-10  6:13       ` Jürgen Groß
  2020-11-10  7:39       ` Jan Beulich
From: Julien Grall @ 2020-11-09 17:46 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross, Roger Pau Monné, Andrew Cooper
  Cc: Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini, xen-devel

Hi,

On 09/11/2020 16:48, Jan Beulich wrote:
> On 09.11.2020 17:38, Juergen Gross wrote:
>> Currently the lock for a single event channel needs to be taken with
>> interrupts off, which causes deadlocks in some cases.
>>
>> Rework the per event channel lock to be non-blocking for the case of
>> sending an event, removing the need for disabling interrupts for
>> taking the lock.
>>
>> The lock is needed for avoiding races between event channel state
>> changes (creation, closing, binding) and normal operations (set
>> pending, [un]masking, priority changes).
>>
>> Use a rwlock, but with some restrictions:
>>
>> - Changing the state of an event channel (creation, closing, binding)
>>    needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>    writer only when the state of the event channel is appropriate either
>>    before or after the locked region (i.e. free or unbound).
>>
>> - Sending an event needs to use read_trylock() mostly; if the lock
>>    cannot be obtained, the operation is omitted. This is needed as
>>    sending an event can happen with interrupts off (at least in some
>>    cases).
>>
>> - Dumping the event channel state for debug purposes uses
>>    read_trylock(), too, in order to avoid blocking in case the lock is
>>    taken as writer for a long time.
>>
>> - All other cases can use read_lock().
>>
>> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V4:
>> - switch to rwlock
>> - add ASSERT() to verify correct write_lock() usage
>>
>> V3:
>> - corrected a copy-and-paste error (Jan Beulich)
>> - corrected unlocking in two cases (Jan Beulich)
>> - renamed evtchn_read_trylock() (Jan Beulich)
>> - added some comments and an ASSERT() for evtchn_write_lock()
>> - set EVENT_WRITE_LOCK_INC to INT_MIN
>>
>> V2:
>> - added needed barriers
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> I'll give it overnight for others to possibly comment, but I'd
> like to get this in tomorrow if at all possible.

IIRC, Citrix originally reported the issues. I would like to give them 
an opportunity to test the patches and confirm this effectively fixed 
their issues.

@Roger, @Andrew, could you give it a try to confirm this unblocks the 
vm event issue?

Cheers,

-- 
Julien Grall



* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 17:46     ` Julien Grall
@ 2020-11-10  6:13       ` Jürgen Groß
  2020-11-10  7:39       ` Jan Beulich
From: Jürgen Groß @ 2020-11-10  6:13 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich, Roger Pau Monné, Andrew Cooper
  Cc: Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini, xen-devel


On 09.11.20 18:46, Julien Grall wrote:
> Hi,
> 
> On 09/11/2020 16:48, Jan Beulich wrote:
>> On 09.11.2020 17:38, Juergen Gross wrote:
>>> Currently the lock for a single event channel needs to be taken with
>>> interrupts off, which causes deadlocks in some cases.
>>>
>>> Rework the per event channel lock to be non-blocking for the case of
>>> sending an event, removing the need for disabling interrupts for
>>> taking the lock.
>>>
>>> The lock is needed for avoiding races between event channel state
>>> changes (creation, closing, binding) and normal operations (set
>>> pending, [un]masking, priority changes).
>>>
>>> Use a rwlock, but with some restrictions:
>>>
>>> - Changing the state of an event channel (creation, closing, binding)
>>>    needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>>    writer only when the state of the event channel is appropriate either
>>>    before or after the locked region (i.e. free or unbound).
>>>
>>> - Sending an event needs to use read_trylock() mostly; if the lock
>>>    cannot be obtained, the operation is omitted. This is needed as
>>>    sending an event can happen with interrupts off (at least in some
>>>    cases).
>>>
>>> - Dumping the event channel state for debug purposes uses
>>>    read_trylock(), too, in order to avoid blocking in case the lock is
>>>    taken as writer for a long time.
>>>
>>> - All other cases can use read_lock().
>>>
>>> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> V4:
>>> - switch to rwlock
>>> - add ASSERT() to verify correct write_lock() usage
>>>
>>> V3:
>>> - corrected a copy-and-paste error (Jan Beulich)
>>> - corrected unlocking in two cases (Jan Beulich)
>>> - renamed evtchn_read_trylock() (Jan Beulich)
>>> - added some comments and an ASSERT() for evtchn_write_lock()
>>> - set EVENT_WRITE_LOCK_INC to INT_MIN
>>>
>>> V2:
>>> - added needed barriers
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> I'll give it overnight for others to possibly comment, but I'd
>> like to get this in tomorrow if at all possible.
> 
> IIRC, Citrix originally reported the issues. I would like to give them 
> an opportunity to test the patches and confirm this effectively fixed 
> their issues.
> 
> @Roger, @Andrew, could you give it a try to confirm this unblocks the 
> vm event issue?

It should be noted that this series ought to fix the current failures
of the XSM tests on unstable, as those are triggering the advanced
lock checks of rwlocks due to the per-event channel lock disabling
interrupts.
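
For reference, the kind of check meant here, as a simplified
standalone sketch (illustrative names only, not Xen's actual debug
code): a lock that is ever taken with IRQs disabled must always be
taken with IRQs disabled, as an interrupt arriving while the lock is
held with IRQs on could try to acquire it again and deadlock.

#include <assert.h>
#include <stdbool.h>

struct lock_debug {
    signed char irq_safe;    /* -1: not yet known; 0: IRQs on; 1: off */
};

static void check_lock(struct lock_debug *dbg, bool irqs_disabled)
{
    /* The first acquisition decides the expected IRQ state ... */
    if ( dbg->irq_safe < 0 )
        dbg->irq_safe = irqs_disabled;

    /* ... and every later acquisition has to match it. */
    assert(dbg->irq_safe == (int)irqs_disabled);
}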


Juergen


* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 17:46     ` Julien Grall
  2020-11-10  6:13       ` Jürgen Groß
@ 2020-11-10  7:39       ` Jan Beulich
  2020-11-10  9:47         ` Julien Grall
From: Jan Beulich @ 2020-11-10  7:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini,
	xen-devel, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On 09.11.2020 18:46, Julien Grall wrote:
> Hi,
> 
> On 09/11/2020 16:48, Jan Beulich wrote:
>> On 09.11.2020 17:38, Juergen Gross wrote:
>>> Currently the lock for a single event channel needs to be taken with
>>> interrupts off, which causes deadlocks in some cases.
>>>
>>> Rework the per event channel lock to be non-blocking for the case of
>>> sending an event, removing the need for disabling interrupts for
>>> taking the lock.
>>>
>>> The lock is needed for avoiding races between event channel state
>>> changes (creation, closing, binding) and normal operations (set
>>> pending, [un]masking, priority changes).
>>>
>>> Use a rwlock, but with some restrictions:
>>>
>>> - Changing the state of an event channel (creation, closing, binding)
>>>    needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>>    writer only when the state of the event channel is appropriate either
>>>    before or after the locked region (i.e. free or unbound).
>>>
>>> - Sending an event needs to use read_trylock() mostly; if the lock
>>>    cannot be obtained, the operation is omitted. This is needed as
>>>    sending an event can happen with interrupts off (at least in some
>>>    cases).
>>>
>>> - Dumping the event channel state for debug purposes uses
>>>    read_trylock(), too, in order to avoid blocking in case the lock is
>>>    taken as writer for a long time.
>>>
>>> - All other cases can use read_lock().
>>>
>>> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> V4:
>>> - switch to rwlock
>>> - add ASSERT() to verify correct write_lock() usage
>>>
>>> V3:
>>> - corrected a copy-and-paste error (Jan Beulich)
>>> - corrected unlocking in two cases (Jan Beulich)
>>> - renamed evtchn_read_trylock() (Jan Beulich)
>>> - added some comments and an ASSERT() for evtchn_write_lock()
>>> - set EVENT_WRITE_LOCK_INC to INT_MIN
>>>
>>> V2:
>>> - added needed barriers
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> I'll give it overnight for others to possibly comment, but I'd
>> like to get this in tomorrow if at all possible.
> 
> IIRC, Citrix originally reported the issues. I would like to give them 
> an opportunity to test the patches and confirm this effectively fixed 
> their issues.

I would consider waiting longer, but I'd like to get staging unblocked.
Just like with 52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on
send path") we can always decide to revert / fix up afterwards. The
patch here surely is an improvement, even if it were to turn out it
doesn't fix all issues yet. I'd appreciate it if you would confirm you
don't object to the patch going in - I'll wait a few more hours,
perhaps until around noon.

Jan



* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-10  7:39       ` Jan Beulich
@ 2020-11-10  9:47         ` Julien Grall
  2020-11-10 10:10           ` Jan Beulich
From: Julien Grall @ 2020-11-10  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini,
	xen-devel, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi,

On 10/11/2020 07:39, Jan Beulich wrote:
> On 09.11.2020 18:46, Julien Grall wrote:
>> Hi,
>>
>> On 09/11/2020 16:48, Jan Beulich wrote:
>>> On 09.11.2020 17:38, Juergen Gross wrote:
>>>> Currently the lock for a single event channel needs to be taken with
>>>> interrupts off, which causes deadlocks in some cases.
>>>>
>>>> Rework the per event channel lock to be non-blocking for the case of
>>>> sending an event, removing the need for disabling interrupts for
>>>> taking the lock.
>>>>
>>>> The lock is needed for avoiding races between event channel state
>>>> changes (creation, closing, binding) and normal operations (set
>>>> pending, [un]masking, priority changes).
>>>>
>>>> Use a rwlock, but with some restrictions:
>>>>
>>>> - Changing the state of an event channel (creation, closing, binding)
>>>>     needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>>>     writer only when the state of the event channel is appropriate either
>>>>     before or after the locked region (i.e. free or unbound).
>>>>
>>>> - Sending an event needs to use read_trylock() mostly; if the lock
>>>>     cannot be obtained, the operation is omitted. This is needed as
>>>>     sending an event can happen with interrupts off (at least in some
>>>>     cases).
>>>>
>>>> - Dumping the event channel state for debug purposes uses
>>>>     read_trylock(), too, in order to avoid blocking in case the lock is
>>>>     taken as writer for a long time.
>>>>
>>>> - All other cases can use read_lock().
>>>>
>>>> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>> V4:
>>>> - switch to rwlock
>>>> - add ASSERT() to verify correct write_lock() usage
>>>>
>>>> V3:
>>>> - corrected a copy-and-paste error (Jan Beulich)
>>>> - corrected unlocking in two cases (Jan Beulich)
>>>> - renamed evtchn_read_trylock() (Jan Beulich)
>>>> - added some comments and an ASSERT() for evtchn_write_lock()
>>>> - set EVENT_WRITE_LOCK_INC to INT_MIN
>>>>
>>>> V2:
>>>> - added needed barriers
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> I'll give it overnight for others to possibly comment, but I'd
>>> like to get this in tomorrow if at all possible.
>>
>> IIRC, Citrix originally reported the issues. I would like to give them
>> an opportunity to test the patches and confirm this effectively fixed
>> their issues.
> 
> I would consider waiting longer, but I'd like to get staging unblocked.

So the plan is to have the patches sitting in staging for a few days and 
then backport?

> Just like with 52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on
> send path") we can always decide to revert / fix up afterwards. The
> patch here surely is an improvement, even if it were to turn out it
> doesn't fix all issues yet. I'd appreciate it if you would confirm you
> don't object to the patch going in - I'll wait a few more hours,
> perhaps until around noon.
I would suggest waiting until noon and, if you don't get any answer, 
then merging it.

Cheers,

-- 
Julien Grall



* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-10  9:47         ` Julien Grall
@ 2020-11-10 10:10           ` Jan Beulich
From: Jan Beulich @ 2020-11-10 10:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini,
	xen-devel, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On 10.11.2020 10:47, Julien Grall wrote:
> Hi,
> 
> On 10/11/2020 07:39, Jan Beulich wrote:
>> On 09.11.2020 18:46, Julien Grall wrote:
>>> Hi,
>>>
>>> On 09/11/2020 16:48, Jan Beulich wrote:
>>>> On 09.11.2020 17:38, Juergen Gross wrote:
>>>>> Currently the lock for a single event channel needs to be taken with
>>>>> interrupts off, which causes deadlocks in some cases.
>>>>>
>>>>> Rework the per event channel lock to be non-blocking for the case of
>>>>> sending an event, removing the need for disabling interrupts for
>>>>> taking the lock.
>>>>>
>>>>> The lock is needed for avoiding races between event channel state
>>>>> changes (creation, closing, binding) and normal operations (set
>>>>> pending, [un]masking, priority changes).
>>>>>
>>>>> Use a rwlock, but with some restrictions:
>>>>>
>>>>> - Changing the state of an event channel (creation, closing, binding)
>>>>>     needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>>>>     writer only when the state of the event channel is appropriate either
>>>>>     before or after the locked region (i.e. free or unbound).
>>>>>
>>>>> - Sending an event needs to use read_trylock() mostly; if the lock
>>>>>     cannot be obtained, the operation is omitted. This is needed as
>>>>>     sending an event can happen with interrupts off (at least in some
>>>>>     cases).
>>>>>
>>>>> - Dumping the event channel state for debug purposes uses
>>>>>     read_trylock(), too, in order to avoid blocking in case the lock is
>>>>>     taken as writer for a long time.
>>>>>
>>>>> - All other cases can use read_lock().
>>>>> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>> ---
>>>>> V4:
>>>>> - switch to rwlock
>>>>> - add ASSERT() to verify correct write_lock() usage
>>>>>
>>>>> V3:
>>>>> - corrected a copy-and-paste error (Jan Beulich)
>>>>> - corrected unlocking in two cases (Jan Beulich)
>>>>> - renamed evtchn_read_trylock() (Jan Beulich)
>>>>> - added some comments and an ASSERT() for evtchn_write_lock()
>>>>> - set EVENT_WRITE_LOCK_INC to INT_MIN
>>>>>
>>>>> V2:
>>>>> - added needed barriers
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> I'll give it overnight for others to possibly comment, but I'd
>>>> like to get this in tomorrow if at all possible.
>>>
>>> IIRC, Citrix originally reported the issues. I would like to give them
>>> an opportunity to test the patches and confirm this effectively fixed
>>> there issues.
>>
>> I would consider waiting longer, but I'd like to get staging unblocked.
> 
> So the plan is to have the patches sitting in staging for a few days and 
> then backport?

Right, the usual thing - wait at least until a push has happened.
Unless we learn of problems with the change, I definitely want to
have this in for 4.14.1.

>> Just like with 52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on
>> send path") we can always decide to revert / fix up afterwards. The
>> patch here surely is an improvement, even if it were to turn out it
>> doesn't fix all issues yet. I'd appreciate it if you would confirm you
>> don't object to the patch going in - I'll wait a few more hours,
>> perhaps until around noon.
> I would suggest waiting until noon and, if you don't get any answer, 
> then merging it.

Okay.

Jan



* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-09 16:38 [PATCH v6 0/3] XSA-343 followup patches Juergen Gross
                   ` (2 preceding siblings ...)
  2020-11-09 16:38 ` [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123 Juergen Gross
@ 2020-11-17 18:13 ` Julien Grall
  2020-11-18  8:22   ` Jan Beulich
From: Julien Grall @ 2020-11-17 18:13 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, committers
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Daniel De Graaf

Hi,

On 09/11/2020 16:38, Juergen Gross wrote:
> The patches for XSA-343 produced some fallout; especially the event
> channel locking has proven to be problematic.
> 
> Patch 1 targets fifo event channels, avoiding any races for the case
> that the fifo queue of a specific event channel has been changed.
> 
> The second patch modifies the per event channel locking scheme in
> order to avoid deadlocks and problems due to the event channel lock
> having been changed by the XSA-343 patches to require IRQs off.
> 
> Changes in V6:
> - added patch 3 (Jan Beulich)
> - switched some more read_trylock() cases to read_lock() (Jan Beulich)
> 
> Changes in V5:
> - moved evtchn_write_[un]lock() to event_channel.c (Jan Beulich)
> - used normal read_lock() in some cases (Jan Beulich)
> 
> Changes in V4:
> - switched to real rwlock
> 
> Changes in V3:
> - addressed comments
> 
> Juergen Gross (3):
>    xen/events: access last_priority and last_vcpu_id together
>    xen/evtchn: rework per event channel lock
>    xen/evtchn: revert 52e1fc47abc3a0123

While looking at the list of commits, I noticed that the first patch 
hasn't been committed. They were all acked/reviewed, so I am a bit 
puzzled why this was omitted...

I nearly missed this, as I was expecting the 3 patches to be committed 
together. May I suggest that in the future we reply to the cover letter 
and mention which patches are (or are not) committed?

Regarding patch #1, I will commit it tomorrow unless there are strong 
objections against it.

Cheers,

-- 
Julien Grall



* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-17 18:13 ` [PATCH v6 0/3] XSA-343 followup patches Julien Grall
@ 2020-11-18  8:22   ` Jan Beulich
  2020-11-18  8:41     ` Jürgen Groß
  2020-11-18 14:18     ` Julien Grall
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2020-11-18  8:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Daniel De Graaf, committers, xen-devel, Juergen Gross

On 17.11.2020 19:13, Julien Grall wrote:
> On 09/11/2020 16:38, Juergen Gross wrote:
>> Juergen Gross (3):
>>    xen/events: access last_priority and last_vcpu_id together
>>    xen/evtchn: rework per event channel lock
>>    xen/evtchn: revert 52e1fc47abc3a0123
> 
> While looking at the list of commits, I noticed that the first patch 
> hasn't been committed. They were all acked/reviewed, so I am a bit 
> puzzled why this was omitted...
> 
> I nearly missed this, as I was expecting the 3 patches to be committed 
> together. May I suggest that in the future we reply to the cover letter 
> and mention which patches are (or are not) committed?
> 
> Regarding patch #1, I will commit it tomorrow unless there are strong 
> objections.

Without a clear outline of what would break with the present logic,
I had previously indicated I'm not convinced of the change. This
isn't a strong objection, no, but I still wouldn't want to see my
name associated with it in such a case. Furthermore I clearly view
this as not a backporting candidate, while the other two are (as I
did previously indicate). Hence the latter two patches wanted
re-basing ahead of the first one anyway, to ease the backports.

While I will accept there are different views possible here, I also
don't think sending mail in such a case is a good use of resources.
The information on what was or was not committed is readily available.
Personally I view such mails as at least very close to spam.

Irrespective of the above I'm sorry for any inconvenience caused.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-18  8:22   ` Jan Beulich
@ 2020-11-18  8:41     ` Jürgen Groß
  2020-11-18 12:09       ` Jan Beulich
  2020-11-18 14:18     ` Julien Grall
  1 sibling, 1 reply; 21+ messages in thread
From: Jürgen Groß @ 2020-11-18  8:41 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Daniel De Graaf, committers, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 1633 bytes --]

On 18.11.20 09:22, Jan Beulich wrote:
> On 17.11.2020 19:13, Julien Grall wrote:
>> On 09/11/2020 16:38, Juergen Gross wrote:
>>> Juergen Gross (3):
>>>     xen/events: access last_priority and last_vcpu_id together
>>>     xen/evtchn: rework per event channel lock
>>>     xen/evtchn: revert 52e1fc47abc3a0123
>>
>> While looking at the list of commits, I noticed that the first patch
>> hasn't been committed. They were all acked/reviewed, so I am a bit
>> puzzled why this was omitted...
>>
>> I nearly missed this, as I was expecting the 3 patches to be committed
>> together. May I suggest that in the future we reply to the cover letter
>> and mention which patches are (or are not) committed?
>>
>> Regarding patch #1, I will commit it tomorrow unless there are strong
>> objections.
> 
> Without a clear outline of what would break with the present logic,
> I had previously indicated I'm not convinced of the change. This
> isn't a strong objection, no, but I still wouldn't want to see my
> name associated with it in such a case. Furthermore I clearly view
> this as not a backporting candidate, while the other two are (as I
> did previously indicate). Hence the latter two patches wanted
> re-basing ahead of the first one anyway, to ease the backports.

Consider an NMI during evtchn_fifo_set_pending() between updating
last_vcpu_id and last_priority, while on another cpu a concurrent
evtchn_fifo_set_pending() is being called. On that other cpu
lock_old_queue() might return a wrong queue as it will read only
the new last_vcpu_id, but not the new last_priority value.
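
To make this concrete: the idea of patch 1 is to fold both values into
a single word that is always read and written with one atomic access.
A minimal sketch (the exact field and helper names here are
illustrative; see the actual patch for details):

    union evtchn_fifo_lastq {
        uint32_t raw;
        struct {
            uint8_t  last_priority;
            uint16_t last_vcpu_id;
        };
    };

    /* Sender side: publish both values with a single atomic store. */
    union evtchn_fifo_lastq lastq = { .raw = 0 };

    lastq.last_priority = q->priority;
    lastq.last_vcpu_id = v->vcpu_id;
    write_atomic(&evtchn->fifo_lastq, lastq.raw);

    /* lock_old_queue(): fetch both values with a single atomic load. */
    lastq.raw = read_atomic(&evtchn->fifo_lastq);

An interrupted writer can then no longer leave a new last_vcpu_id
visible next to a stale last_priority.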


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-18  8:41     ` Jürgen Groß
@ 2020-11-18 12:09       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2020-11-18 12:09 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Daniel De Graaf, committers, xen-devel, Julien Grall

On 18.11.2020 09:41, Jürgen Groß wrote:
> On 18.11.20 09:22, Jan Beulich wrote:
>> On 17.11.2020 19:13, Julien Grall wrote:
>>> On 09/11/2020 16:38, Juergen Gross wrote:
>>>> Juergen Gross (3):
>>>>     xen/events: access last_priority and last_vcpu_id together
>>>>     xen/evtchn: rework per event channel lock
>>>>     xen/evtchn: revert 52e1fc47abc3a0123
>>>
>>> While looking at the list of commits, I noticed that the first patch
>>> hasn't been committed. They were all acked/reviewed, so I am a bit
>>> puzzled why this was omitted...
>>>
>>> I nearly missed this, as I was expecting the 3 patches to be committed
>>> together. May I suggest that in the future we reply to the cover letter
>>> and mention which patches are (or are not) committed?
>>>
>>> Regarding patch #1, I will commit it tomorrow unless there are strong
>>> objections.
>>
>> Without a clear outline of what would break with the present logic,
>> I had previously indicated I'm not convinced of the change. This
>> isn't a strong objection, no, but I still wouldn't want to see my
>> name associated with it in such a case. Furthermore I clearly view
>> this as not a backporting candidate, while the other two are (as I
>> did previously indicate). Hence the latter two patches wanted
>> re-basing ahead of the first one anyway, to ease the backports.
> 
> Consider an NMI during evtchn_fifo_set_pending() between updating
> last_vcpu_id and last_priority, while on another cpu a concurrent
> evtchn_fifo_set_pending() is being called. On that other cpu
> lock_old_queue() might return a wrong queue as it will read only
> the new last_vcpu_id, but not the new last_priority value.

Which has become possible only as of patch 2 of this series, i.e.
this one really was a prereq without its description making this
clear. Now that it'll go in on top of the other two, there's no
strict need to change the description anymore, though. But I take
from this that I should also queue it up for backporting then.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
  2020-11-09 16:48   ` Jan Beulich
  2020-11-09 17:44   ` Julien Grall
@ 2020-11-18 13:19   ` Jan Beulich
  2020-11-23  7:29     ` Jürgen Groß
  2 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2020-11-18 13:19 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, xen-devel

On 09.11.2020 17:38, Juergen Gross wrote:
> Currently the lock for a single event channel needs to be taken with
> interrupts off, which causes deadlocks in some cases.
> 
> Rework the per event channel lock to be non-blocking for the case of
> sending an event, and remove the need for disabling interrupts for
> taking the lock.
> 
> The lock is needed for avoiding races between event channel state
> changes (creation, closing, binding) and normal operations (set
> pending, [un]masking, priority changes).
> 
> Use a rwlock, but with some restrictions:
> 
> - Changing the state of an event channel (creation, closing, binding)
>   needs to use write_lock(), with ASSERT()ing that the lock is taken as
>   writer only when the state of the event channel is either before or
>   after the locked region appropriate (either free or unbound).
> 
> - Sending an event needs to use read_trylock() mostly, in case of not
>   obtaining the lock the operation is omitted. This is needed as
>   sending an event can happen with interrupts off (at least in some
>   cases).
> 
> - Dumping the event channel state for debug purposes is using
>   read_trylock(), too, in order to avoid blocking in case the lock is
>   taken as writer for a long time.
> 
> - All other cases can use read_lock().
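
A sketch of the resulting discipline at the call sites (helper and
field names are illustrative rather than the exact ones used in the
patch):

    /* State changes (creation, closing, binding): exclusive access. */
    write_lock(&evtchn->lock);
    /* ... transition the channel, e.g. ECS_FREE -> ECS_INTERDOMAIN ... */
    write_unlock(&evtchn->lock);

    /* Send path: may run with IRQs off, hence must never block. */
    if ( read_trylock(&evtchn->lock) )
    {
        /* ... set the event pending ... */
        read_unlock(&evtchn->lock);
    }
    /* else: the event is dropped rather than risking a deadlock. */

    /* Mask/unmask, priority changes etc.: plain shared lock. */
    read_lock(&evtchn->lock);
    /* ... */
    read_unlock(&evtchn->lock);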

One of the implications is that racing invocations of ->set_pending()
are now possible for the same port. Beyond what I said in reply to
0/3 already, I'm afraid there are (latent) issues:

1) The update of ->pending (or basically any bitfield in struct
evtchn, or yet more generically any field getting updated in a read-
modify-write fashion) is no longer generally safe in any of the
hooks called with just a read lock held. ->pending itself is not an
issue now merely because it shares storage only with xen_consumer,
which won't get updated once a port was bound.

2) Of two racing sends, one may now complete without the port
actually having got fully recorded as linked in the FIFO code. This
is because the party losing the race of setting EVTCHN_FIFO_LINKED
will return early, without regard to whether the winner has made
enough progress. (Of course this is possible only with an
intermediate queue change, as only then the lock would become
available to the second of the senders early enough.)

I've gone through other functions called from this path and didn't
find any further race potential there, but I'm not entirely certain
I didn't miss anything.
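
In code terms, the pattern behind 2) is roughly the following (heavily
abridged from evtchn_fifo_set_pending(); the queue selection and
various corner cases are elided):

    test_and_set_bit(EVTCHN_FIFO_PENDING, word);

    if ( !test_bit(EVTCHN_FIFO_MASKED, word) &&
         !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
    {
        /* Winner of the LINKED race: lock the queue, link the event. */
        spin_lock_irqsave(&q->lock, flags);
        /* ... put the port on the queue ... */
        spin_unlock_irqrestore(&q->lock, flags);
    }

    /*
     * Loser of the race falls through here - potentially before the
     * winner has actually linked the event.
     */
    evtchn_check_pollers(d, port);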

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-18  8:22   ` Jan Beulich
  2020-11-18  8:41     ` Jürgen Groß
@ 2020-11-18 14:18     ` Julien Grall
  2020-11-18 14:30       ` Ian Jackson
  2020-11-18 14:33       ` Paul Durrant
  1 sibling, 2 replies; 21+ messages in thread
From: Julien Grall @ 2020-11-18 14:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Daniel De Graaf, committers, xen-devel, Juergen Gross

Hi Jan,

On 18/11/2020 08:22, Jan Beulich wrote:
> On 17.11.2020 19:13, Julien Grall wrote:
>> On 09/11/2020 16:38, Juergen Gross wrote:
>>> Juergen Gross (3):
>>>     xen/events: access last_priority and last_vcpu_id together
>>>     xen/evtchn: rework per event channel lock
>>>     xen/evtchn: revert 52e1fc47abc3a0123
>>
>> While looking at the list of commits, I noticed that the first patch
>> hasn't been committed. They were all acked/reviewed, so I am a bit
>> puzzled why this was omitted...
>>
>> I nearly missed this, as I was expecting the 3 patches to be committed
>> together. May I suggest that in the future we reply to the cover letter
>> and mention which patches are (or are not) committed?
>>
>> Regarding patch #1, I will commit it tomorrow unless there are strong
>> objections.
> 
> Without a clear outline of what would break with the present logic,
> I had previously indicated I'm not convinced of the change. This
> isn't a strong objection, no, but I still wouldn't want to see my
> name associated with it in such a case.

I was under the impression that the committer job is mostly mechanical 
(i.e. collecting tags and applying patches). There are no rules in 
MAINTAINERS saying that committers can decide what gets committed as 
long as maintainers have approved it and there are no strong objections.

> Furthermore I clearly view
> this as not a backporting candidate, while the other two are (as I
> did previously indicate). Hence the latter two patches wanted
> re-basing ahead of the first one anyway, to ease the backports.

I understand the backport concern. However, if there were clashes, then it 
means you had to resolve them when committing to staging. Surely they were 
quite minimal; otherwise you would have sent an e-mail to xen-devel, right?

> While I will accept there are different views possible here, I also
> don't think sending mail in such a case is a good use of resources.
> The information on what was or was not committed is readily available.
> Personally I view such mails as at least very close to spam.

This is a matter of perspective. It helps to confirm with the 
contributor that it was fine to merge only part of the series (multiple 
pairs of eyes are always better than one...) and to mention any clashes 
or rework.

It also makes it easier for reviewers to figure out what was committed, 
as it can be difficult to know whether a patch was merged because the 
commit title can be altered (even simply dropping the prefix "xen/" can 
take a couple more minutes to pinpoint the commit).

Therefore, I think there is value in sending such e-mails. It will 
likely improve coordination among the members of the community.

I would be happy to consider a different approach if it fulfils that goal.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-18 14:18     ` Julien Grall
@ 2020-11-18 14:30       ` Ian Jackson
  2020-11-18 14:33       ` Paul Durrant
  1 sibling, 0 replies; 21+ messages in thread
From: Ian Jackson @ 2020-11-18 14:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Daniel De Graaf, committers, xen-devel, Juergen Gross

Julien Grall writes ("Re: [PATCH v6 0/3] XSA-343 followup patches"):
> On 18/11/2020 08:22, Jan Beulich wrote:
> > Without a clear outline of what would break with the present logic,
> > I had previously indicated I'm not convinced of the change. This
> > isn't a strong objection, no, but I still wouldn't want to see my
> > name associated with it in such a case.
> 
> I was under the impression that the committer job is mostly mechanical 
> (i.e. collecting tags and applying patches). There are no rules in 
> MAINTAINERS saying that committers can decide what gets committed as 
> long as maintainers have approved it and there are no strong objections.

I think in practice committers need to exercise quite a bit of
judgement.  I often find myself deciding on what seem to be edge cases
of the rules.  I also sometimes, with something which has the formally
needed acks, but which seems like it might be controversial or
awkward, decide to just make an extra double-check.  I sometimes even
commit something when maybe the formal rules wouldn't permit it, when
I'm confident that the relevant maintainer(s) etc. wouldn't object -
an example being when something is badly broken and this is allegedly
the fix.

However, I don't agree that a committer should omit committing
something which is acked, and is part of a series which they are
committing, simply because they don't see the need for the patch.

If a committer who is also a maintainer has a firm objection, they
should state that objection as a formal NAK.  If a formal NAK is not
warranted, a committer should not engage in passive obstruction.

Also, a committer should not "silently" commit only part of a series.


In summary, committing something is not a declaration by the committer
that they approve of the patch on a technical level.

It is a declaration that (in the committer's view) the patch is
properly acked/approved, and that therefore according to our shared
understanding of the community's processes, it ought to go in.

That might even occur if the committer has an outstanding technical
objection.  I have certainly committed things that I didn't really
like very much, on the grounds that they had enough acks and that I
didn't feel I wanted to give it a formal NAK.


If it would help a committer feel more comfortable to be explicit
about this, there are a number of approaches available: the committer
could commit the patch but also reply to it on-list restating their
objection but saying that they are committing it despite the
objection, because of others' acks.

If it is felt desirable to record this information in the repository,
one could write something like
   Not-Acked-by: Ian Jackson <iwj@xenproject.org>
(which I think is not the same as Nacked-by) or even add a note in
the commit message like
  [ I am committing this, even though I don't think it is
    necessary, because it has the requisite acks.  I do not
    think it warrants a nack.  -iwj ]

> This is a matter of perspective. It helps to confirm with the
> contributor that it was fine to merge only part of the series
> (multiple pair of eyes is always better than one...) or mentioning
> any clash/reworked.

I think it is especially important to send an email about things being
committed when what has happened might be surprising.  For example, if
only part of a series is committed.

Thanks,
Ian.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v6 0/3] XSA-343 followup patches
  2020-11-18 14:18     ` Julien Grall
  2020-11-18 14:30       ` Ian Jackson
@ 2020-11-18 14:33       ` Paul Durrant
  1 sibling, 0 replies; 21+ messages in thread
From: Paul Durrant @ 2020-11-18 14:33 UTC (permalink / raw)
  To: 'Julien Grall', 'Jan Beulich'
  Cc: 'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Daniel De Graaf',
	committers, xen-devel, 'Juergen Gross'

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Julien Grall
> Sent: 18 November 2020 14:18
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <iwj@xenproject.org>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>; committers@xenproject.org; xen-
> devel@lists.xenproject.org; Juergen Gross <jgross@suse.com>
> Subject: Re: [PATCH v6 0/3] XSA-343 followup patches
> 
> Hi Jan,
> 
> On 18/11/2020 08:22, Jan Beulich wrote:
> > On 17.11.2020 19:13, Julien Grall wrote:
> >> On 09/11/2020 16:38, Juergen Gross wrote:
> >>> Juergen Gross (3):
> >>>     xen/events: access last_priority and last_vcpu_id together
> >>>     xen/evtchn: rework per event channel lock
> >>>     xen/evtchn: revert 52e1fc47abc3a0123
> >>
> >> While looking at the list of commits, I noticed that the first patch
> >> hasn't been committed. They were all acked/reviewed, so I am a bit
> >> puzzled why this was omitted...
> >>
> >> I nearly missed this, as I was expecting the 3 patches to be committed
> >> together. May I suggest that in the future we reply to the cover letter
> >> and mention which patches are (or are not) committed?
> >>
> >> Regarding patch #1, I will commit it tomorrow unless there are strong
> >> objections.
> >
> > Without a clear outline of what would break with the present logic,
> > I had previously indicated I'm not convinced of the change. This
> > isn't a strong objection, no, but I still wouldn't want to see my
> > name associated with it in such a case.
> 
> I was under the impression that the committer job is mostly mechanical
> (i.e. collecting tags and applying patches). There are no rules in
> MAINTAINERS saying that committers can decide what gets committed as
> long as maintainers have approved it and there are no strong objections.
> 
> > Furthermore I clearly view
> > this as not a backporting candidate, while the other two are (as I
> > did previously indicate). Hence the latter two patches wanted
> > re-basing ahead of the first one anyway, to ease the backports.
> 
> I understand the backport concern. However, if there were clashes, then it
> means you had to resolve them when committing to staging. Surely they were
> quite minimal; otherwise you would have sent an e-mail to xen-devel, right?
> 
> > While I will accept there are different views possible here, I also
> > don't think sending mail in such a case is a good use of resources.
> > The information on what was or was not committed is readily available.
> > Personally I view such mails as at least very close to spam.
> 
> This is a matter of perspective. It helps to confirm with the
> contributor that it was fine to merge only part of the series (multiple
> pairs of eyes are always better than one...) and to mention any clashes
> or rework.
> 
> It also makes it easier for reviewers to figure out what was committed,
> as it can be difficult to know whether a patch was merged because the
> commit title can be altered (even simply dropping the prefix "xen/" can
> take a couple more minutes to pinpoint the commit).
> 
> Therefore, I think there is value in sending such e-mails. It will
> likely improve coordination among the members of the community.

+1

Knowing that part of a long series has already been committed would be 
useful. When I do the usual pull/rebase dance prior to re-submitting I 
have, more than once, been surprised by rebase failures because some of 
the patches had been committed but were modified in doing so. It's not a 
massive problem, but an email would at least have made me aware that I 
needed to be a bit more careful.

  Paul

> 
> I would be happy to consider a different approach if it fulfils that goal.
> 
> Cheers,
> 
> --
> Julien Grall




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock
  2020-11-18 13:19   ` Jan Beulich
@ 2020-11-23  7:29     ` Jürgen Groß
  0 siblings, 0 replies; 21+ messages in thread
From: Jürgen Groß @ 2020-11-23  7:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 3252 bytes --]

On 18.11.20 14:19, Jan Beulich wrote:
> On 09.11.2020 17:38, Juergen Gross wrote:
>> Currently the lock for a single event channel needs to be taken with
>> interrupts off, which causes deadlocks in some cases.
>>
>> Rework the per event channel lock to be non-blocking for the case of
>> sending an event, and remove the need for disabling interrupts for
>> taking the lock.
>>
>> The lock is needed for avoiding races between event channel state
>> changes (creation, closing, binding) and normal operations (set
>> pending, [un]masking, priority changes).
>>
>> Use a rwlock, but with some restrictions:
>>
>> - Changing the state of an event channel (creation, closing, binding)
>>    needs to use write_lock(), with ASSERT()ing that the lock is taken as
>>    writer only when the state of the event channel is either before or
>>    after the locked region appropriate (either free or unbound).
>>
>> - Sending an event needs to use read_trylock() mostly, in case of not
>>    obtaining the lock the operation is omitted. This is needed as
>>    sending an event can happen with interrupts off (at least in some
>>    cases).
>>
>> - Dumping the event channel state for debug purposes is using
>>    read_trylock(), too, in order to avoid blocking in case the lock is
>>    taken as writer for a long time.
>>
>> - All other cases can use read_lock().
> 
> One of the implications is that racing invocations of ->set_pending()
> are now possible for the same port. Beyond what I said in reply to
> 0/3 already, I'm afraid there are (latent) issues:
> 
> 1) The update of ->pending (or basically any bitfield in struct
> evtchn, or yet more generically any field getting updated in a read-
> modify-write fashion) is no longer generally safe in any of the
> hooks called with just a read lock held. ->pending itself is not an
> issue now merely because it shares storage only with xen_consumer,
> which won't get updated once a port was bound.

This is fragile.

We should put the pending indicator into a dedicated byte.
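
I.e. roughly (only a sketch of the layout change in struct evtchn):

    /* Today: ->pending shares storage with xen_consumer. */
    u8 pending:1;
    u8 xen_consumer:XEN_CONSUMER_BITS;

    /*
     * Proposed: give the pending indicator its own byte, so updating
     * it is a plain single-byte store instead of a read-modify-write
     * on shared storage.
     */
    u8 pending;
    u8 xen_consumer:XEN_CONSUMER_BITS;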

> 2) Of two racing sends, one may now complete without the port
> actually having got fully recorded as linked in the FIFO code. This
> is because the party losing the race of setting EVTCHN_FIFO_LINKED
> will return early, without regard to whether the winner has made
> enough progress. (Of course this is possible only with an
> intermediate queue change, as only then the lock would become
> available to the second of the senders early enough.)

No, I don't think this is limited to a queue change. If a caller of
evtchn_fifo_set_pending() is interrupted after setting
EVTCHN_FIFO_PENDING, and a second caller then makes it to setting
EVTCHN_FIFO_LINKED, the first caller won't even try to take the queue
lock, resulting in evtchn_check_pollers() being called before the
event has been properly put into the queue.

I'd suggest to extend the fifo queue lock region in order to mitigate
this problem.
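
Roughly along these lines (a sketch only, not a finished patch):

    spin_lock_irqsave(&q->lock, flags);

    if ( !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
    {
        /*
         * Link the event while still holding the queue lock, so a
         * racing sender cannot observe LINKED while the event is not
         * yet on the queue.
         */
    }

    spin_unlock_irqrestore(&q->lock, flags);

    /* Only reached once linking has fully completed. */
    evtchn_check_pollers(d, port);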

> 
> I've gone through other functions called from this path and didn't
> find any further race potential there, but I'm not entirely certain
> I didn't miss anything.

I can prepare a patch if you agree with my ideas.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-11-23  7:29 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-09 16:38 [PATCH v6 0/3] XSA-343 followup patches Juergen Gross
2020-11-09 16:38 ` [PATCH v6 1/3] xen/events: access last_priority and last_vcpu_id together Juergen Gross
2020-11-09 16:38 ` [PATCH v6 2/3] xen/evtchn: rework per event channel lock Juergen Gross
2020-11-09 16:48   ` Jan Beulich
2020-11-09 17:46     ` Julien Grall
2020-11-10  6:13       ` Jürgen Groß
2020-11-10  7:39       ` Jan Beulich
2020-11-10  9:47         ` Julien Grall
2020-11-10 10:10           ` Jan Beulich
2020-11-09 17:44   ` Julien Grall
2020-11-18 13:19   ` Jan Beulich
2020-11-23  7:29     ` Jürgen Groß
2020-11-09 16:38 ` [PATCH v6 3/3] xen/evtchn: revert 52e1fc47abc3a0123 Juergen Gross
2020-11-09 16:40   ` Jan Beulich
2020-11-17 18:13 ` [PATCH v6 0/3] XSA-343 followup patches Julien Grall
2020-11-18  8:22   ` Jan Beulich
2020-11-18  8:41     ` Jürgen Groß
2020-11-18 12:09       ` Jan Beulich
2020-11-18 14:18     ` Julien Grall
2020-11-18 14:30       ` Ian Jackson
2020-11-18 14:33       ` Paul Durrant
