Xen-Devel Archive on lore.kernel.org
* [PATCH v2 0/2] XSA-343 followup patches
@ 2020-10-12  9:27 Juergen Gross
  2020-10-12  9:27 ` [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together Juergen Gross
  2020-10-12  9:27 ` [PATCH v2 2/2] xen/evtchn: rework per event channel lock Juergen Gross
  0 siblings, 2 replies; 26+ messages in thread
From: Juergen Gross @ 2020-10-12  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné

The patches for XSA-343 produced some fallout; especially the event
channel locking has proven to be problematic.

Patch 1 targets fifo event channels, avoiding races in the case that
the fifo queue has been changed for a specific event channel.

The second patch modifies the per event channel locking scheme in
order to avoid deadlocks and problems caused by the event channel lock
having been changed to require IRQs off by the XSA-343 patches.

Juergen Gross (2):
  xen/events: access last_priority and last_vcpu_id together
  xen/evtchn: rework per event channel lock

 xen/arch/x86/irq.c         |   6 +-
 xen/arch/x86/pv/shim.c     |   9 +--
 xen/common/event_channel.c | 109 +++++++++++++++++--------------------
 xen/common/event_fifo.c    |  25 +++++++--
 xen/include/xen/event.h    |  56 ++++++++++++++++---
 xen/include/xen/sched.h    |   5 +-
 6 files changed, 126 insertions(+), 84 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-12  9:27 [PATCH v2 0/2] XSA-343 followup patches Juergen Gross
@ 2020-10-12  9:27 ` Juergen Gross
  2020-10-12  9:48   ` Paul Durrant
  2020-10-13 13:58   ` Jan Beulich
  2020-10-12  9:27 ` [PATCH v2 2/2] xen/evtchn: rework per event channel lock Juergen Gross
  1 sibling, 2 replies; 26+ messages in thread
From: Juergen Gross @ 2020-10-12  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

The queue for a fifo event depends on the vcpu_id and the priority of
the event. When sending an event it might happen that the event needs
to change queues, and the old queue needs to be kept around in order
to keep the links between queue elements intact. For this purpose the
event channel contains last_priority and last_vcpu_id values which
allow the old queue to be identified.

In order to avoid races, always access last_priority and last_vcpu_id
together via a single atomic operation, avoiding any inconsistencies.
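
Not part of the patch, but the technique can be sketched in a few
lines of userspace C: the two logically-linked fields are overlaid
with one 32-bit word so both can be read or written with a single
atomic access. The names here are illustrative, and the __atomic_*
builtins merely stand in for Xen's read_atomic()/write_atomic().

```c
#include <assert.h>
#include <stdint.h>

/* Overlay two linked fields with one 32-bit word so that both can
 * be read or written with a single atomic access. */
union fifo_lastq {
    uint32_t raw;
    struct {
        uint8_t  last_priority;
        uint16_t last_vcpu_id;
    };
};

static uint32_t fifo_lastq_word;  /* would live in struct evtchn */

static void lastq_write(uint8_t prio, uint16_t vcpu)
{
    union fifo_lastq q = { .raw = 0 };  /* also zeroes padding */

    q.last_priority = prio;
    q.last_vcpu_id = vcpu;
    __atomic_store_n(&fifo_lastq_word, q.raw, __ATOMIC_RELAXED);
}

static union fifo_lastq lastq_read(void)
{
    union fifo_lastq q;

    q.raw = __atomic_load_n(&fifo_lastq_word, __ATOMIC_RELAXED);
    return q;
}
```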

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/event_fifo.c | 25 +++++++++++++++++++------
 xen/include/xen/sched.h |  3 +--
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index fc189152e1..fffbd409c8 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -42,6 +42,14 @@ struct evtchn_fifo_domain {
     unsigned int num_evtchns;
 };
 
+union evtchn_fifo_lastq {
+    u32 raw;
+    struct {
+        u8 last_priority;
+        u16 last_vcpu_id;
+    };
+};
+
 static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
                                                        unsigned int port)
 {
@@ -86,16 +94,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
     struct vcpu *v;
     struct evtchn_fifo_queue *q, *old_q;
     unsigned int try;
+    union evtchn_fifo_lastq lastq;
 
     for ( try = 0; try < 3; try++ )
     {
-        v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        v = d->vcpu[lastq.last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
 
         spin_lock_irqsave(&old_q->lock, *flags);
 
-        v = d->vcpu[evtchn->last_vcpu_id];
-        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+        v = d->vcpu[lastq.last_vcpu_id];
+        q = &v->evtchn_fifo->queue[lastq.last_priority];
 
         if ( old_q == q )
             return old_q;
@@ -246,8 +256,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         /* Moved to a different queue? */
         if ( old_q != q )
         {
-            evtchn->last_vcpu_id = v->vcpu_id;
-            evtchn->last_priority = q->priority;
+            union evtchn_fifo_lastq lastq;
+
+            lastq.last_vcpu_id = v->vcpu_id;
+            lastq.last_priority = q->priority;
+            write_atomic(&evtchn->fifo_lastq, lastq.raw);
 
             spin_unlock_irqrestore(&old_q->lock, flags);
             spin_lock_irqsave(&q->lock, flags);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d8ed83f869..a298ff4df8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -114,8 +114,7 @@ struct evtchn
         u16 virq;      /* state == ECS_VIRQ */
     } u;
     u8 priority;
-    u8 last_priority;
-    u16 last_vcpu_id;
+    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
 #ifdef CONFIG_XSM
     union {
 #ifdef XSM_NEED_GENERIC_EVTCHN_SSID
-- 
2.26.2



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-12  9:27 [PATCH v2 0/2] XSA-343 followup patches Juergen Gross
  2020-10-12  9:27 ` [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together Juergen Gross
@ 2020-10-12  9:27 ` Juergen Gross
  2020-10-13 14:02   ` Jan Beulich
                     ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Juergen Gross @ 2020-10-12  9:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

Currently the lock for a single event channel needs to be taken with
interrupts off, which causes deadlocks in some cases.

Rework the per event channel lock to be non-blocking for the case of
sending an event, and remove the need to disable interrupts when
taking the lock.

The lock is needed to avoid races between sending an event or
querying the channel's state on the one hand, and removal of the
event channel on the other.

Use a locking scheme similar to a rwlock, but with some modifications:

- sending an event or querying the event channel's state uses an
  operation similar to read_trylock(), in case of not obtaining the
  lock the sending is omitted or a default state is returned

- closing an event channel is similar to write_lock(), but without
  real fairness regarding multiple writers (this saves some space in
  the event channel structure and multiple writers are impossible as
  closing an event channel requires the domain's event_lock to be
  held).

With this locking scheme it is mandatory that a writer always either
starts from or ends up with an unbound or free event channel, as
otherwise a reader failing to get the lock could react incorrectly.
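
The scheme described above can be sketched outside Xen with plain
C11 atomics; this is an illustrative stand-in, not the hypervisor
code. WRITE_INC plays the role of EVENT_WRITE_LOCK_INC, and the
sketch assumes a single writer serialized externally (here, by the
domain's event_lock), exactly as the description requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WRITE_INC 8192  /* stand-in for EVENT_WRITE_LOCK_INC */

struct evtchn_lock { atomic_int cnt; };

/* Readers add 1 and back off if a writer is (or becomes) active. */
static bool read_trylock(struct evtchn_lock *l)
{
    if (atomic_load(&l->cnt) >= WRITE_INC)
        return false;                 /* writer active: give up */
    if (atomic_fetch_add(&l->cnt, 1) + 1 < WRITE_INC)
        return true;
    atomic_fetch_sub(&l->cnt, 1);     /* lost the race to a writer */
    return false;
}

static void read_unlock(struct evtchn_lock *l)
{
    atomic_fetch_sub(&l->cnt, 1);
}

/* The single writer adds WRITE_INC once, then waits for readers
 * to drain; multiple writers are excluded by an outer lock. */
static void write_lock(struct evtchn_lock *l)
{
    int val = atomic_fetch_add(&l->cnt, WRITE_INC) + WRITE_INC;

    while (val != WRITE_INC)
        val = atomic_load(&l->cnt);
}

static void write_unlock(struct evtchn_lock *l)
{
    atomic_fetch_sub(&l->cnt, WRITE_INC);
}
```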

Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- added needed barriers
---
 xen/arch/x86/irq.c         |   6 +-
 xen/arch/x86/pv/shim.c     |   9 +--
 xen/common/event_channel.c | 109 +++++++++++++++++--------------------
 xen/include/xen/event.h    |  56 ++++++++++++++++---
 xen/include/xen/sched.h    |   2 +-
 5 files changed, 106 insertions(+), 76 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 93c4fb9a79..77290032f5 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2495,14 +2495,12 @@ static void dump_irqs(unsigned char key)
                 pirq = domain_irq_to_pirq(d, irq);
                 info = pirq_info(d, pirq);
                 evtchn = evtchn_from_port(d, info->evtchn);
-                local_irq_disable();
-                if ( spin_trylock(&evtchn->lock) )
+                if ( evtchn_tryread_lock(evtchn) )
                 {
                     pending = evtchn_is_pending(d, evtchn);
                     masked = evtchn_is_masked(d, evtchn);
-                    spin_unlock(&evtchn->lock);
+                    evtchn_read_unlock(evtchn);
                 }
-                local_irq_enable();
                 printk("d%d:%3d(%c%c%c)%c",
                        d->domain_id, pirq, "-P?"[pending],
                        "-M?"[masked], info->masked ? 'M' : '-',
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 9aef7a860a..3734250bf7 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -660,11 +660,12 @@ void pv_shim_inject_evtchn(unsigned int port)
     if ( port_is_valid(guest, port) )
     {
         struct evtchn *chn = evtchn_from_port(guest, port);
-        unsigned long flags;
 
-        spin_lock_irqsave(&chn->lock, flags);
-        evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn);
-        spin_unlock_irqrestore(&chn->lock, flags);
+        if ( evtchn_tryread_lock(chn) )
+        {
+            evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn);
+            evtchn_read_unlock(chn);
+        }
     }
 }
 
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index e365b5498f..398a1e7aa0 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -131,7 +131,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
             return NULL;
         }
         chn[i].port = port + i;
-        spin_lock_init(&chn[i].lock);
+        atomic_set(&chn[i].lock, 0);
     }
     return chn;
 }
@@ -253,7 +253,6 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     int            port;
     domid_t        dom = alloc->dom;
     long           rc;
-    unsigned long  flags;
 
     d = rcu_lock_domain_by_any_id(dom);
     if ( d == NULL )
@@ -269,14 +268,14 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     if ( rc )
         goto out;
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state = ECS_UNBOUND;
     if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF )
         chn->u.unbound.remote_domid = current->domain->domain_id;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     alloc->port = port;
 
@@ -289,32 +288,26 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 }
 
 
-static unsigned long double_evtchn_lock(struct evtchn *lchn,
-                                        struct evtchn *rchn)
+static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
 {
-    unsigned long flags;
-
     if ( lchn <= rchn )
     {
-        spin_lock_irqsave(&lchn->lock, flags);
+        evtchn_write_lock(lchn);
         if ( lchn != rchn )
-            spin_lock(&rchn->lock);
+            evtchn_write_lock(rchn);
     }
     else
     {
-        spin_lock_irqsave(&rchn->lock, flags);
-        spin_lock(&lchn->lock);
+        evtchn_write_lock(rchn);
+        evtchn_write_lock(lchn);
     }
-
-    return flags;
 }
 
-static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn,
-                                 unsigned long flags)
+static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn)
 {
     if ( lchn != rchn )
-        spin_unlock(&lchn->lock);
-    spin_unlock_irqrestore(&rchn->lock, flags);
+        evtchn_write_unlock(lchn);
+    evtchn_write_unlock(rchn);
 }
 
 static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
@@ -324,7 +317,6 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     int            lport, rport = bind->remote_port;
     domid_t        rdom = bind->remote_dom;
     long           rc;
-    unsigned long  flags;
 
     if ( rdom == DOMID_SELF )
         rdom = current->domain->domain_id;
@@ -360,7 +352,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     if ( rc )
         goto out;
 
-    flags = double_evtchn_lock(lchn, rchn);
+    double_evtchn_lock(lchn, rchn);
 
     lchn->u.interdomain.remote_dom  = rd;
     lchn->u.interdomain.remote_port = rport;
@@ -377,7 +369,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
      */
     evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);
 
-    double_evtchn_unlock(lchn, rchn, flags);
+    double_evtchn_unlock(lchn, rchn);
 
     bind->local_port = lport;
 
@@ -400,7 +392,6 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
     struct domain *d = current->domain;
     int            virq = bind->virq, vcpu = bind->vcpu;
     int            rc = 0;
-    unsigned long  flags;
 
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;
@@ -438,14 +429,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
 
     chn = evtchn_from_port(d, port);
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state          = ECS_VIRQ;
     chn->notify_vcpu_id = vcpu;
     chn->u.virq         = virq;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     v->virq_to_evtchn[virq] = bind->port = port;
 
@@ -462,7 +453,6 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
     struct domain *d = current->domain;
     int            port, vcpu = bind->vcpu;
     long           rc = 0;
-    unsigned long  flags;
 
     if ( domain_vcpu(d, vcpu) == NULL )
         return -ENOENT;
@@ -474,13 +464,13 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
 
     chn = evtchn_from_port(d, port);
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state          = ECS_IPI;
     chn->notify_vcpu_id = vcpu;
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     bind->port = port;
 
@@ -524,7 +514,6 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
     struct pirq   *info;
     int            port = 0, pirq = bind->pirq;
     long           rc;
-    unsigned long  flags;
 
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
@@ -557,14 +546,14 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
         goto out;
     }
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state  = ECS_PIRQ;
     chn->u.pirq.irq = pirq;
     link_pirq_port(port, chn, v);
     evtchn_port_init(d, chn);
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     bind->port = port;
 
@@ -585,7 +574,6 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
     struct evtchn *chn1, *chn2;
     int            port2;
     long           rc = 0;
-    unsigned long  flags;
 
  again:
     spin_lock(&d1->event_lock);
@@ -686,14 +674,14 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
         BUG_ON(chn2->state != ECS_INTERDOMAIN);
         BUG_ON(chn2->u.interdomain.remote_dom != d1);
 
-        flags = double_evtchn_lock(chn1, chn2);
+        double_evtchn_lock(chn1, chn2);
 
         evtchn_free(d1, chn1);
 
         chn2->state = ECS_UNBOUND;
         chn2->u.unbound.remote_domid = d1->domain_id;
 
-        double_evtchn_unlock(chn1, chn2, flags);
+        double_evtchn_unlock(chn1, chn2);
 
         goto out;
 
@@ -701,9 +689,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
         BUG();
     }
 
-    spin_lock_irqsave(&chn1->lock, flags);
+    evtchn_write_lock(chn1);
     evtchn_free(d1, chn1);
-    spin_unlock_irqrestore(&chn1->lock, flags);
+    evtchn_write_unlock(chn1);
 
  out:
     if ( d2 != NULL )
@@ -723,7 +711,6 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     struct evtchn *lchn, *rchn;
     struct domain *rd;
     int            rport, ret = 0;
-    unsigned long  flags;
 
     if ( !port_is_valid(ld, lport) )
         return -EINVAL;
@@ -736,7 +723,8 @@ int evtchn_send(struct domain *ld, unsigned int lport)
 
     lchn = evtchn_from_port(ld, lport);
 
-    spin_lock_irqsave(&lchn->lock, flags);
+    if ( !evtchn_tryread_lock(lchn) )
+        return 0;
 
     /* Guest cannot send via a Xen-attached event channel. */
     if ( unlikely(consumer_is_xen(lchn)) )
@@ -771,7 +759,7 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     }
 
 out:
-    spin_unlock_irqrestore(&lchn->lock, flags);
+    evtchn_read_unlock(lchn);
 
     return ret;
 }
@@ -798,9 +786,11 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
 
     d = v->domain;
     chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
-    evtchn_port_set_pending(d, v->vcpu_id, chn);
-    spin_unlock(&chn->lock);
+    if ( evtchn_tryread_lock(chn) )
+    {
+        evtchn_port_set_pending(d, v->vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
@@ -829,9 +819,11 @@ void send_guest_global_virq(struct domain *d, uint32_t virq)
         goto out;
 
     chn = evtchn_from_port(d, port);
-    spin_lock(&chn->lock);
-    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
-    spin_unlock(&chn->lock);
+    if ( evtchn_tryread_lock(chn) )
+    {
+        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 
  out:
     spin_unlock_irqrestore(&v->virq_lock, flags);
@@ -841,7 +833,6 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq)
 {
     int port;
     struct evtchn *chn;
-    unsigned long flags;
 
     /*
      * PV guests: It should not be possible to race with __evtchn_close(). The
@@ -856,9 +847,11 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq)
     }
 
     chn = evtchn_from_port(d, port);
-    spin_lock_irqsave(&chn->lock, flags);
-    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
-    spin_unlock_irqrestore(&chn->lock, flags);
+    if ( evtchn_tryread_lock(chn) )
+    {
+        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+        evtchn_read_unlock(chn);
+    }
 }
 
 static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly;
@@ -1060,15 +1053,16 @@ int evtchn_unmask(unsigned int port)
 {
     struct domain *d = current->domain;
     struct evtchn *evtchn;
-    unsigned long flags;
 
     if ( unlikely(!port_is_valid(d, port)) )
         return -EINVAL;
 
     evtchn = evtchn_from_port(d, port);
-    spin_lock_irqsave(&evtchn->lock, flags);
-    evtchn_port_unmask(d, evtchn);
-    spin_unlock_irqrestore(&evtchn->lock, flags);
+    if ( evtchn_tryread_lock(evtchn) )
+    {
+        evtchn_port_unmask(d, evtchn);
+        evtchn_read_unlock(evtchn);
+    }
 
     return 0;
 }
@@ -1327,7 +1321,6 @@ int alloc_unbound_xen_event_channel(
 {
     struct evtchn *chn;
     int            port, rc;
-    unsigned long  flags;
 
     spin_lock(&ld->event_lock);
 
@@ -1340,14 +1333,14 @@ int alloc_unbound_xen_event_channel(
     if ( rc )
         goto out;
 
-    spin_lock_irqsave(&chn->lock, flags);
+    evtchn_write_lock(chn);
 
     chn->state = ECS_UNBOUND;
     chn->xen_consumer = get_xen_consumer(notification_fn);
     chn->notify_vcpu_id = lvcpu;
     chn->u.unbound.remote_domid = remote_domid;
 
-    spin_unlock_irqrestore(&chn->lock, flags);
+    evtchn_write_unlock(chn);
 
     /*
      * Increment ->xen_evtchns /after/ ->active_evtchns. No explicit
@@ -1383,7 +1376,6 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 {
     struct evtchn *lchn, *rchn;
     struct domain *rd;
-    unsigned long flags;
 
     if ( !port_is_valid(ld, lport) )
     {
@@ -1398,7 +1390,8 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 
     lchn = evtchn_from_port(ld, lport);
 
-    spin_lock_irqsave(&lchn->lock, flags);
+    if ( !evtchn_tryread_lock(lchn) )
+        return;
 
     if ( likely(lchn->state == ECS_INTERDOMAIN) )
     {
@@ -1408,7 +1401,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
         evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
     }
 
-    spin_unlock_irqrestore(&lchn->lock, flags);
+    evtchn_read_unlock(lchn);
 }
 
 void evtchn_check_pollers(struct domain *d, unsigned int port)
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 509d3ae861..39a93f7556 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
+#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS
+static inline void evtchn_write_lock(struct evtchn *evtchn)
+{
+    int val;
+
+    /* No barrier needed, atomic_add_return() is full barrier. */
+    for ( val = atomic_add_return(EVENT_WRITE_LOCK_INC, &evtchn->lock);
+          val != EVENT_WRITE_LOCK_INC;
+          val = atomic_read(&evtchn->lock) )
+        cpu_relax();
+}
+
+static inline void evtchn_write_unlock(struct evtchn *evtchn)
+{
+    arch_lock_release_barrier();
+
+    atomic_sub(EVENT_WRITE_LOCK_INC, &evtchn->lock);
+}
+
+static inline bool evtchn_tryread_lock(struct evtchn *evtchn)
+{
+    if ( atomic_read(&evtchn->lock) >= EVENT_WRITE_LOCK_INC )
+        return false;
+
+    /* No barrier needed, atomic_inc_return() is full barrier. */
+    if ( atomic_inc_return(&evtchn->lock) < EVENT_WRITE_LOCK_INC )
+        return true;
+
+    atomic_dec(&evtchn->lock);
+    return false;
+}
+
+static inline void evtchn_read_unlock(struct evtchn *evtchn)
+{
+    arch_lock_release_barrier();
+
+    atomic_dec(&evtchn->lock);
+}
+
 static inline unsigned int max_evtchns(const struct domain *d)
 {
     return d->evtchn_fifo ? EVTCHN_FIFO_NR_CHANNELS
@@ -249,12 +288,13 @@ static inline bool evtchn_is_masked(const struct domain *d,
 static inline bool evtchn_port_is_masked(struct domain *d, evtchn_port_t port)
 {
     struct evtchn *evtchn = evtchn_from_port(d, port);
-    bool rc;
-    unsigned long flags;
+    bool rc = true;
 
-    spin_lock_irqsave(&evtchn->lock, flags);
-    rc = evtchn_is_masked(d, evtchn);
-    spin_unlock_irqrestore(&evtchn->lock, flags);
+    if ( evtchn_tryread_lock(evtchn) )
+    {
+        rc = evtchn_is_masked(d, evtchn);
+        evtchn_read_unlock(evtchn);
+    }
 
     return rc;
 }
@@ -274,12 +312,13 @@ static inline int evtchn_port_poll(struct domain *d, evtchn_port_t port)
     if ( port_is_valid(d, port) )
     {
         struct evtchn *evtchn = evtchn_from_port(d, port);
-        unsigned long flags;
 
-        spin_lock_irqsave(&evtchn->lock, flags);
-        if ( evtchn_usable(evtchn) )
-            rc = evtchn_is_pending(d, evtchn);
-        spin_unlock_irqrestore(&evtchn->lock, flags);
+        if ( evtchn_tryread_lock(evtchn) )
+        {
+            if ( evtchn_usable(evtchn) )
+                rc = evtchn_is_pending(d, evtchn);
+            evtchn_read_unlock(evtchn);
+        }
     }
 
     return rc;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a298ff4df8..096e0ec6af 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -85,7 +85,7 @@ extern domid_t hardware_domid;
 
 struct evtchn
 {
-    spinlock_t lock;
+    atomic_t lock;         /* kind of rwlock, use evtchn_*_[un]lock()        */
 #define ECS_FREE         0 /* Channel is available for use.                  */
 #define ECS_RESERVED     1 /* Channel is reserved.                           */
 #define ECS_UNBOUND      2 /* Channel is waiting to bind to a remote domain. */
-- 
2.26.2



^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-12  9:27 ` [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together Juergen Gross
@ 2020-10-12  9:48   ` Paul Durrant
  2020-10-12  9:56     ` Jürgen Groß
  2020-10-13 13:58   ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Durrant @ 2020-10-12  9:48 UTC (permalink / raw)
  To: 'Juergen Gross', xen-devel
  Cc: 'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Jan Beulich',
	'Julien Grall', 'Stefano Stabellini',
	'Wei Liu'

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
> Sent: 12 October 2020 10:28
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Jan Beulich <jbeulich@suse.com>; Julien
> Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>
> Subject: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
> 
> The queue for a fifo event is depending on the vcpu_id and the
> priority of the event. When sending an event it might happen the
> event needs to change queues and the old queue needs to be kept for
> keeping the links between queue elements intact. For this purpose
> the event channel contains last_priority and last_vcpu_id values
> elements for being able to identify the old queue.
> 
> In order to avoid races always access last_priority and last_vcpu_id
> with a single atomic operation avoiding any inconsistencies.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/common/event_fifo.c | 25 +++++++++++++++++++------
>  xen/include/xen/sched.h |  3 +--
>  2 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index fc189152e1..fffbd409c8 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -42,6 +42,14 @@ struct evtchn_fifo_domain {
>      unsigned int num_evtchns;
>  };
> 
> +union evtchn_fifo_lastq {
> +    u32 raw;
> +    struct {
> +        u8 last_priority;
> +        u16 last_vcpu_id;
> +    };
> +};

I guess you want to s/u32/uint32_t, etc. above.

> +
>  static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
>                                                         unsigned int port)
>  {
> @@ -86,16 +94,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>      struct vcpu *v;
>      struct evtchn_fifo_queue *q, *old_q;
>      unsigned int try;
> +    union evtchn_fifo_lastq lastq;
> 
>      for ( try = 0; try < 3; try++ )
>      {
> -        v = d->vcpu[evtchn->last_vcpu_id];
> -        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> +        v = d->vcpu[lastq.last_vcpu_id];
> +        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
> 
>          spin_lock_irqsave(&old_q->lock, *flags);
> 
> -        v = d->vcpu[evtchn->last_vcpu_id];
> -        q = &v->evtchn_fifo->queue[evtchn->last_priority];
> +        v = d->vcpu[lastq.last_vcpu_id];
> +        q = &v->evtchn_fifo->queue[lastq.last_priority];
> 
>          if ( old_q == q )
>              return old_q;
> @@ -246,8 +256,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>          /* Moved to a different queue? */
>          if ( old_q != q )
>          {
> -            evtchn->last_vcpu_id = v->vcpu_id;
> -            evtchn->last_priority = q->priority;
> +            union evtchn_fifo_lastq lastq;
> +
> +            lastq.last_vcpu_id = v->vcpu_id;
> +            lastq.last_priority = q->priority;
> +            write_atomic(&evtchn->fifo_lastq, lastq.raw);
> 

You're going to leak some stack here I think. Perhaps add a 'pad' field between 'last_priority' and 'last_vcpu_id' and zero it?

  Paul

>              spin_unlock_irqrestore(&old_q->lock, flags);
>              spin_lock_irqsave(&q->lock, flags);
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index d8ed83f869..a298ff4df8 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -114,8 +114,7 @@ struct evtchn
>          u16 virq;      /* state == ECS_VIRQ */
>      } u;
>      u8 priority;
> -    u8 last_priority;
> -    u16 last_vcpu_id;
> +    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
>  #ifdef CONFIG_XSM
>      union {
>  #ifdef XSM_NEED_GENERIC_EVTCHN_SSID
> --
> 2.26.2
> 




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-12  9:48   ` Paul Durrant
@ 2020-10-12  9:56     ` Jürgen Groß
  2020-10-12 10:06       ` Paul Durrant
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-10-12  9:56 UTC (permalink / raw)
  To: paul, xen-devel
  Cc: 'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Jan Beulich',
	'Julien Grall', 'Stefano Stabellini',
	'Wei Liu'

On 12.10.20 11:48, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
>> Sent: 12 October 2020 10:28
>> To: xen-devel@lists.xenproject.org
>> Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Jan Beulich <jbeulich@suse.com>; Julien
>> Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>
>> Subject: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
>>
>> The queue for a fifo event is depending on the vcpu_id and the
>> priority of the event. When sending an event it might happen the
>> event needs to change queues and the old queue needs to be kept for
>> keeping the links between queue elements intact. For this purpose
>> the event channel contains last_priority and last_vcpu_id values
>> elements for being able to identify the old queue.
>>
>> In order to avoid races always access last_priority and last_vcpu_id
>> with a single atomic operation avoiding any inconsistencies.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   xen/common/event_fifo.c | 25 +++++++++++++++++++------
>>   xen/include/xen/sched.h |  3 +--
>>   2 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
>> index fc189152e1..fffbd409c8 100644
>> --- a/xen/common/event_fifo.c
>> +++ b/xen/common/event_fifo.c
>> @@ -42,6 +42,14 @@ struct evtchn_fifo_domain {
>>       unsigned int num_evtchns;
>>   };
>>
>> +union evtchn_fifo_lastq {
>> +    u32 raw;
>> +    struct {
>> +        u8 last_priority;
>> +        u16 last_vcpu_id;
>> +    };
>> +};
> 
> I guess you want to s/u32/uint32_t, etc. above.

Hmm, yes, probably.

> 
>> +
>>   static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
>>                                                          unsigned int port)
>>   {
>> @@ -86,16 +94,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>>       struct vcpu *v;
>>       struct evtchn_fifo_queue *q, *old_q;
>>       unsigned int try;
>> +    union evtchn_fifo_lastq lastq;
>>
>>       for ( try = 0; try < 3; try++ )
>>       {
>> -        v = d->vcpu[evtchn->last_vcpu_id];
>> -        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
>> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>> +        v = d->vcpu[lastq.last_vcpu_id];
>> +        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
>>
>>           spin_lock_irqsave(&old_q->lock, *flags);
>>
>> -        v = d->vcpu[evtchn->last_vcpu_id];
>> -        q = &v->evtchn_fifo->queue[evtchn->last_priority];
>> +        v = d->vcpu[lastq.last_vcpu_id];
>> +        q = &v->evtchn_fifo->queue[lastq.last_priority];
>>
>>           if ( old_q == q )
>>               return old_q;
>> @@ -246,8 +256,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>>           /* Moved to a different queue? */
>>           if ( old_q != q )
>>           {
>> -            evtchn->last_vcpu_id = v->vcpu_id;
>> -            evtchn->last_priority = q->priority;
>> +            union evtchn_fifo_lastq lastq;
>> +
>> +            lastq.last_vcpu_id = v->vcpu_id;
>> +            lastq.last_priority = q->priority;
>> +            write_atomic(&evtchn->fifo_lastq, lastq.raw);
>>
> 
> You're going to leak some stack here I think. Perhaps add a 'pad' field between 'last_priority' and 'last_vcpu_id' and zero it?

I can do that, but why? This is nothing a guest is supposed to see at
any time.


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-12  9:56     ` Jürgen Groß
@ 2020-10-12 10:06       ` Paul Durrant
  0 siblings, 0 replies; 26+ messages in thread
From: Paul Durrant @ 2020-10-12 10:06 UTC (permalink / raw)
  To: 'Jürgen Groß', xen-devel
  Cc: 'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Jan Beulich',
	'Julien Grall', 'Stefano Stabellini',
	'Wei Liu'

> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 12 October 2020 10:56
> To: paul@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian
> Jackson' <iwj@xenproject.org>; 'Jan Beulich' <jbeulich@suse.com>; 'Julien Grall' <julien@xen.org>;
> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>
> Subject: Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
> 
> On 12.10.20 11:48, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
> >> Sent: 12 October 2020 10:28
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> >> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Jan Beulich <jbeulich@suse.com>;
> Julien
> >> Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>
> >> Subject: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
> >>
> >> The queue for a fifo event depends on the vcpu_id and the
> >> priority of the event. When sending an event it might happen that
> >> the event needs to change queues, and the old queue needs to be
> >> kept for keeping the links between queue elements intact. For this
> >> purpose the event channel contains last_priority and last_vcpu_id
> >> values to identify the old queue.
> >>
> >> In order to avoid races, always access last_priority and last_vcpu_id
> >> with a single atomic operation, avoiding any inconsistencies.
> >>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >> ---
> >>   xen/common/event_fifo.c | 25 +++++++++++++++++++------
> >>   xen/include/xen/sched.h |  3 +--
> >>   2 files changed, 20 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> >> index fc189152e1..fffbd409c8 100644
> >> --- a/xen/common/event_fifo.c
> >> +++ b/xen/common/event_fifo.c
> >> @@ -42,6 +42,14 @@ struct evtchn_fifo_domain {
> >>       unsigned int num_evtchns;
> >>   };
> >>
> >> +union evtchn_fifo_lastq {
> >> +    u32 raw;
> >> +    struct {
> >> +        u8 last_priority;
> >> +        u16 last_vcpu_id;
> >> +    };
> >> +};
> >
> > I guess you want to s/u32/uint32_t, etc. above.
> 
> Hmm, yes, probably.
> 
> >
> >> +
> >>   static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
> >>                                                          unsigned int port)
> >>   {
> >> @@ -86,16 +94,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
> >>       struct vcpu *v;
> >>       struct evtchn_fifo_queue *q, *old_q;
> >>       unsigned int try;
> >> +    union evtchn_fifo_lastq lastq;
> >>
> >>       for ( try = 0; try < 3; try++ )
> >>       {
> >> -        v = d->vcpu[evtchn->last_vcpu_id];
> >> -        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
> >> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> >> +        v = d->vcpu[lastq.last_vcpu_id];
> >> +        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
> >>
> >>           spin_lock_irqsave(&old_q->lock, *flags);
> >>
> >> -        v = d->vcpu[evtchn->last_vcpu_id];
> >> -        q = &v->evtchn_fifo->queue[evtchn->last_priority];
> >> +        v = d->vcpu[lastq.last_vcpu_id];
> >> +        q = &v->evtchn_fifo->queue[lastq.last_priority];
> >>
> >>           if ( old_q == q )
> >>               return old_q;
> >> @@ -246,8 +256,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
> >>           /* Moved to a different queue? */
> >>           if ( old_q != q )
> >>           {
> >> -            evtchn->last_vcpu_id = v->vcpu_id;
> >> -            evtchn->last_priority = q->priority;
> >> +            union evtchn_fifo_lastq lastq;
> >> +
> >> +            lastq.last_vcpu_id = v->vcpu_id;
> >> +            lastq.last_priority = q->priority;
> >> +            write_atomic(&evtchn->fifo_lastq, lastq.raw);
> >>
> >
> > You're going to leak some stack here I think. Perhaps add a 'pad' field between 'last_priority' and
> 'last_vcpu_id' and zero it?
> 
> I can do that, but why? This is nothing a guest is supposed to see at
> any time.

True, but it would also be nice if the value of 'raw' was at least predictable. I guess just adding '= {}' to the declaration would actually be easiest.

  Paul

> 
> 
> Juergen



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-12  9:27 ` [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together Juergen Gross
  2020-10-12  9:48   ` Paul Durrant
@ 2020-10-13 13:58   ` Jan Beulich
  2020-10-13 14:20     ` Jürgen Groß
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-13 13:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu

On 12.10.2020 11:27, Juergen Gross wrote:
> The queue for a fifo event depends on the vcpu_id and the
> priority of the event. When sending an event it might happen that
> the event needs to change queues, and the old queue needs to be
> kept for keeping the links between queue elements intact. For this
> purpose the event channel contains last_priority and last_vcpu_id
> values to identify the old queue.
> 
> In order to avoid races, always access last_priority and last_vcpu_id
> with a single atomic operation, avoiding any inconsistencies.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

I seem to vaguely recall that at the time this seemingly racy
access was done on purpose by David. Did you go look at the
old commits to understand whether there really is a race which
can't be tolerated within the spec?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -114,8 +114,7 @@ struct evtchn
>          u16 virq;      /* state == ECS_VIRQ */
>      } u;
>      u8 priority;
> -    u8 last_priority;
> -    u16 last_vcpu_id;
> +    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */

This grows struct evtchn's size on at least 32-bit Arm. I'd
like to suggest including "priority" in the union, and call the
new field simply "fifo" or some such.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-12  9:27 ` [PATCH v2 2/2] xen/evtchn: rework per event channel lock Juergen Gross
@ 2020-10-13 14:02   ` Jan Beulich
  2020-10-13 14:13     ` Jürgen Groß
  2020-10-13 15:28   ` Jan Beulich
  2020-10-16  9:51   ` Julien Grall
  2 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-13 14:02 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 12.10.2020 11:27, Juergen Gross wrote:
> Currently the lock for a single event channel needs to be taken with
> interrupts off, which causes deadlocks in some cases.
> 
> Rework the per event channel lock to be non-blocking for the case of
> sending an event, and remove the need for disabling interrupts when
> taking the lock.
> 
> The lock is needed for avoiding races between sending an event or
> querying the channel's state against removal of the event channel.
> 
> Use a locking scheme similar to a rwlock, but with some modifications:
> 
> - sending an event or querying the event channel's state uses an
>   operation similar to read_trylock(); if the lock cannot be obtained,
>   the sending is omitted or a default state is returned

And how come omitting the send or returning default state is valid?

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-13 14:02   ` Jan Beulich
@ 2020-10-13 14:13     ` Jürgen Groß
  2020-10-13 15:30       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-10-13 14:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 13.10.20 16:02, Jan Beulich wrote:
> On 12.10.2020 11:27, Juergen Gross wrote:
>> Currently the lock for a single event channel needs to be taken with
>> interrupts off, which causes deadlocks in some cases.
>>
>> Rework the per event channel lock to be non-blocking for the case of
>> sending an event, and remove the need for disabling interrupts when
>> taking the lock.
>>
>> The lock is needed for avoiding races between sending an event or
>> querying the channel's state against removal of the event channel.
>>
>> Use a locking scheme similar to a rwlock, but with some modifications:
>>
>> - sending an event or querying the event channel's state uses an
>>    operation similar to read_trylock(); if the lock cannot be obtained,
>>    the sending is omitted or a default state is returned
> 
> And how come omitting the send or returning default state is valid?

This is explained in the part of the commit message you didn't cite:

With this locking scheme it is mandatory that a writer will always
either start with an unbound or free event channel or will end with
an unbound or free event channel, as otherwise the reaction of a reader
not getting the lock would be wrong.


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-13 13:58   ` Jan Beulich
@ 2020-10-13 14:20     ` Jürgen Groß
  2020-10-13 14:26       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-10-13 14:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu

On 13.10.20 15:58, Jan Beulich wrote:
> On 12.10.2020 11:27, Juergen Gross wrote:
>> The queue for a fifo event depends on the vcpu_id and the
>> priority of the event. When sending an event it might happen that
>> the event needs to change queues, and the old queue needs to be
>> kept for keeping the links between queue elements intact. For this
>> purpose the event channel contains last_priority and last_vcpu_id
>> values to identify the old queue.
>>
>> In order to avoid races, always access last_priority and last_vcpu_id
>> with a single atomic operation, avoiding any inconsistencies.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> I seem to vaguely recall that at the time this seemingly racy
> access was done on purpose by David. Did you go look at the
> old commits to understand whether there really is a race which
> can't be tolerated within the spec?

At least the comments in the code tell us that the race regarding
the writing of priority (not last_priority) is acceptable.

Especially Julien was rather worried by the current situation. In
case you can convince him the current handling is fine, we can
easily drop this patch.

> 
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -114,8 +114,7 @@ struct evtchn
>>           u16 virq;      /* state == ECS_VIRQ */
>>       } u;
>>       u8 priority;
>> -    u8 last_priority;
>> -    u16 last_vcpu_id;
>> +    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
> 
> This grows struct evtchn's size on at least 32-bit Arm. I'd
> like to suggest including "priority" in the union, and call the
> new field simply "fifo" or some such.

This will add quite some complexity as suddenly all writes to the
union will need to be made through a cmpxchg() scheme.

Regarding the growth: struct evtchn is aligned to 64 bytes. So there
is no growth at all, as the size will not be larger than those 64
bytes.


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-13 14:20     ` Jürgen Groß
@ 2020-10-13 14:26       ` Jan Beulich
  2020-10-14 11:40         ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-13 14:26 UTC (permalink / raw)
  To: Jürgen Groß, Julien Grall
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 13.10.2020 16:20, Jürgen Groß wrote:
> On 13.10.20 15:58, Jan Beulich wrote:
>> On 12.10.2020 11:27, Juergen Gross wrote:
>>> The queue for a fifo event depends on the vcpu_id and the
>>> priority of the event. When sending an event it might happen that
>>> the event needs to change queues, and the old queue needs to be
>>> kept for keeping the links between queue elements intact. For this
>>> purpose the event channel contains last_priority and last_vcpu_id
>>> values to identify the old queue.
>>>
>>> In order to avoid races, always access last_priority and last_vcpu_id
>>> with a single atomic operation, avoiding any inconsistencies.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> I seem to vaguely recall that at the time this seemingly racy
>> access was done on purpose by David. Did you go look at the
>> old commits to understand whether there really is a race which
>> can't be tolerated within the spec?
> 
> At least the comments in the code tell us that the race regarding
> the writing of priority (not last_priority) is acceptable.

Ah, then it was comments. I knew I read something to this effect
somewhere, recently.

> Especially Julien was rather worried by the current situation. In
> case you can convince him the current handling is fine, we can
> easily drop this patch.

Julien, in the light of the above - can you clarify the specific
concerns you (still) have?

>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -114,8 +114,7 @@ struct evtchn
>>>           u16 virq;      /* state == ECS_VIRQ */
>>>       } u;
>>>       u8 priority;
>>> -    u8 last_priority;
>>> -    u16 last_vcpu_id;
>>> +    u32 fifo_lastq;    /* Data for fifo events identifying last queue. */
>>
>> This grows struct evtchn's size on at least 32-bit Arm. I'd
>> like to suggest including "priority" in the union, and call the
>> new field simply "fifo" or some such.
> 
> This will add quite some complexity as suddenly all writes to the
> union will need to be made through a cmpxchg() scheme.
> 
> Regarding the growth: struct evtchn is aligned to 64 bytes. So there
> is no growth at all, as the size will not be larger than those 64
> bytes.

Oh, I didn't spot this attribute, which I consider at least
suspicious. Without XSM, I'm getting the impression that on 32-bit
Arm the structure's size would be 32 bytes or less without the
attribute, so it looks as if it shouldn't be there unconditionally.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-12  9:27 ` [PATCH v2 2/2] xen/evtchn: rework per event channel lock Juergen Gross
  2020-10-13 14:02   ` Jan Beulich
@ 2020-10-13 15:28   ` Jan Beulich
  2020-10-14  6:00     ` Jürgen Groß
  2020-10-16  9:51   ` Julien Grall
  2 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-13 15:28 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 12.10.2020 11:27, Juergen Gross wrote:
> @@ -798,9 +786,11 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
>  
>      d = v->domain;
>      chn = evtchn_from_port(d, port);
> -    spin_lock(&chn->lock);
> -    evtchn_port_set_pending(d, v->vcpu_id, chn);
> -    spin_unlock(&chn->lock);
> +    if ( evtchn_tryread_lock(chn) )
> +    {
> +        evtchn_port_set_pending(d, v->vcpu_id, chn);
> +        evtchn_read_unlock(chn);
> +    }
>  
>   out:
>      spin_unlock_irqrestore(&v->virq_lock, flags);
> @@ -829,9 +819,11 @@ void send_guest_global_virq(struct domain *d, uint32_t virq)
>          goto out;
>  
>      chn = evtchn_from_port(d, port);
> -    spin_lock(&chn->lock);
> -    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
> -    spin_unlock(&chn->lock);
> +    if ( evtchn_tryread_lock(chn) )
> +    {
> +        evtchn_port_set_pending(d, v->vcpu_id, chn);

Is this simply a copy-and-paste mistake (re-using the code from
send_guest_vcpu_virq()), or is there a reason you switch from
where to obtain the vCPU to send to (in which case this ought
to be mentioned in the description, and in which case you could
use literal zero)?

> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>  #define bucket_from_port(d, p) \
>      ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
>  
> +#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS

Isn't the ceiling on simultaneous readers the number of pCPU-s,
and the value here then needs to be NR_CPUS + 1 to accommodate
the maximum number of readers? Furthermore, with you dropping
the disabling of interrupts, one pCPU can acquire a read lock
now more than once, when interrupting a locked region.

> +static inline void evtchn_write_lock(struct evtchn *evtchn)
> +{
> +    int val;
> +
> +    /* No barrier needed, atomic_add_return() is full barrier. */
> +    for ( val = atomic_add_return(EVENT_WRITE_LOCK_INC, &evtchn->lock);
> +          val != EVENT_WRITE_LOCK_INC;
> +          val = atomic_read(&evtchn->lock) )
> +        cpu_relax();
> +}
> +
> +static inline void evtchn_write_unlock(struct evtchn *evtchn)
> +{
> +    arch_lock_release_barrier();
> +
> +    atomic_sub(EVENT_WRITE_LOCK_INC, &evtchn->lock);
> +}
> +
> +static inline bool evtchn_tryread_lock(struct evtchn *evtchn)

The corresponding "generic" function is read_trylock() - I'd
suggest to use the same base name, with the evtchn_ prefix.

> @@ -274,12 +312,12 @@ static inline int evtchn_port_poll(struct domain *d, evtchn_port_t port)
>      if ( port_is_valid(d, port) )
>      {
>          struct evtchn *evtchn = evtchn_from_port(d, port);
> -        unsigned long flags;
>  
> -        spin_lock_irqsave(&evtchn->lock, flags);
> -        if ( evtchn_usable(evtchn) )
> +        if ( evtchn_tryread_lock(evtchn) && evtchn_usable(evtchn) )
> +        {
>              rc = evtchn_is_pending(d, evtchn);
> -        spin_unlock_irqrestore(&evtchn->lock, flags);
> +            evtchn_read_unlock(evtchn);
> +        }
>      }

This needs to be two nested if()-s, as you need to drop the lock
even when evtchn_usable() returns false.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-13 14:13     ` Jürgen Groß
@ 2020-10-13 15:30       ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2020-10-13 15:30 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 13.10.2020 16:13, Jürgen Groß wrote:
> On 13.10.20 16:02, Jan Beulich wrote:
>> On 12.10.2020 11:27, Juergen Gross wrote:
>>> Currently the lock for a single event channel needs to be taken with
>>> interrupts off, which causes deadlocks in some cases.
>>>
>>> Rework the per event channel lock to be non-blocking for the case of
>>> sending an event, and remove the need for disabling interrupts when
>>> taking the lock.
>>>
>>> The lock is needed for avoiding races between sending an event or
>>> querying the channel's state against removal of the event channel.
>>>
>>> Use a locking scheme similar to a rwlock, but with some modifications:
>>>
>>> - sending an event or querying the event channel's state uses an
>>>    operation similar to read_trylock(); if the lock cannot be obtained,
>>>    the sending is omitted or a default state is returned
>>
>> And how come omitting the send or returning default state is valid?
> 
> This is explained in the part of the commit message you didn't cite:
> 
> With this locking scheme it is mandatory that a writer will always
> either start with an unbound or free event channel or will end with
> an unbound or free event channel, as otherwise the reaction of a reader
> not getting the lock would be wrong.

Oh, I did read this latter part as something extra to be aware of,
not as this being the correctness guarantee. Could you make the
connection more clear?

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-13 15:28   ` Jan Beulich
@ 2020-10-14  6:00     ` Jürgen Groß
  2020-10-14  6:52       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-10-14  6:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 13.10.20 17:28, Jan Beulich wrote:
> On 12.10.2020 11:27, Juergen Gross wrote:
>> @@ -798,9 +786,11 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)
>>   
>>       d = v->domain;
>>       chn = evtchn_from_port(d, port);
>> -    spin_lock(&chn->lock);
>> -    evtchn_port_set_pending(d, v->vcpu_id, chn);
>> -    spin_unlock(&chn->lock);
>> +    if ( evtchn_tryread_lock(chn) )
>> +    {
>> +        evtchn_port_set_pending(d, v->vcpu_id, chn);
>> +        evtchn_read_unlock(chn);
>> +    }
>>   
>>    out:
>>       spin_unlock_irqrestore(&v->virq_lock, flags);
>> @@ -829,9 +819,11 @@ void send_guest_global_virq(struct domain *d, uint32_t virq)
>>           goto out;
>>   
>>       chn = evtchn_from_port(d, port);
>> -    spin_lock(&chn->lock);
>> -    evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
>> -    spin_unlock(&chn->lock);
>> +    if ( evtchn_tryread_lock(chn) )
>> +    {
>> +        evtchn_port_set_pending(d, v->vcpu_id, chn);
> 
> Is this simply a copy-and-paste mistake (re-using the code from
> send_guest_vcpu_virq()), or is there a reason you switch from
> where to obtain the vCPU to send to (in which case this ought
> to be mentioned in the description, and in which case you could
> use literal zero)?

Thanks for spotting! It's a copy-and-paste mistake.

> 
>> --- a/xen/include/xen/event.h
>> +++ b/xen/include/xen/event.h
>> @@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>>   #define bucket_from_port(d, p) \
>>       ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
>>   
>> +#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS
> 
> Isn't the ceiling on simultaneous readers the number of pCPU-s,
> and the value here then needs to be NR_CPUS + 1 to accommodate
> the maximum number of readers? Furthermore, with you dropping
> the disabling of interrupts, one pCPU can acquire a read lock
> now more than once, when interrupting a locked region.

Yes, I think you are right.

So at least 2 * (NR_CPUS + 1), or even 3 * (NR_CPUS + 1) for covering
NMIs, too?

> 
>> +static inline void evtchn_write_lock(struct evtchn *evtchn)
>> +{
>> +    int val;
>> +
>> +    /* No barrier needed, atomic_add_return() is full barrier. */
>> +    for ( val = atomic_add_return(EVENT_WRITE_LOCK_INC, &evtchn->lock);
>> +          val != EVENT_WRITE_LOCK_INC;
>> +          val = atomic_read(&evtchn->lock) )
>> +        cpu_relax();
>> +}
>> +
>> +static inline void evtchn_write_unlock(struct evtchn *evtchn)
>> +{
>> +    arch_lock_release_barrier();
>> +
>> +    atomic_sub(EVENT_WRITE_LOCK_INC, &evtchn->lock);
>> +}
>> +
>> +static inline bool evtchn_tryread_lock(struct evtchn *evtchn)
> 
> The corresponding "generic" function is read_trylock() - I'd
> suggest to use the same base name, with the evtchn_ prefix.

Okay.

> 
>> @@ -274,12 +312,12 @@ static inline int evtchn_port_poll(struct domain *d, evtchn_port_t port)
>>       if ( port_is_valid(d, port) )
>>       {
>>           struct evtchn *evtchn = evtchn_from_port(d, port);
>> -        unsigned long flags;
>>   
>> -        spin_lock_irqsave(&evtchn->lock, flags);
>> -        if ( evtchn_usable(evtchn) )
>> +        if ( evtchn_tryread_lock(evtchn) && evtchn_usable(evtchn) )
>> +        {
>>               rc = evtchn_is_pending(d, evtchn);
>> -        spin_unlock_irqrestore(&evtchn->lock, flags);
>> +            evtchn_read_unlock(evtchn);
>> +        }
>>       }
> 
> This needs to be two nested if()-s, as you need to drop the lock
> even when evtchn_usable() returns false.

Oh, yes.


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-14  6:00     ` Jürgen Groß
@ 2020-10-14  6:52       ` Jan Beulich
  2020-10-14  7:27         ` Jürgen Groß
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-14  6:52 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 14.10.2020 08:00, Jürgen Groß wrote:
> On 13.10.20 17:28, Jan Beulich wrote:
>> On 12.10.2020 11:27, Juergen Gross wrote:
>>> --- a/xen/include/xen/event.h
>>> +++ b/xen/include/xen/event.h
>>> @@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>>>   #define bucket_from_port(d, p) \
>>>       ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
>>>   
>>> +#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS
>>
>> Isn't the ceiling on simultaneous readers the number of pCPU-s,
>> and the value here then needs to be NR_CPUS + 1 to accommodate
>> the maximum number of readers? Furthermore, with you dropping
>> the disabling of interrupts, one pCPU can acquire a read lock
>> now more than once, when interrupting a locked region.
> 
> Yes, I think you are right.
> 
> So at least 2 * (NR_CPUS + 1), or even 3 * (NR_CPUS + 1) for covering
> NMIs, too?

Hard to say: Even interrupts can in principle nest. I'd go further
and use e.g. INT_MAX / 4, albeit no matter what value we choose
there'll remain a theoretical risk. I'm therefore not fully
convinced of the concept, irrespective of it providing an elegant
solution to the problem at hand. I'd be curious what others think.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-14  6:52       ` Jan Beulich
@ 2020-10-14  7:27         ` Jürgen Groß
  0 siblings, 0 replies; 26+ messages in thread
From: Jürgen Groß @ 2020-10-14  7:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

On 14.10.20 08:52, Jan Beulich wrote:
> On 14.10.2020 08:00, Jürgen Groß wrote:
>> On 13.10.20 17:28, Jan Beulich wrote:
>>> On 12.10.2020 11:27, Juergen Gross wrote:
>>>> --- a/xen/include/xen/event.h
>>>> +++ b/xen/include/xen/event.h
>>>> @@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>>>>    #define bucket_from_port(d, p) \
>>>>        ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
>>>>    
>>>> +#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS
>>>
>>> Isn't the ceiling on simultaneous readers the number of pCPU-s,
>>> and the value here then needs to be NR_CPUS + 1 to accommodate
>>> the maximum number of readers? Furthermore, with you dropping
>>> the disabling of interrupts, one pCPU can acquire a read lock
>>> now more than once, when interrupting a locked region.
>>
>> Yes, I think you are right.
>>
>> So at least 2 * (NR_CPUS + 1), or even 3 * (NR_CPUS + 1) for covering
>> NMIs, too?
> 
> Hard to say: Even interrupts can in principle nest. I'd go further
> and use e.g. INT_MAX / 4, albeit no matter what value we choose
> there'll remain a theoretical risk. I'm therefore not fully
> convinced of the concept, irrespective of it providing an elegant
> solution to the problem at hand. I'd be curious what others think.

I just realized I should add a sanity test in evtchn_write_lock() to
exclude the case of multiple writers (this should never happen due to
all writers locking d->event_lock).

This in turn means we can set EVENT_WRITE_LOCK_INC to INT_MIN and use
negative lock values for a write-locked event channel.

Hitting this limit seems to require quite high values of NR_CPUS, even
with interrupts nesting (I'm quite sure we'll run out of stack space
way before this limit can be hit even with 16 million cpus).


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-13 14:26       ` Jan Beulich
@ 2020-10-14 11:40         ` Julien Grall
  2020-10-15 12:07           ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2020-10-14 11:40 UTC (permalink / raw)
  To: Jan Beulich, Jürgen Groß
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

Hi Jan,

On 13/10/2020 15:26, Jan Beulich wrote:
> On 13.10.2020 16:20, Jürgen Groß wrote:
>> On 13.10.20 15:58, Jan Beulich wrote:
>>> On 12.10.2020 11:27, Juergen Gross wrote:
>>>>> The queue for a fifo event depends on the vcpu_id and the
>>>>> priority of the event. When sending an event it might happen that
>>>>> the event needs to change queues, and the old queue needs to be
>>>>> kept for keeping the links between queue elements intact. For this
>>>>> purpose the event channel contains last_priority and last_vcpu_id
>>>>> values to identify the old queue.
>>>>>
>>>>> In order to avoid races, always access last_priority and last_vcpu_id
>>>>> with a single atomic operation, avoiding any inconsistencies.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> I seem to vaguely recall that at the time this seemingly racy
>>> access was done on purpose by David. Did you go look at the
>>> old commits to understand whether there really is a race which
>>> can't be tolerated within the spec?
>>
>> At least the comments in the code tell us that the race regarding
>> the writing of priority (not last_priority) is acceptable.
> 
> Ah, then it was comments. I knew I read something to this effect
> somewhere, recently.
> 
>> Especially Julien was rather worried by the current situation. In
>> case you can convince him the current handling is fine, we can
>> easily drop this patch.
> 
> Julien, in the light of the above - can you clarify the specific
> concerns you (still) have?

Let me start with the assumption that evtchn->lock is not held when 
evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.

From my understanding, the goal of lock_old_queue() is to return the 
old queue used.

last_priority and last_vcpu_id may be updated separately and I could not 
convince myself that it would not be possible to return a queue that is 
neither the current one nor the old one.

The following could happen if evtchn->priority and 
evtchn->notify_vcpu_id keeps changing between calls.

pCPU0				| pCPU1
				|
evtchn_fifo_set_pending(v0,...)	|
				| evtchn_fifo_set_pending(v1, ...)
  [...]				|
  /* Queue has changed */	|
  evtchn->last_vcpu_id = v0 	|
				| -> evtchn_old_queue()
				| v = d->vcpu[evtchn->last_vcpu_id];
   				| old_q = ...
				| spin_lock(old_q->...)
				| v = ...
				| q = ...
				| /* q and old_q would be the same */
				|
  evtchn->last_priority = priority|

If my diagram is correct, then pCPU1 would return a queue that is 
neither the current nor old one.

In which case, I think it would at least be possible to corrupt the 
queue. From evtchn_fifo_set_pending():

         /*
          * If this event was a tail, the old queue is now empty and
          * its tail must be invalidated to prevent adding an event to
          * the old queue from corrupting the new queue.
          */
         if ( old_q->tail == port )
             old_q->tail = 0;

Did I miss anything?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-14 11:40         ` Julien Grall
@ 2020-10-15 12:07           ` Jan Beulich
  2020-10-16  5:46             ` Jürgen Groß
  2020-10-16  9:36             ` Julien Grall
  0 siblings, 2 replies; 26+ messages in thread
From: Jan Beulich @ 2020-10-15 12:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 14.10.2020 13:40, Julien Grall wrote:
> Hi Jan,
> 
> On 13/10/2020 15:26, Jan Beulich wrote:
>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>> On 13.10.20 15:58, Jan Beulich wrote:
>>>> On 12.10.2020 11:27, Juergen Gross wrote:
>>>>> The queue for a fifo event is depending on the vcpu_id and the
>>>>> priority of the event. When sending an event it might happen the
>>>>> event needs to change queues and the old queue needs to be kept for
>>>>> keeping the links between queue elements intact. For this purpose
>>>>> the event channel contains last_priority and last_vcpu_id values
>>>>> elements for being able to identify the old queue.
>>>>>
>>>>> In order to avoid races always access last_priority and last_vcpu_id
>>>>> with a single atomic operation avoiding any inconsistencies.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> I seem to vaguely recall that at the time this seemingly racy
>>>> access was done on purpose by David. Did you go look at the
>>>> old commits to understand whether there really is a race which
>>>> can't be tolerated within the spec?
>>>
>>> At least the comments in the code tell us that the race regarding
>>> the writing of priority (not last_priority) is acceptable.
>>
>> Ah, then it was comments. I knew I read something to this effect
>> somewhere, recently.
>>
>>> Especially Julien was rather worried by the current situation. In
>>> case you can convince him the current handling is fine, we can
>>> easily drop this patch.
>>
>> Julien, in the light of the above - can you clarify the specific
>> concerns you (still) have?
> 
> Let me start with that the assumption if evtchn->lock is not held when 
> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.

But this isn't interesting - we know there are paths where it is
held, and ones (interdomain sending) where it's the remote port's
lock instead which is held. What's important here is that a
_consistent_ lock be held (but it doesn't need to be evtchn's).

>  From my understanding, the goal of lock_old_queue() is to return the 
> old queue used.
> 
> last_priority and last_vcpu_id may be updated separately and I could not 
> convince myself that it would not be possible to return a queue that is 
> neither the current one nor the old one.
> 
> The following could happen if evtchn->priority and 
> evtchn->notify_vcpu_id keeps changing between calls.
> 
> pCPU0				| pCPU1
> 				|
> evtchn_fifo_set_pending(v0,...)	|
> 				| evtchn_fifo_set_pending(v1, ...)
>   [...]				|
>   /* Queue has changed */	|
>   evtchn->last_vcpu_id = v0 	|
> 				| -> evtchn_old_queue()
> 				| v = d->vcpu[evtchn->last_vcpu_id];
>    				| old_q = ...
> 				| spin_lock(old_q->...)
> 				| v = ...
> 				| q = ...
> 				| /* q and old_q would be the same */
> 				|
>   evtchn->las_priority = priority|
> 
> If my diagram is correct, then pCPU1 would return a queue that is 
> neither the current nor old one.

I think I agree.

> In which case, I think it would at least be possible to corrupt the 
> queue. From evtchn_fifo_set_pending():
> 
>          /*
>           * If this event was a tail, the old queue is now empty and
>           * its tail must be invalidated to prevent adding an event to
>           * the old queue from corrupting the new queue.
>           */
>          if ( old_q->tail == port )
>              old_q->tail = 0;
> 
> Did I miss anything?

I don't think you did. The important point though is that a consistent
lock is being held whenever we come here, so two racing set_pending()
aren't possible for one and the same evtchn. As a result I don't think
the patch here is actually needed.

If I take this further, then I think I can reason why it wasn't
necessary to add further locking to send_guest_{global,vcpu}_virq():
The virq_lock is the "consistent lock" protecting ECS_VIRQ ports. The
spin_barrier() while closing the port guards that side against the
port changing to a different ECS_* behind the sending functions' backs.
And binding such ports sets ->virq_to_evtchn[] last, with a suitable
barrier (the unlock).

Which leaves send_guest_pirq() before we can drop the IRQ-safe locking
again. I guess we would need to work towards using the underlying
irq_desc's lock as consistent lock here, but this certainly isn't the
case just yet, and I'm not really certain this can be achieved.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-15 12:07           ` Jan Beulich
@ 2020-10-16  5:46             ` Jürgen Groß
  2020-10-16  9:36             ` Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Jürgen Groß @ 2020-10-16  5:46 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 15.10.20 14:07, Jan Beulich wrote:
> On 14.10.2020 13:40, Julien Grall wrote:
>> Hi Jan,
>>
>> On 13/10/2020 15:26, Jan Beulich wrote:
>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>> On 13.10.20 15:58, Jan Beulich wrote:
>>>>> On 12.10.2020 11:27, Juergen Gross wrote:
>>>>>> The queue for a fifo event is depending on the vcpu_id and the
>>>>>> priority of the event. When sending an event it might happen the
>>>>>> event needs to change queues and the old queue needs to be kept for
>>>>>> keeping the links between queue elements intact. For this purpose
>>>>>> the event channel contains last_priority and last_vcpu_id values
>>>>>> elements for being able to identify the old queue.
>>>>>>
>>>>>> In order to avoid races always access last_priority and last_vcpu_id
>>>>>> with a single atomic operation avoiding any inconsistencies.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> I seem to vaguely recall that at the time this seemingly racy
>>>>> access was done on purpose by David. Did you go look at the
>>>>> old commits to understand whether there really is a race which
>>>>> can't be tolerated within the spec?
>>>>
>>>> At least the comments in the code tell us that the race regarding
>>>> the writing of priority (not last_priority) is acceptable.
>>>
>>> Ah, then it was comments. I knew I read something to this effect
>>> somewhere, recently.
>>>
>>>> Especially Julien was rather worried by the current situation. In
>>>> case you can convince him the current handling is fine, we can
>>>> easily drop this patch.
>>>
>>> Julien, in the light of the above - can you clarify the specific
>>> concerns you (still) have?
>>
>> Let me start with that the assumption if evtchn->lock is not held when
>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
> 
> But this isn't interesting - we know there are paths where it is
> held, and ones (interdomain sending) where it's the remote port's
> lock instead which is held. What's important here is that a
> _consistent_ lock be held (but it doesn't need to be evtchn's).
> 
>>   From my understanding, the goal of lock_old_queue() is to return the
>> old queue used.
>>
>> last_priority and last_vcpu_id may be updated separately and I could not
>> convince myself that it would not be possible to return a queue that is
>> neither the current one nor the old one.
>>
>> The following could happen if evtchn->priority and
>> evtchn->notify_vcpu_id keeps changing between calls.
>>
>> pCPU0				| pCPU1
>> 				|
>> evtchn_fifo_set_pending(v0,...)	|
>> 				| evtchn_fifo_set_pending(v1, ...)
>>    [...]				|
>>    /* Queue has changed */	|
>>    evtchn->last_vcpu_id = v0 	|
>> 				| -> evtchn_old_queue()
>> 				| v = d->vcpu[evtchn->last_vcpu_id];
>>     				| old_q = ...
>> 				| spin_lock(old_q->...)
>> 				| v = ...
>> 				| q = ...
>> 				| /* q and old_q would be the same */
>> 				|
>>    evtchn->las_priority = priority|
>>
>> If my diagram is correct, then pCPU1 would return a queue that is
>> neither the current nor old one.
> 
> I think I agree.
> 
>> In which case, I think it would at least be possible to corrupt the
>> queue. From evtchn_fifo_set_pending():
>>
>>           /*
>>            * If this event was a tail, the old queue is now empty and
>>            * its tail must be invalidated to prevent adding an event to
>>            * the old queue from corrupting the new queue.
>>            */
>>           if ( old_q->tail == port )
>>               old_q->tail = 0;
>>
>> Did I miss anything?
> 
> I don't think you did. The important point though is that a consistent
> lock is being held whenever we come here, so two racing set_pending()
> aren't possible for one and the same evtchn. As a result I don't think
> the patch here is actually needed.

Julien, do you agree?

Can I drop this patch?


Juergen


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-15 12:07           ` Jan Beulich
  2020-10-16  5:46             ` Jürgen Groß
@ 2020-10-16  9:36             ` Julien Grall
  2020-10-16 12:09               ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Julien Grall @ 2020-10-16  9:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu



On 15/10/2020 13:07, Jan Beulich wrote:
> On 14.10.2020 13:40, Julien Grall wrote:
>> Hi Jan,
>>
>> On 13/10/2020 15:26, Jan Beulich wrote:
>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>> On 13.10.20 15:58, Jan Beulich wrote:
>>>>> On 12.10.2020 11:27, Juergen Gross wrote:
>>>>>> The queue for a fifo event is depending on the vcpu_id and the
>>>>>> priority of the event. When sending an event it might happen the
>>>>>> event needs to change queues and the old queue needs to be kept for
>>>>>> keeping the links between queue elements intact. For this purpose
>>>>>> the event channel contains last_priority and last_vcpu_id values
>>>>>> elements for being able to identify the old queue.
>>>>>>
>>>>>> In order to avoid races always access last_priority and last_vcpu_id
>>>>>> with a single atomic operation avoiding any inconsistencies.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> I seem to vaguely recall that at the time this seemingly racy
>>>>> access was done on purpose by David. Did you go look at the
>>>>> old commits to understand whether there really is a race which
>>>>> can't be tolerated within the spec?
>>>>
>>>> At least the comments in the code tell us that the race regarding
>>>> the writing of priority (not last_priority) is acceptable.
>>>
>>> Ah, then it was comments. I knew I read something to this effect
>>> somewhere, recently.
>>>
>>>> Especially Julien was rather worried by the current situation. In
>>>> case you can convince him the current handling is fine, we can
>>>> easily drop this patch.
>>>
>>> Julien, in the light of the above - can you clarify the specific
>>> concerns you (still) have?
>>
>> Let me start with that the assumption if evtchn->lock is not held when
>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
> 
> But this isn't interesting - we know there are paths where it is
> held, and ones (interdomain sending) where it's the remote port's
> lock instead which is held. What's important here is that a
> _consistent_ lock be held (but it doesn't need to be evtchn's).

Yes, a _consistent_ lock *should* be sufficient. But it is better to use 
the same lock everywhere so it is easier to reason (see more below).

> 
>>   From my understanding, the goal of lock_old_queue() is to return the
>> old queue used.
>>
>> last_priority and last_vcpu_id may be updated separately and I could not
>> convince myself that it would not be possible to return a queue that is
>> neither the current one nor the old one.
>>
>> The following could happen if evtchn->priority and
>> evtchn->notify_vcpu_id keeps changing between calls.
>>
>> pCPU0				| pCPU1
>> 				|
>> evtchn_fifo_set_pending(v0,...)	|
>> 				| evtchn_fifo_set_pending(v1, ...)
>>    [...]				|
>>    /* Queue has changed */	|
>>    evtchn->last_vcpu_id = v0 	|
>> 				| -> evtchn_old_queue()
>> 				| v = d->vcpu[evtchn->last_vcpu_id];
>>     				| old_q = ...
>> 				| spin_lock(old_q->...)
>> 				| v = ...
>> 				| q = ...
>> 				| /* q and old_q would be the same */
>> 				|
>>    evtchn->las_priority = priority|
>>
>> If my diagram is correct, then pCPU1 would return a queue that is
>> neither the current nor old one.
> 
> I think I agree.
> 
>> In which case, I think it would at least be possible to corrupt the
>> queue. From evtchn_fifo_set_pending():
>>
>>           /*
>>            * If this event was a tail, the old queue is now empty and
>>            * its tail must be invalidated to prevent adding an event to
>>            * the old queue from corrupting the new queue.
>>            */
>>           if ( old_q->tail == port )
>>               old_q->tail = 0;
>>
>> Did I miss anything?
> 
> I don't think you did. The important point though is that a consistent
> lock is being held whenever we come here, so two racing set_pending()
> aren't possible for one and the same evtchn. As a result I don't think
> the patch here is actually needed.

I haven't yet read the rest of the patches in full detail, so I can't 
say whether this is necessary or not. However, at first glance, I think 
it is not sane to rely on a different lock to protect us. And don't 
get me started on the lack of documentation...

Furthermore, the implementation of lock_old_queue() suggests that the 
code was planned to be lockless. Why would you need the retry loop otherwise?

Therefore, regardless the rest of the discussion, I think this patch 
would be useful to have for our peace of mind.

> 
> If I take this further, then I think I can reason why it wasn't
> necessary to add further locking to send_guest_{global,vcpu}_virq():
> The virq_lock is the "consistent lock" protecting ECS_VIRQ ports. The
> spin_barrier() while closing the port guards that side against the
> port changing to a different ECS_* behind the sending functions' backs.
> And binding such ports sets ->virq_to_evtchn[] last, with a suitable
> barrier (the unlock).

This makes sense.

> 
> Which leaves send_guest_pirq() before we can drop the IRQ-safe locking
> again. I guess we would need to work towards using the underlying
> irq_desc's lock as consistent lock here, but this certainly isn't the
> case just yet, and I'm not really certain this can be achieved.
I can't comment on the PIRQ code but I think this is a risky approach 
(see more above).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] xen/evtchn: rework per event channel lock
  2020-10-12  9:27 ` [PATCH v2 2/2] xen/evtchn: rework per event channel lock Juergen Gross
  2020-10-13 14:02   ` Jan Beulich
  2020-10-13 15:28   ` Jan Beulich
@ 2020-10-16  9:51   ` Julien Grall
  2 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2020-10-16  9:51 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Stefano Stabellini

Hi Juergen,

On 12/10/2020 10:27, Juergen Gross wrote:
> Currently the lock for a single event channel needs to be taken with
> interrupts off, which causes deadlocks in some cases.
> 
> Rework the per event channel lock to be non-blocking for the case of
> sending an event and removing the need for disabling interrupts for
> taking the lock.
> 
> The lock is needed for avoiding races between sending an event or
> querying the channel's state against removal of the event channel.
> 
> Use a locking scheme similar to a rwlock, but with some modifications:
> 
> - sending an event or querying the event channel's state uses an
>    operation similar to read_trylock(), in case of not obtaining the
>    lock the sending is omitted or a default state is returned
> 
> - closing an event channel is similar to write_lock(), but without
>    real fairness regarding multiple writers (this saves some space in
>    the event channel structure and multiple writers are impossible as
>    closing an event channel requires the domain's event_lock to be
>    held).
> 
> With this locking scheme it is mandatory that a writer will always
> either start with an unbound or free event channel or will end with
> an unbound or free event channel, as otherwise the reaction of a reader
> not getting the lock would be wrong.
> 
> Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
> Signed-off-by: Juergen Gross <jgross@suse.com>

The approach looks ok to me. I have a couple of remarks below.

[...]

> diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
> index 509d3ae861..39a93f7556 100644
> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -105,6 +105,45 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>   #define bucket_from_port(d, p) \
>       ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
>   
> +#define EVENT_WRITE_LOCK_INC    MAX_VIRT_CPUS
> +static inline void evtchn_write_lock(struct evtchn *evtchn)

I think it would be good to describe the locking expectation in-code.

> +{
> +    int val;
> +
> +    /* No barrier needed, atomic_add_return() is full barrier. */
> +    for ( val = atomic_add_return(EVENT_WRITE_LOCK_INC, &evtchn->lock);
> +          val != EVENT_WRITE_LOCK_INC;
> +          val = atomic_read(&evtchn->lock) )
> +        cpu_relax();
> +}
> +
> +static inline void evtchn_write_unlock(struct evtchn *evtchn)
> +{
> +    arch_lock_release_barrier();
> +
> +    atomic_sub(EVENT_WRITE_LOCK_INC, &evtchn->lock);
> +}
> +
> +static inline bool evtchn_tryread_lock(struct evtchn *evtchn)
> +{
> +    if ( atomic_read(&evtchn->lock) >= EVENT_WRITE_LOCK_INC )
> +        return false;
> +
> +    /* No barrier needed, atomic_inc_return() is full barrier. */
> +    if ( atomic_inc_return(&evtchn->lock) < EVENT_WRITE_LOCK_INC )
> +        return true;
> +
> +    atomic_dec(&evtchn->lock);

NIT: Can you add a newline here?

> +    return false;
> +}
> +

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-16  9:36             ` Julien Grall
@ 2020-10-16 12:09               ` Jan Beulich
  2020-10-20  9:25                 ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-16 12:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 16.10.2020 11:36, Julien Grall wrote:
> On 15/10/2020 13:07, Jan Beulich wrote:
>> On 14.10.2020 13:40, Julien Grall wrote:
>>> On 13/10/2020 15:26, Jan Beulich wrote:
>>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>>> Especially Julien was rather worried by the current situation. In
>>>>> case you can convince him the current handling is fine, we can
>>>>> easily drop this patch.
>>>>
>>>> Julien, in the light of the above - can you clarify the specific
>>>> concerns you (still) have?
>>>
>>> Let me start with that the assumption if evtchn->lock is not held when
>>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
>>
>> But this isn't interesting - we know there are paths where it is
>> held, and ones (interdomain sending) where it's the remote port's
>> lock instead which is held. What's important here is that a
>> _consistent_ lock be held (but it doesn't need to be evtchn's).
> 
> Yes, a _consistent_ lock *should* be sufficient. But it is better to use 
> the same lock everywhere so it is easier to reason (see more below).

But that's already not the case, due to the way interdomain channels
have events sent. You did suggest acquiring both locks, but as
indicated at the time I think this goes too far. As far as the doc
aspect - we can improve the situation. Iirc it was you who made me
add the respective comment ahead of struct evtchn_port_ops.

>>>   From my understanding, the goal of lock_old_queue() is to return the
>>> old queue used.
>>>
>>> last_priority and last_vcpu_id may be updated separately and I could not
>>> convince myself that it would not be possible to return a queue that is
>>> neither the current one nor the old one.
>>>
>>> The following could happen if evtchn->priority and
>>> evtchn->notify_vcpu_id keeps changing between calls.
>>>
>>> pCPU0				| pCPU1
>>> 				|
>>> evtchn_fifo_set_pending(v0,...)	|
>>> 				| evtchn_fifo_set_pending(v1, ...)
>>>    [...]				|
>>>    /* Queue has changed */	|
>>>    evtchn->last_vcpu_id = v0 	|
>>> 				| -> evtchn_old_queue()
>>> 				| v = d->vcpu[evtchn->last_vcpu_id];
>>>     				| old_q = ...
>>> 				| spin_lock(old_q->...)
>>> 				| v = ...
>>> 				| q = ...
>>> 				| /* q and old_q would be the same */
>>> 				|
>>>    evtchn->las_priority = priority|
>>>
>>> If my diagram is correct, then pCPU1 would return a queue that is
>>> neither the current nor old one.
>>
>> I think I agree.
>>
>>> In which case, I think it would at least be possible to corrupt the
>>> queue. From evtchn_fifo_set_pending():
>>>
>>>           /*
>>>            * If this event was a tail, the old queue is now empty and
>>>            * its tail must be invalidated to prevent adding an event to
>>>            * the old queue from corrupting the new queue.
>>>            */
>>>           if ( old_q->tail == port )
>>>               old_q->tail = 0;
>>>
>>> Did I miss anything?
>>
>> I don't think you did. The important point though is that a consistent
>> lock is being held whenever we come here, so two racing set_pending()
>> aren't possible for one and the same evtchn. As a result I don't think
>> the patch here is actually needed.
> 
> I haven't yet read in full details the rest of the patches to say 
> whether this is necessary or not. However, at a first glance, I think 
> this is not a sane to rely on different lock to protect us. And don't 
> get me started on the lack of documentation...
> 
> Furthermore, the implementation of old_lock_queue() suggests that the 
> code was planned to be lockless. Why would you need the loop otherwise?

The lock-less aspect of this affects multiple accesses to e.g.
the same queue, I think. I'm unconvinced it was really considered
whether racing sending on the same channel is also safe this way.

> Therefore, regardless the rest of the discussion, I think this patch 
> would be useful to have for our peace of mind.

That's a fair position to take. My counterargument is mainly
that readability (and hence maintainability) suffers with those
changes.

>> If I take this further, then I think I can reason why it wasn't
>> necessary to add further locking to send_guest_{global,vcpu}_virq():
>> The virq_lock is the "consistent lock" protecting ECS_VIRQ ports. The
>> spin_barrier() while closing the port guards that side against the
>> port changing to a different ECS_* behind the sending functions' backs.
>> And binding such ports sets ->virq_to_evtchn[] last, with a suitable
>> barrier (the unlock).
> 
> This makes sense.
> 
>>
>> Which leaves send_guest_pirq() before we can drop the IRQ-safe locking
>> again. I guess we would need to work towards using the underlying
>> irq_desc's lock as consistent lock here, but this certainly isn't the
>> case just yet, and I'm not really certain this can be achieved.
> I can't comment on the PIRQ code but I think this is a risky approach 
> (see more above).

It may be; one would only know how risky it is once it is being tried.
For the moment, with your apparent agreement above, I'll see whether I
can put together a relaxation patch for the vIRQ sending. Main
question is going to be whether in the process I wouldn't find a
reason why this isn't a safe thing to do.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-16 12:09               ` Jan Beulich
@ 2020-10-20  9:25                 ` Julien Grall
  2020-10-20  9:34                   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2020-10-20  9:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

Hi Jan,

On 16/10/2020 13:09, Jan Beulich wrote:
> On 16.10.2020 11:36, Julien Grall wrote:
>> On 15/10/2020 13:07, Jan Beulich wrote:
>>> On 14.10.2020 13:40, Julien Grall wrote:
>>>> On 13/10/2020 15:26, Jan Beulich wrote:
>>>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>>>> Especially Julien was rather worried by the current situation. In
>>>>>> case you can convince him the current handling is fine, we can
>>>>>> easily drop this patch.
>>>>>
>>>>> Julien, in the light of the above - can you clarify the specific
>>>>> concerns you (still) have?
>>>>
>>>> Let me start with that the assumption if evtchn->lock is not held when
>>>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
>>>
>>> But this isn't interesting - we know there are paths where it is
>>> held, and ones (interdomain sending) where it's the remote port's
>>> lock instead which is held. What's important here is that a
>>> _consistent_ lock be held (but it doesn't need to be evtchn's).
>>
>> Yes, a _consistent_ lock *should* be sufficient. But it is better to use
>> the same lock everywhere so it is easier to reason (see more below).
> 
> But that's already not the case, due to the way interdomain channels
> have events sent. You did suggest acquiring both locks, but as
> indicated at the time I think this goes too far. As far as the doc
> aspect - we can improve the situation. Iirc it was you who made me
> add the respective comment ahead of struct evtchn_port_ops.
> 
>>>>    From my understanding, the goal of lock_old_queue() is to return the
>>>> old queue used.
>>>>
>>>> last_priority and last_vcpu_id may be updated separately and I could not
>>>> convince myself that it would not be possible to return a queue that is
>>>> neither the current one nor the old one.
>>>>
>>>> The following could happen if evtchn->priority and
>>>> evtchn->notify_vcpu_id keeps changing between calls.
>>>>
>>>> pCPU0				| pCPU1
>>>> 				|
>>>> evtchn_fifo_set_pending(v0,...)	|
>>>> 				| evtchn_fifo_set_pending(v1, ...)
>>>>     [...]				|
>>>>     /* Queue has changed */	|
>>>>     evtchn->last_vcpu_id = v0 	|
>>>> 				| -> evtchn_old_queue()
>>>> 				| v = d->vcpu[evtchn->last_vcpu_id];
>>>>      				| old_q = ...
>>>> 				| spin_lock(old_q->...)
>>>> 				| v = ...
>>>> 				| q = ...
>>>> 				| /* q and old_q would be the same */
>>>> 				|
>>>>     evtchn->las_priority = priority|
>>>>
>>>> If my diagram is correct, then pCPU1 would return a queue that is
>>>> neither the current nor old one.
>>>
>>> I think I agree.
>>>
>>>> In which case, I think it would at least be possible to corrupt the
>>>> queue. From evtchn_fifo_set_pending():
>>>>
>>>>            /*
>>>>             * If this event was a tail, the old queue is now empty and
>>>>             * its tail must be invalidated to prevent adding an event to
>>>>             * the old queue from corrupting the new queue.
>>>>             */
>>>>            if ( old_q->tail == port )
>>>>                old_q->tail = 0;
>>>>
>>>> Did I miss anything?
>>>
>>> I don't think you did. The important point though is that a consistent
>>> lock is being held whenever we come here, so two racing set_pending()
>>> aren't possible for one and the same evtchn. As a result I don't think
>>> the patch here is actually needed.
>>
>> I haven't yet read in full details the rest of the patches to say
>> whether this is necessary or not. However, at a first glance, I think
>> this is not a sane to rely on different lock to protect us. And don't
>> get me started on the lack of documentation...
>>
>> Furthermore, the implementation of old_lock_queue() suggests that the
>> code was planned to be lockless. Why would you need the loop otherwise?
> 
> The lock-less aspect of this affects multiple accesses to e.g.
> the same queue, I think.

I don't think we are talking about the same thing. What I was referring 
to is the following code:

static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
                                                 struct evtchn *evtchn,
                                                 unsigned long *flags)
{
     struct vcpu *v;
     struct evtchn_fifo_queue *q, *old_q;
     unsigned int try;

     for ( try = 0; try < 3; try++ )
     {
         v = d->vcpu[evtchn->last_vcpu_id];
         old_q = &v->evtchn_fifo->queue[evtchn->last_priority];

         spin_lock_irqsave(&old_q->lock, *flags);

         v = d->vcpu[evtchn->last_vcpu_id];
         q = &v->evtchn_fifo->queue[evtchn->last_priority];

         if ( old_q == q )
             return old_q;

         spin_unlock_irqrestore(&old_q->lock, *flags);
     }

     gprintk(XENLOG_WARNING,
             "dom%d port %d lost event (too many queue changes)\n",
             d->domain_id, evtchn->port);
     return NULL;
}

Given that evtchn->last_vcpu_id and evtchn->last_priority can only be 
modified in evtchn_fifo_set_pending(), this suggests that the function 
is expected to be called concurrently multiple times on the same event channel.

> I'm unconvinced it was really considered
> whether racing sending on the same channel is also safe this way.

How would you explain the 3 tries in lock_old_queue() then?

> 
>> Therefore, regardless the rest of the discussion, I think this patch
>> would be useful to have for our peace of mind.
> 
> That's a fair position to take. My counterargument is mainly
> that readability (and hence maintainability) suffers with those
> changes.

We surely have different opinions... I don't particularly care about the 
approach as long as it is *properly* documented.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-20  9:25                 ` Julien Grall
@ 2020-10-20  9:34                   ` Jan Beulich
  2020-10-20 10:01                     ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-10-20  9:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 20.10.2020 11:25, Julien Grall wrote:
> Hi Jan,
> 
> On 16/10/2020 13:09, Jan Beulich wrote:
>> On 16.10.2020 11:36, Julien Grall wrote:
>>> On 15/10/2020 13:07, Jan Beulich wrote:
>>>> On 14.10.2020 13:40, Julien Grall wrote:
>>>>> On 13/10/2020 15:26, Jan Beulich wrote:
>>>>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>>>>> Especially Julien was rather worried by the current situation. In
>>>>>>> case you can convince him the current handling is fine, we can
>>>>>>> easily drop this patch.
>>>>>>
>>>>>> Julien, in the light of the above - can you clarify the specific
>>>>>> concerns you (still) have?
>>>>>
>>>>> Let me start with the assumption that evtchn->lock is not held when
>>>>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
>>>>
>>>> But this isn't interesting - we know there are paths where it is
>>>> held, and ones (interdomain sending) where it's the remote port's
>>>> lock instead which is held. What's important here is that a
>>>> _consistent_ lock be held (but it doesn't need to be evtchn's).
>>>
>>> Yes, a _consistent_ lock *should* be sufficient. But it is better to use
>>> the same lock everywhere so it is easier to reason (see more below).
>>
>> But that's already not the case, due to the way interdomain channels
>> have events sent. You did suggest acquiring both locks, but as
>> indicated at the time I think this goes too far. As far as the doc
>> aspect - we can improve the situation. Iirc it was you who made me
>> add the respective comment ahead of struct evtchn_port_ops.
>>
>>>>>    From my understanding, the goal of lock_old_queue() is to return the
>>>>> old queue used.
>>>>>
>>>>> last_priority and last_vcpu_id may be updated separately and I could not
>>>>> convince myself that it would not be possible to return a queue that is
>>>>> neither the current one nor the old one.
>>>>>
>>>>> The following could happen if evtchn->priority and
>>>>> evtchn->notify_vcpu_id keeps changing between calls.
>>>>>
>>>>> pCPU0				| pCPU1
>>>>> 				|
>>>>> evtchn_fifo_set_pending(v0,...)	|
>>>>> 				| evtchn_fifo_set_pending(v1, ...)
>>>>>     [...]				|
>>>>>     /* Queue has changed */	|
>>>>>     evtchn->last_vcpu_id = v0 	|
>>>>> 				| -> lock_old_queue()
>>>>> 				| v = d->vcpu[evtchn->last_vcpu_id];
>>>>>      				| old_q = ...
>>>>> 				| spin_lock(old_q->...)
>>>>> 				| v = ...
>>>>> 				| q = ...
>>>>> 				| /* q and old_q would be the same */
>>>>> 				|
>>>>>     evtchn->last_priority = priority|
>>>>>
>>>>> If my diagram is correct, then pCPU1 would return a queue that is
>>>>> neither the current nor old one.
>>>>
>>>> I think I agree.
>>>>
>>>>> In which case, I think it would at least be possible to corrupt the
>>>>> queue. From evtchn_fifo_set_pending():
>>>>>
>>>>>            /*
>>>>>             * If this event was a tail, the old queue is now empty and
>>>>>             * its tail must be invalidated to prevent adding an event to
>>>>>             * the old queue from corrupting the new queue.
>>>>>             */
>>>>>            if ( old_q->tail == port )
>>>>>                old_q->tail = 0;
>>>>>
>>>>> Did I miss anything?
>>>>
>>>> I don't think you did. The important point though is that a consistent
>>>> lock is being held whenever we come here, so two racing set_pending()
>>>> aren't possible for one and the same evtchn. As a result I don't think
>>>> the patch here is actually needed.
>>>
>>> I haven't yet read in full details the rest of the patches to say
>>> whether this is necessary or not. However, at a first glance, I think
>>> it is not sane to rely on different locks to protect us. And don't
>>> get me started on the lack of documentation...
>>>
>>> Furthermore, the implementation of lock_old_queue() suggests that the
>>> code was planned to be lockless. Why would you need the loop otherwise?
>>
>> The lock-less aspect of this affects multiple accesses to e.g.
>> the same queue, I think.
> I don't think we are talking about the same thing. What I was referring 
> to is the following code:
> 
> static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>                                                  struct evtchn *evtchn,
>                                                  unsigned long *flags)
> {
>      struct vcpu *v;
>      struct evtchn_fifo_queue *q, *old_q;
>      unsigned int try;
> 
>      for ( try = 0; try < 3; try++ )
>      {
>          v = d->vcpu[evtchn->last_vcpu_id];
>          old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
> 
>          spin_lock_irqsave(&old_q->lock, *flags);
> 
>          v = d->vcpu[evtchn->last_vcpu_id];
>          q = &v->evtchn_fifo->queue[evtchn->last_priority];
> 
>          if ( old_q == q )
>              return old_q;
> 
>          spin_unlock_irqrestore(&old_q->lock, *flags);
>      }
> 
>      gprintk(XENLOG_WARNING,
>              "dom%d port %d lost event (too many queue changes)\n",
>              d->domain_id, evtchn->port);
>      return NULL;
> }
> 
> Given that evtchn->last_vcpu_id and evtchn->last_priority can only be 
> modified in evtchn_fifo_set_pending(), this suggests that the function is 
> expected to be called multiple times concurrently on the same event channel.
> 
>> I'm unconvinced it was really considered
>> whether racing sending on the same channel is also safe this way.
> 
> How would you explain the 3 tries in lock_old_queue() then?

Queue changes (as said by the gprintk()) can't result from sending
alone, but require re-binding to a different vCPU or altering the
priority. I'm simply unconvinced that the code indeed fully reflects
the original intentions. IOW I'm unsure whether we talk about the
same thing ...

Jan
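
The interleaving Julien sketched earlier in the thread can even be replayed deterministically: if a reader samples last_vcpu_id, the writer then updates both fields, and the reader finally samples last_priority, the reader ends up holding a pair that never existed together. A simplified single-threaded model (hypothetical names; the "threads" are just function calls made in racy order):

```c
#include <assert.h>

/*
 * Deterministic replay of the torn read: the writer updates vcpu and
 * then prio; a reader whose two loads straddle a writer invocation
 * observes a mixed pair matching neither the old nor the new state.
 */
struct last {
    int vcpu;
    int prio;
};

static struct last shared;

static int sampled_vcpu, sampled_prio;

static void writer(int vcpu, int prio)
{
    shared.vcpu = vcpu;
    shared.prio = prio;
}

/* The reader's two loads, split so a writer call can land in between. */
static void reader_load_vcpu(void) { sampled_vcpu = shared.vcpu; }
static void reader_load_prio(void) { sampled_prio = shared.prio; }
```

Calling writer(1, 7), then reader_load_vcpu(), then writer(2, 3), then reader_load_prio() leaves the reader with the pair (1, 3), a combination that was never stored — which is exactly how lock_old_queue() could pick a queue that is neither the current nor the old one.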



* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-20  9:34                   ` Jan Beulich
@ 2020-10-20 10:01                     ` Julien Grall
  2020-10-20 10:06                       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2020-10-20 10:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu



On 20/10/2020 10:34, Jan Beulich wrote:
> On 20.10.2020 11:25, Julien Grall wrote:
>> Given that evtchn->last_vcpu_id and evtchn->last_priority can only be
>> modified in evtchn_fifo_set_pending(), this suggests that the function is
>> expected to be called multiple times concurrently on the same event channel.
>>
>>> I'm unconvinced it was really considered
>>> whether racing sending on the same channel is also safe this way.
>>
>> How would you explain the 3 tries in lock_old_queue() then?
> 
> Queue changes (as said by the gprintk()) can't result from sending
> alone, but require re-binding to a different vCPU or altering the
> priority. 

I agree with that. However, this doesn't change the fact that updates to 
evtchn->last_priority and evtchn->last_vcpu can only happen when calling 
evtchn_fifo_set_pending().

If evtchn_fifo_set_pending() cannot be called concurrently for the same 
event, then there is *no* way for evtchn->last_{priority, vcpu} to be 
updated concurrently.

> I'm simply unconvinced that the code indeed fully reflects
> the original intentions. 

Do you mind (re-)sharing what was the original intentions?

Cheers,

-- 
Julien Grall



* Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
  2020-10-20 10:01                     ` Julien Grall
@ 2020-10-20 10:06                       ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2020-10-20 10:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jürgen Groß,
	xen-devel, Andrew Cooper, George Dunlap, Ian Jackson,
	Stefano Stabellini, Wei Liu

On 20.10.2020 12:01, Julien Grall wrote:
> 
> 
> On 20/10/2020 10:34, Jan Beulich wrote:
>> On 20.10.2020 11:25, Julien Grall wrote:
>>> Given that evtchn->last_vcpu_id and evtchn->last_priority can only be
>>> modified in evtchn_fifo_set_pending(), this suggests that the function is
>>> expected to be called multiple times concurrently on the same event channel.
>>>
>>>> I'm unconvinced it was really considered
>>>> whether racing sending on the same channel is also safe this way.
>>>
>>> How would you explain the 3 tries in lock_old_queue() then?
>>
>> Queue changes (as said by the gprintk()) can't result from sending
>> alone, but require re-binding to a different vCPU or altering the
>> priority. 
> 
> I agree with that. However, this doesn't change the fact that updates to 
> evtchn->last_priority and evtchn->last_vcpu can only happen when calling 
> evtchn_fifo_set_pending().
> 
> If evtchn_fifo_set_pending() cannot be called concurrently for the same 
> event, then there is *no* way for evtchn->last_{priority, vcpu} to be 
> updated concurrently.
> 
>> I'm simply unconvinced that the code indeed fully reflects
>> the original intentions. 
> 
> Do you mind (re-)sharing what was the original intentions?

If only I knew, I would.

Jan


