All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/8] Xen: FIFO-based event channel ABI
@ 2013-08-09 18:08 David Vrabel
  2013-08-09 18:08 ` [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys David Vrabel
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

This is an RFC of the FIFO-based event channel ABI described in this
design document:

http://xenbits.xen.org/people/dvrabel/event-channels-D.pdf

Changes in draft D include:

- Only Xen writes to the HEAD field in the control block

- Added a link_bits field to EVTCHNOP_init_control to return the
  number of valid bits for the LINK field.

Patches 1-4 do some preparatory work for supporting alternate ABIs.

Patches 5-6 expand the number of evtchn objects a domain may have by
changing how they are allocated.

Patches 7-8 add the ABI and the implementation.  Patch 8 lists some of
the known limitations of this RFC implementation.

Changes in v1:

- Updates for Draft D of the design.
- 130,000+ event channels are now supported.
- event_port.c -> event_2l.c and only contains 2l functions.
- Addresses various review comments
  - int -> unsigned in lots of places
  - use write_atomic() to set HEAD
  - removed MAX_EVTCHNS
  - evtchn_ops are const.
- Pack struct evtchn better to reduce the memory needed.

David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 13:55   ` Jan Beulich
  2013-08-09 18:08 ` [PATCH 2/8] evtchn: refactor low-level event channel port ops David Vrabel
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

With multiple event channel ABIs, showing the 2-level selector word
may be misleading if a different ABI is in use.

When dumping domain information, printing the state of the VIRQ_DEBUG
port is redundant -- this information is available via the 'e' key.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/irq.c      |    5 +----
 xen/common/keyhandler.c |   11 ++---------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index a4da786..ae66de2 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2128,14 +2128,11 @@ static void dump_irqs(unsigned char key)
                 d = action->guest[i];
                 pirq = domain_irq_to_pirq(d, irq);
                 info = pirq_info(d, pirq);
-                printk("%u:%3d(%c%c%c%c)",
+                printk("%u:%3d(%c%c%c)",
                        d->domain_id, pirq,
                        (test_bit(info->evtchn,
                                  &shared_info(d, evtchn_pending)) ?
                         'P' : '-'),
-                       (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d),
-                                 &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ?
-                        'S' : '-'),
                        (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ?
                         'M' : '-'),
                        (info->masked ? 'M' : '-'));
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 5072133..a65ec08 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -310,16 +310,9 @@ static void dump_domains(unsigned char key)
     {
         for_each_vcpu ( d, v )
         {
-            printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n",
+            printk("Notifying guest %d:%d (virq %d, port %d)\n",
                    d->domain_id, v->vcpu_id,
-                   VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG],
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG], 
-                            &shared_info(d, evtchn_pending)),
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG], 
-                            &shared_info(d, evtchn_mask)),
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG] /
-                            BITS_PER_EVTCHN_WORD(d),
-                            &vcpu_info(v, evtchn_pending_sel)));
+                   VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG]);
             send_guest_vcpu_virq(v, VIRQ_DEBUG);
         }
     }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/8] evtchn: refactor low-level event channel port ops
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
  2013-08-09 18:08 ` [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 14:05   ` Jan Beulich
  2013-08-09 18:08 ` [PATCH 3/8] evtchn: add a hook to bind an event port to a VCPU David Vrabel
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Use functions for the low-level event channel port operations
(set/clear pending, unmask, is_pending and is_masked).

Group these functions into a struct evtchn_port_ops so they can be
replaced by alternate implementations (for different ABIs) on a
per-domain basis.
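
The per-domain ops-table indirection can be sketched outside Xen as a
plain C pattern.  This is an illustrative, self-contained sketch, not
the Xen code: the names (port_ops, ops_2l) and the boolean "pending"
flag stand in for Xen's evtchn_port_ops and shared-info bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

struct evtchn { unsigned port; bool pending; };

/* Table of low-level port operations, one per ABI. */
struct port_ops {
    void (*set_pending)(struct evtchn *e);
    void (*clear_pending)(struct evtchn *e);
    bool (*is_pending)(const struct evtchn *e);
};

/* A trivial stand-in for the 2-level implementation. */
static void l2_set_pending(struct evtchn *e)      { e->pending = true; }
static void l2_clear_pending(struct evtchn *e)    { e->pending = false; }
static bool l2_is_pending(const struct evtchn *e) { return e->pending; }

static const struct port_ops ops_2l = {
    .set_pending   = l2_set_pending,
    .clear_pending = l2_clear_pending,
    .is_pending    = l2_is_pending,
};

struct domain { const struct port_ops *ops; };

/* Wrappers mirror evtchn_port_set_pending() etc.: callers dispatch
 * through the domain's table and never touch ABI-specific state. */
static inline void port_set_pending(struct domain *d, struct evtchn *e)
{ d->ops->set_pending(e); }
static inline void port_clear_pending(struct domain *d, struct evtchn *e)
{ d->ops->clear_pending(e); }
static inline bool port_is_pending(struct domain *d, const struct evtchn *e)
{ return d->ops->is_pending(e); }
```

Switching a domain to another ABI then only means pointing d->ops at a
different (const) table, which is what the FIFO ABI later relies on.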

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/irq.c         |   11 ++---
 xen/common/Makefile        |    1 +
 xen/common/event_2l.c      |   99 ++++++++++++++++++++++++++++++++++++++++++++
 xen/common/event_channel.c |   89 ++++++++++++++++-----------------------
 xen/common/schedule.c      |    3 +-
 xen/include/xen/event.h    |   43 +++++++++++++++++++
 xen/include/xen/sched.h    |    4 ++
 7 files changed, 190 insertions(+), 60 deletions(-)
 create mode 100644 xen/common/event_2l.c

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index ae66de2..42199a3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1474,7 +1474,7 @@ int pirq_guest_unmask(struct domain *d)
         {
             pirq = pirqs[i]->pirq;
             if ( pirqs[i]->masked &&
-                 !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) )
+                 !evtchn_port_is_masked(d, evtchn_from_port(d, pirqs[i]->evtchn)) )
                 pirq_guest_eoi(pirqs[i]);
         }
     } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) );
@@ -2088,6 +2088,7 @@ static void dump_irqs(unsigned char key)
     int i, irq, pirq;
     struct irq_desc *desc;
     irq_guest_action_t *action;
+    struct evtchn *evtchn;
     struct domain *d;
     const struct pirq *info;
     unsigned long flags;
@@ -2128,13 +2129,11 @@ static void dump_irqs(unsigned char key)
                 d = action->guest[i];
                 pirq = domain_irq_to_pirq(d, irq);
                 info = pirq_info(d, pirq);
+                evtchn = evtchn_from_port(d, info->evtchn);
                 printk("%u:%3d(%c%c%c)",
                        d->domain_id, pirq,
-                       (test_bit(info->evtchn,
-                                 &shared_info(d, evtchn_pending)) ?
-                        'P' : '-'),
-                       (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ?
-                        'M' : '-'),
+                       (evtchn_port_is_pending(d, evtchn) ? 'P' : '-'),
+                       (evtchn_port_is_masked(d, evtchn) ? 'M' : '-'),
                        (info->masked ? 'M' : '-'));
                 if ( i != action->nr_guests )
                     printk(",");
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 0dc2050..ef03eac 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -5,6 +5,7 @@ obj-y += cpupool.o
 obj-$(HAS_DEVICE_TREE) += device_tree.o
 obj-y += domctl.o
 obj-y += domain.o
+obj-y += event_2l.o
 obj-y += event_channel.o
 obj-y += grant_table.o
 obj-y += irq.o
diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
new file mode 100644
index 0000000..18c0c6e
--- /dev/null
+++ b/xen/common/event_2l.c
@@ -0,0 +1,99 @@
+/*
+ * Event channel port operations.
+ *
+ * Copyright (c) 2003-2006, K A Fraser.
+ * 
+ * This source code is licensed under the GNU General Public License,
+ * Version 2 or later.  See the file COPYING for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/event.h>
+
+static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
+{
+    struct domain *d = v->domain;
+    unsigned port = evtchn->port;
+
+    /*
+     * The following bit operations must happen in strict order.
+     * NB. On x86, the atomic bit operations also act as memory barriers.
+     * There is therefore sufficiently strict ordering for this architecture --
+     * others may require explicit memory barriers.
+     */
+
+    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
+        return;
+
+    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
+         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
+                           &vcpu_info(v, evtchn_pending_sel)) )
+    {
+        vcpu_mark_events_pending(v);
+    }
+
+    evtchn_check_pollers(d, port);
+}
+
+static void evtchn_2l_clear_pending(struct domain *d, struct evtchn *evtchn)
+{
+    clear_bit(evtchn->port, &shared_info(d, evtchn_pending));
+}
+
+static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
+{
+    struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
+    unsigned port = evtchn->port;
+
+    /*
+     * These operations must happen in strict order. Based on
+     * evtchn_2l_set_pending() above.
+     */
+    if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
+         test_bit          (port, &shared_info(d, evtchn_pending)) &&
+         !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
+                            &vcpu_info(v, evtchn_pending_sel)) )
+    {
+        vcpu_mark_events_pending(v);
+    }
+}
+
+static bool_t evtchn_2l_is_pending(struct domain *d,
+                                   const struct evtchn *evtchn)
+{
+    return test_bit(evtchn->port, &shared_info(d, evtchn_pending));
+}
+
+static bool_t evtchn_2l_is_masked(struct domain *d,
+                                  const struct evtchn *evtchn)
+{
+    return test_bit(evtchn->port, &shared_info(d, evtchn_mask));
+}
+
+static const struct evtchn_port_ops evtchn_port_ops_2l =
+{
+    .set_pending   = evtchn_2l_set_pending,
+    .clear_pending = evtchn_2l_clear_pending,
+    .unmask        = evtchn_2l_unmask,
+    .is_pending    = evtchn_2l_is_pending,
+    .is_masked     = evtchn_2l_is_masked,
+};
+
+void evtchn_2l_init(struct domain *d)
+{
+    d->evtchn_port_ops = &evtchn_port_ops_2l;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 64c976b..01d0b77 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -150,6 +150,7 @@ static int get_free_port(struct domain *d)
             xfree(chn);
             return -ENOMEM;
         }
+        chn[i].port = port + i;
     }
 
     bucket_from_port(d, port) = chn;
@@ -530,7 +531,7 @@ static long __evtchn_close(struct domain *d1, int port1)
     }
 
     /* Clear pending event to avoid unexpected behavior on re-bind. */
-    clear_bit(port1, &shared_info(d1, evtchn_pending));
+    evtchn_port_clear_pending(d1, chn1);
 
     /* Reset binding to vcpu0 when the channel is freed. */
     chn1->state          = ECS_FREE;
@@ -615,43 +616,10 @@ out:
 
 static void evtchn_set_pending(struct vcpu *v, int port)
 {
-    struct domain *d = v->domain;
-    int vcpuid;
-
-    /*
-     * The following bit operations must happen in strict order.
-     * NB. On x86, the atomic bit operations also act as memory barriers.
-     * There is therefore sufficiently strict ordering for this architecture --
-     * others may require explicit memory barriers.
-     */
-
-    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
-        return;
-
-    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
-         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
-                           &vcpu_info(v, evtchn_pending_sel)) )
-    {
-        vcpu_mark_events_pending(v);
-    }
-    
-    /* Check if some VCPU might be polling for this event. */
-    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
-        return;
+    struct evtchn *evtchn;
 
-    /* Wake any interested (or potentially interested) pollers. */
-    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
-          vcpuid < d->max_vcpus;
-          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
-    {
-        v = d->vcpu[vcpuid];
-        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
-             test_and_clear_bit(vcpuid, d->poll_mask) )
-        {
-            v->poll_evtchn = 0;
-            vcpu_unblock(v);
-        }
-    }
+    evtchn = evtchn_from_port(v->domain, port);
+    evtchn_port_set_pending(v, evtchn);
 }
 
 int guest_enabled_event(struct vcpu *v, uint32_t virq)
@@ -920,26 +888,15 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
 int evtchn_unmask(unsigned int port)
 {
     struct domain *d = current->domain;
-    struct vcpu   *v;
+    struct evtchn *evtchn;
 
     ASSERT(spin_is_locked(&d->event_lock));
 
     if ( unlikely(!port_is_valid(d, port)) )
         return -EINVAL;
 
-    v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id];
-
-    /*
-     * These operations must happen in strict order. Based on
-     * include/xen/event.h:evtchn_set_pending(). 
-     */
-    if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
-         test_bit          (port, &shared_info(d, evtchn_pending)) &&
-         !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
-                            &vcpu_info(v, evtchn_pending_sel)) )
-    {
-        vcpu_mark_events_pending(v);
-    }
+    evtchn = evtchn_from_port(d, port);
+    evtchn_port_unmask(d, evtchn);
 
     return 0;
 }
@@ -1170,9 +1127,35 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
     spin_unlock(&ld->event_lock);
 }
 
+void evtchn_check_pollers(struct domain *d, unsigned port)
+{
+    struct vcpu *v;
+    unsigned vcpuid;
+
+    /* Check if some VCPU might be polling for this event. */
+    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
+        return;
+
+    /* Wake any interested (or potentially interested) pollers. */
+    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
+          vcpuid < d->max_vcpus;
+          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
+    {
+        v = d->vcpu[vcpuid];
+        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
+             test_and_clear_bit(vcpuid, d->poll_mask) )
+        {
+            v->poll_evtchn = 0;
+            vcpu_unblock(v);
+        }
+    }
+}
 
 int evtchn_init(struct domain *d)
 {
+    /* Default to N-level ABI. */
+    evtchn_2l_init(d);
+
     spin_lock_init(&d->event_lock);
     if ( get_free_port(d) != 0 )
         return -EINVAL;
@@ -1270,8 +1253,8 @@ static void domain_dump_evtchn_info(struct domain *d)
 
         printk("    %4u [%d/%d]: s=%d n=%d x=%d",
                port,
-               !!test_bit(port, &shared_info(d, evtchn_pending)),
-               !!test_bit(port, &shared_info(d, evtchn_mask)),
+               !!evtchn_port_is_pending(d, chn),
+               !!evtchn_port_is_masked(d, chn),
                chn->state, chn->notify_vcpu_id, chn->xen_consumer);
 
         switch ( chn->state )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a8398bd..7e6884d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -34,6 +34,7 @@
 #include <xen/multicall.h>
 #include <xen/cpu.h>
 #include <xen/preempt.h>
+#include <xen/event.h>
 #include <public/sched.h>
 #include <xsm/xsm.h>
 
@@ -751,7 +752,7 @@ static long do_poll(struct sched_poll *sched_poll)
             goto out;
 
         rc = 0;
-        if ( test_bit(port, &shared_info(d, evtchn_pending)) )
+        if ( evtchn_port_is_pending(d, evtchn_from_port(d, port)) )
             goto out;
     }
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 6f60162..208f7bd 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -102,4 +102,47 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
         smp_mb(); /* set blocked status /then/ caller does his work */  \
     } while ( 0 )
 
+void evtchn_check_pollers(struct domain *d, unsigned port);
+
+void evtchn_2l_init(struct domain *d);
+
+/*
+ * Low-level event channel port ops.
+ */
+struct evtchn_port_ops {
+    void (*set_pending)(struct vcpu *v, struct evtchn *evtchn);
+    void (*clear_pending)(struct domain *d, struct evtchn *evtchn);
+    void (*unmask)(struct domain *d, struct evtchn *evtchn);
+    bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn);
+    bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn);
+};
+
+static inline void evtchn_port_set_pending(struct vcpu *v, struct evtchn *evtchn)
+{
+    v->domain->evtchn_port_ops->set_pending(v, evtchn);
+}
+
+static inline void evtchn_port_clear_pending(struct domain *d, struct evtchn *evtchn)
+{
+    d->evtchn_port_ops->clear_pending(d, evtchn);
+}
+
+static inline void evtchn_port_unmask(struct domain *d,
+                                      struct evtchn *evtchn)
+{
+    d->evtchn_port_ops->unmask(d, evtchn);
+}
+
+static inline bool_t evtchn_port_is_pending(struct domain *d,
+                                            const struct evtchn *evtchn)
+{
+    return d->evtchn_port_ops->is_pending(d, evtchn);
+}
+
+static inline bool_t evtchn_port_is_masked(struct domain *d,
+                                           const struct evtchn *evtchn)
+{
+    return d->evtchn_port_ops->is_masked(d, evtchn);
+}
+
 #endif /* __XEN_EVENT_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ae6a3b8..9e42220 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -66,6 +66,7 @@ struct evtchn
     u8  state;             /* ECS_* */
     u8  xen_consumer;      /* Consumer in Xen, if any? (0 = send to guest) */
     u16 notify_vcpu_id;    /* VCPU for local delivery notification */
+    u32 port;
     union {
         struct {
             domid_t remote_domid;
@@ -238,6 +239,8 @@ struct mem_event_per_domain
     struct mem_event_domain access;
 };
 
+struct evtchn_port_ops;
+
 struct domain
 {
     domid_t          domain_id;
@@ -271,6 +274,7 @@ struct domain
     /* Event channel information. */
     struct evtchn   *evtchn[NR_EVTCHN_BUCKETS];
     spinlock_t       event_lock;
+    const struct evtchn_port_ops *evtchn_port_ops;
 
     struct grant_table *grant_table;
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/8] evtchn: add a hook to bind an event port to a VCPU
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
  2013-08-09 18:08 ` [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys David Vrabel
  2013-08-09 18:08 ` [PATCH 2/8] evtchn: refactor low-level event channel port ops David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-09 18:08 ` [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels David Vrabel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

The upcoming FIFO-based event channel ABI will require binding ports
to a specific per-VCPU queue, so add a port op (bind_to_vcpu) for
this.  Call this new op when events are bound and when they are moved
to a different VCPU.
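
The bind_to_vcpu op is optional: ABIs that do not need per-VCPU queues
simply leave the pointer NULL.  A minimal sketch of that NULL-checked
hook pattern (illustrative names, not the Xen code itself):

```c
#include <assert.h>
#include <stddef.h>

struct vcpu   { unsigned id; };
struct evtchn { unsigned port; unsigned vcpu; };

struct port_ops {
    /* Optional: only ABIs with per-VCPU queues provide it. */
    void (*bind_to_vcpu)(struct evtchn *e, struct vcpu *v);
};

struct domain { const struct port_ops *ops; };

/* Mirrors evtchn_port_bind_to_vcpu(): a NULL hook is skipped, so the
 * existing 2-level ABI needs no stub function. */
static inline void port_bind_to_vcpu(struct domain *d, struct evtchn *e,
                                     struct vcpu *v)
{
    if ( d->ops->bind_to_vcpu )
        d->ops->bind_to_vcpu(e, v);
}

/* A FIFO-like ABI records the binding; the 2-level one does not. */
static void fifo_bind(struct evtchn *e, struct vcpu *v) { e->vcpu = v->id; }
static const struct port_ops fifo_ops = { .bind_to_vcpu = fifo_bind };
static const struct port_ops l2_ops   = { .bind_to_vcpu = NULL };
```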

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_channel.c |    7 +++++++
 xen/include/xen/event.h    |    9 +++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 01d0b77..d0fbe82 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -237,10 +237,12 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     if ( rc )
         goto out;
 
+    evtchn_port_bind_to_vcpu(ld, lchn, ld->vcpu[lchn->notify_vcpu_id]);
     lchn->u.interdomain.remote_dom  = rd;
     lchn->u.interdomain.remote_port = (u16)rport;
     lchn->state                     = ECS_INTERDOMAIN;
     
+    evtchn_port_bind_to_vcpu(rd, rchn, rd->vcpu[rchn->notify_vcpu_id]);
     rchn->u.interdomain.remote_dom  = ld;
     rchn->u.interdomain.remote_port = (u16)lport;
     rchn->state                     = ECS_INTERDOMAIN;
@@ -291,6 +293,7 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
         ERROR_EXIT(port);
 
     chn = evtchn_from_port(d, port);
+    evtchn_port_bind_to_vcpu(d, chn, v);
     chn->state          = ECS_VIRQ;
     chn->notify_vcpu_id = vcpu;
     chn->u.virq         = virq;
@@ -321,6 +324,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
         ERROR_EXIT(port);
 
     chn = evtchn_from_port(d, port);
+    evtchn_port_bind_to_vcpu(d, chn, d->vcpu[vcpu]);
     chn->state          = ECS_IPI;
     chn->notify_vcpu_id = vcpu;
 
@@ -398,6 +402,7 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
         goto out;
     }
 
+    evtchn_port_bind_to_vcpu(d, chn, v);
     chn->state  = ECS_PIRQ;
     chn->u.pirq.irq = pirq;
     link_pirq_port(port, chn, v);
@@ -877,6 +882,8 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
         rc = -EINVAL;
         break;
     }
+    if ( rc == 0 )
+        evtchn_port_bind_to_vcpu(d, chn, d->vcpu[vcpu_id]);
 
  out:
     spin_unlock(&d->event_lock);
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 208f7bd..05d8091 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -115,6 +115,8 @@ struct evtchn_port_ops {
     void (*unmask)(struct domain *d, struct evtchn *evtchn);
     bool_t (*is_pending)(struct domain *d, const struct evtchn *evtchn);
     bool_t (*is_masked)(struct domain *d, const struct evtchn *evtchn);
+    void (*bind_to_vcpu)(struct domain *d, struct evtchn *evtchn,
+                         struct vcpu *vcpu);
 };
 
 static inline void evtchn_port_set_pending(struct vcpu *v, struct evtchn *evtchn)
@@ -145,4 +147,11 @@ static inline bool_t evtchn_port_is_masked(struct domain *d,
     return d->evtchn_port_ops->is_masked(d, evtchn);
 }
 
+static inline void evtchn_port_bind_to_vcpu(struct domain *d, struct evtchn *evtchn,
+                                            struct vcpu *vcpu)
+{
+    if ( d->evtchn_port_ops->bind_to_vcpu )
+        d->evtchn_port_ops->bind_to_vcpu(d, evtchn, vcpu);
+}
+
 #endif /* __XEN_EVENT_H__ */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (2 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 3/8] evtchn: add a hook to bind an event port to a VCPU David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 14:09   ` Jan Beulich
  2013-08-09 18:08 ` [PATCH 5/8] evtchn: dynamically allocate d->evtchn David Vrabel
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Use d->max_evtchns instead of the MAX_EVTCHNS(d) macro.  This avoids
having to repeatedly check the ABI type.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_2l.c      |    1 +
 xen/common/event_channel.c |    4 ++--
 xen/common/schedule.c      |    2 +-
 xen/include/xen/event.h    |    2 +-
 xen/include/xen/sched.h    |    1 +
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
index 18c0c6e..0f6606e 100644
--- a/xen/common/event_2l.c
+++ b/xen/common/event_2l.c
@@ -86,6 +86,7 @@ static const struct evtchn_port_ops evtchn_port_ops_2l =
 void evtchn_2l_init(struct domain *d)
 {
     d->evtchn_port_ops = &evtchn_port_ops_2l;
+    d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
 }
 
 /*
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index d0fbe82..1a1636e 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -134,7 +134,7 @@ static int get_free_port(struct domain *d)
         if ( evtchn_from_port(d, port)->state == ECS_FREE )
             return port;
 
-    if ( port == MAX_EVTCHNS(d) )
+    if ( port == d->max_evtchns )
         return -ENOSPC;
 
     chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET);
@@ -1247,7 +1247,7 @@ static void domain_dump_evtchn_info(struct domain *d)
 
     spin_lock(&d->event_lock);
 
-    for ( port = 1; port < MAX_EVTCHNS(d); ++port )
+    for ( port = 1; port < d->max_evtchns; ++port )
     {
         const struct evtchn *chn;
         char *ssid;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7e6884d..a5a0010 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -748,7 +748,7 @@ static long do_poll(struct sched_poll *sched_poll)
             goto out;
 
         rc = -EINVAL;
-        if ( port >= MAX_EVTCHNS(d) )
+        if ( port >= d->max_evtchns )
             goto out;
 
         rc = 0;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 05d8091..a49697c 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -73,7 +73,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d,p) \
     ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET])
 #define port_is_valid(d,p)    \
-    (((p) >= 0) && ((p) < MAX_EVTCHNS(d)) && \
+    (((p) >= 0) && ((p) < d->max_evtchns) && \
      (bucket_from_port(d,p) != NULL))
 #define evtchn_from_port(d,p) \
     (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)])
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9e42220..0ca984d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -273,6 +273,7 @@ struct domain
 
     /* Event channel information. */
     struct evtchn   *evtchn[NR_EVTCHN_BUCKETS];
+    unsigned         max_evtchns;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/8] evtchn: dynamically allocate d->evtchn
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (3 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 14:10   ` Jan Beulich
  2013-08-09 18:08 ` [PATCH 6/8] evtchn: alter internal object handling scheme David Vrabel
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: Wei Liu <wei.liu2@citrix.com>

As we move to the extended evtchn ABI we need a bigger d->evtchn
array, which would bloat struct domain.  Move the array out of struct
domain and allocate it separately.
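
The shape of the change can be sketched in plain C (illustrative only;
Xen uses xzalloc_array()/xfree(), here replaced by calloc()/free()):
struct domain shrinks from holding the whole bucket array to holding a
single pointer, with init and destroy paths owning the allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_BUCKETS 64           /* assumed value for illustration */

struct evtchn { unsigned port; };

struct domain {
    /* Was: struct evtchn *evtchn[NR_BUCKETS]; now one pointer, so
     * struct domain stays small even if NR_BUCKETS grows. */
    struct evtchn **evtchn;
};

static int domain_evtchn_init(struct domain *d)
{
    /* Zeroed, like xzalloc_array(): all bucket slots start NULL. */
    d->evtchn = calloc(NR_BUCKETS, sizeof(*d->evtchn));
    return d->evtchn ? 0 : -1;
}

static void domain_evtchn_destroy(struct domain *d)
{
    free(d->evtchn);
    d->evtchn = NULL;
}
```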

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_channel.c |   13 +++++++++++++
 xen/include/xen/sched.h    |    2 +-
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 1a1636e..126cf84 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1163,15 +1163,26 @@ int evtchn_init(struct domain *d)
     /* Default to N-level ABI. */
     evtchn_2l_init(d);
 
+    BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE);
+    d->evtchn = xzalloc_array(struct evtchn *, NR_EVTCHN_BUCKETS);
+    if ( d->evtchn == NULL )
+        return -ENOMEM;
+
     spin_lock_init(&d->event_lock);
     if ( get_free_port(d) != 0 )
+    {
+        xfree(d->evtchn);
         return -EINVAL;
+    }
     evtchn_from_port(d, 0)->state = ECS_RESERVED;
 
 #if MAX_VIRT_CPUS > BITS_PER_LONG
     d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS));
     if ( !d->poll_mask )
+    {
+        xfree(d->evtchn);
         return -ENOMEM;
+    }
     bitmap_zero(d->poll_mask, MAX_VIRT_CPUS);
 #endif
 
@@ -1205,6 +1216,8 @@ void evtchn_destroy(struct domain *d)
     spin_unlock(&d->event_lock);
 
     clear_global_virq_handlers(d);
+
+    xfree(d->evtchn);
 }
 
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0ca984d..3070555 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,7 +272,7 @@ struct domain
     spinlock_t       rangesets_lock;
 
     /* Event channel information. */
-    struct evtchn   *evtchn[NR_EVTCHN_BUCKETS];
+    struct evtchn  **evtchn;
     unsigned         max_evtchns;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/8] evtchn: alter internal object handling scheme
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (4 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 5/8] evtchn: dynamically allocate d->evtchn David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 14:21   ` Jan Beulich
  2013-08-16 16:55   ` Wei Liu
  2013-08-09 18:08 ` [PATCH 7/8] evtchn: add FIFO-based event channel ABI David Vrabel
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Malcolm Crossley, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Originally, evtchn objects were stored in buckets.  Add another layer,
the group: struct domain holds an array of pointers to evtchn groups,
and each group holds pointers to buckets.

With this change, each domain can have many more struct evtchn objects
in a space-efficient way.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Allocate the array of group pointers, shrinking the size of struct
domain.

Calculate the group and bucket parameters at compile time to achieve
optimum packing into PAGE_SIZE memory allocations.
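
With power-of-two sizes, a port number decomposes into group, bucket
and slot indices with simple division/modulo (which the compiler turns
into shifts and masks), as in group_from_port()/bucket_from_port().
A sketch with assumed parameter values, not Xen's actual constants:

```c
#include <assert.h>

/* Illustrative sizes only; Xen derives these at compile time to pack
 * each allocation into PAGE_SIZE. */
#define EVTCHNS_PER_BUCKET 128
#define BUCKETS_PER_GROUP   32
#define EVTCHNS_PER_GROUP  (EVTCHNS_PER_BUCKET * BUCKETS_PER_GROUP)

/* Which group pointer in d->evtchn[] covers this port. */
static unsigned group_idx(unsigned port)
{ return port / EVTCHNS_PER_GROUP; }

/* Which bucket within that group. */
static unsigned bucket_idx(unsigned port)
{ return (port % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET; }

/* Which struct evtchn slot within that bucket. */
static unsigned slot_idx(unsigned port)
{ return port % EVTCHNS_PER_BUCKET; }
```

So looking up a port becomes d->evtchn[group][bucket][slot], and only
the groups and buckets actually in use need to be allocated.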

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_channel.c |   29 ++++++++++++++++++++++++-----
 xen/include/xen/event.h    |   12 ++++++++++--
 xen/include/xen/sched.h    |   22 ++++++++++++++++++----
 3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 126cf84..67dcdbc 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -124,6 +124,7 @@ static int virq_is_global(uint32_t virq)
 static int get_free_port(struct domain *d)
 {
     struct evtchn *chn;
+    struct evtchn **grp;
     int            port;
     int            i, j;
 
@@ -137,6 +138,15 @@ static int get_free_port(struct domain *d)
     if ( port == d->max_evtchns )
         return -ENOSPC;
 
+    if ( unlikely(group_from_port(d, port) == NULL) )
+    {
+        grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
+        if ( unlikely(grp == NULL) )
+            return -ENOMEM;
+        else
+            group_from_port(d, port) = grp;
+    }
+
     chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET);
     if ( unlikely(chn == NULL) )
         return -ENOMEM;
@@ -1163,8 +1173,7 @@ int evtchn_init(struct domain *d)
     /* Default to N-level ABI. */
     evtchn_2l_init(d);
 
-    BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE);
-    d->evtchn = xzalloc_array(struct evtchn *, NR_EVTCHN_BUCKETS);
+    d->evtchn = xzalloc_array(struct evtchn **, NR_EVTCHN_GROUPS);
     if ( d->evtchn == NULL )
         return -ENOMEM;
 
@@ -1192,7 +1201,7 @@ int evtchn_init(struct domain *d)
 
 void evtchn_destroy(struct domain *d)
 {
-    int i;
+    unsigned i, j;
 
     /* After this barrier no new event-channel allocations can occur. */
     BUG_ON(!d->is_dying);
@@ -1207,9 +1216,19 @@ void evtchn_destroy(struct domain *d)
 
     /* Free all event-channel buckets. */
     spin_lock(&d->event_lock);
-    for ( i = 0; i < NR_EVTCHN_BUCKETS; i++ )
+    for ( i = 0; i < NR_EVTCHN_GROUPS; i++ )
     {
-        xsm_free_security_evtchn(d->evtchn[i]);
+        if ( d->evtchn[i] == NULL )
+            continue;
+
+        for ( j = 0; j < BUCKETS_PER_GROUP; j++ )
+        {
+            if ( d->evtchn[i][j] == NULL )
+                continue;
+            xsm_free_security_evtchn(d->evtchn[i][j]);
+            xfree(d->evtchn[i][j]);
+            d->evtchn[i][j] = NULL;
+        }
         xfree(d->evtchn[i]);
         d->evtchn[i] = NULL;
     }
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index a49697c..a795ae6 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -69,11 +69,19 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq);
 /* Notify remote end of a Xen-attached event channel.*/
 void notify_via_xen_event_channel(struct domain *ld, int lport);
 
-/* Internal event channel object accessors */
+/*
+ * Internal event channel object storage:
+ * Objects are organized in two level scheme: group and bucket
+ * A group consists of several buckets, a bucket is an array of struct evtchn
+ */
+#define group_from_port(d,p) \
+    ((d)->evtchn[(p)/EVTCHNS_PER_GROUP])
+/* The caller must ensure the group is not NULL. */
 #define bucket_from_port(d,p) \
-    ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET])
+    ((group_from_port(d,p))[((p)%EVTCHNS_PER_GROUP)/EVTCHNS_PER_BUCKET])
 #define port_is_valid(d,p)    \
     (((p) >= 0) && ((p) < d->max_evtchns) && \
+    (group_from_port(d,p) != NULL) && \
      (bucket_from_port(d,p) != NULL))
 #define evtchn_from_port(d,p) \
     (&(bucket_from_port(d,p))[(p)&(EVTCHNS_PER_BUCKET-1)])
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3070555..f95cf99 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -50,9 +50,23 @@ extern struct domain *dom0;
 #else
 #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG)
 #endif
-#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
-#define EVTCHNS_PER_BUCKET 128
-#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
+
+#define BUCKETS_PER_GROUP  (PAGE_SIZE/sizeof(struct evtchn *))
+/* Round the size of struct evtchn up to a power-of-2 size. */
+#define b2(x)   (   (x) | (   (x) >> 1) )
+#define b4(x)   ( b2(x) | ( b2(x) >> 2) )
+#define b8(x)   ( b4(x) | ( b4(x) >> 4) )
+#define b16(x)  ( b8(x) | ( b8(x) >> 8) )
+#define b32(x)  (b16(x) | (b16(x) >>16) )
+#define next_power_of_2(x)      (b32(x-1) + 1)
+
+/* Maximum number of event channels for any ABI. */
+#define MAX_NR_EVTCHNS (max_t(unsigned, NR_EVENT_CHANNELS,  \
+                              1 << EVTCHN_FIFO_LINK_BITS))
+
+#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn)))
+#define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
+#define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
 
 struct evtchn
 {
@@ -272,7 +286,7 @@ struct domain
     spinlock_t       rangesets_lock;
 
     /* Event channel information. */
-    struct evtchn  **evtchn;
+    struct evtchn ***evtchn;
     unsigned         max_evtchns;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/8] evtchn: add FIFO-based event channel ABI
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (5 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 6/8] evtchn: alter internal object handling scheme David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-15 14:25   ` Jan Beulich
  2013-08-09 18:08 ` [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops David Vrabel
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add the event channel hypercall sub-ops and the definitions for the
shared data structures for the FIFO-based event channel ABI.

The design document for this new ABI is available here:

http://xenbits.xen.org/people/dvrabel/event-channels-D.pdf

In summary, events are reported using a per-domain shared event array
of event words.  Each event word has PENDING, LINKED and MASKED bits
and a LINK field for pointing to the next event in the event queue.

There are 16 event queues (with different priorities) per VCPU.

Key advantages of this new ABI include:

- Support for over 130,000 event channels (2^17 = 131,072).
- 16 different event priorities.
- Improved fairness in event latency through the use of FIFOs.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/include/public/event_channel.h |   72 ++++++++++++++++++++++++++++++++++++
 xen/include/public/xen.h           |    2 +-
 2 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h
index 472efdb..c7a031c 100644
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -71,6 +71,10 @@
 #define EVTCHNOP_bind_vcpu        8
 #define EVTCHNOP_unmask           9
 #define EVTCHNOP_reset           10
+#define EVTCHNOP_init_control    11
+#define EVTCHNOP_expand_array    12
+#define EVTCHNOP_set_priority    13
+#define EVTCHNOP_set_limit       14
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
@@ -258,6 +262,48 @@ struct evtchn_reset {
 typedef struct evtchn_reset evtchn_reset_t;
 
 /*
+ * EVTCHNOP_init_control: initialize the control block for the FIFO ABI.
+ */
+struct evtchn_init_control {
+    /* IN parameters. */
+    uint64_t control_mfn;
+    uint32_t offset;
+    uint32_t vcpu;
+    /* OUT parameters. */
+    uint8_t link_bits;
+};
+typedef struct evtchn_init_control evtchn_init_control_t;
+
+/*
+ * EVTCHNOP_expand_array: add an additional page to the event array.
+ */
+struct evtchn_expand_array {
+    /* IN parameters. */
+    uint64_t array_mfn;
+};
+typedef struct evtchn_expand_array evtchn_expand_array_t;
+
+/*
+ * EVTCHNOP_set_priority: set the priority for an event channel.
+ */
+struct evtchn_set_priority {
+    /* IN parameters. */
+    uint32_t port;
+    uint32_t priority;
+};
+typedef struct evtchn_set_priority evtchn_set_priority_t;
+
+/*
+ * EVTCHNOP_set_limit: set the maximum event channel port that may be bound.
+ */
+struct evtchn_set_limit {
+    /* IN parameters. */
+    uint32_t domid;
+    uint32_t max_port;
+};
+typedef struct evtchn_set_limit evtchn_set_limit_t;
+
+/*
  * ` enum neg_errnoval
  * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op)
  * `
@@ -281,6 +327,32 @@ struct evtchn_op {
 typedef struct evtchn_op evtchn_op_t;
 DEFINE_XEN_GUEST_HANDLE(evtchn_op_t);
 
+/*
+ * FIFO ABI
+ */
+
+/* Events may have priorities from 0 (highest) to 15 (lowest). */
+#define EVTCHN_FIFO_PRIORITY_MIN     15
+#define EVTCHN_FIFO_PRIORITY_DEFAULT 7
+
+#define EVTCHN_FIFO_MAX_QUEUES (EVTCHN_FIFO_PRIORITY_MIN + 1)
+
+typedef uint32_t event_word_t;
+
+#define EVTCHN_FIFO_PENDING 31
+#define EVTCHN_FIFO_MASKED  30
+#define EVTCHN_FIFO_LINKED  29
+
+#define EVTCHN_FIFO_LINK_BITS 17
+#define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1)
+
+struct evtchn_fifo_control_block {
+    uint32_t ready;
+    uint32_t _rsvd1;
+    uint32_t head[EVTCHN_FIFO_MAX_QUEUES];
+};
+typedef struct evtchn_fifo_control_block evtchn_fifo_control_block_t;
+
 #endif /* __XEN_PUBLIC_EVENT_CHANNEL_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 3cab74f..eed3508 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -553,7 +553,7 @@ typedef struct multicall_entry multicall_entry_t;
 DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
 
 /*
- * Event channel endpoints per domain:
+ * Event channel endpoints per domain (when using the 2-level ABI):
  *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
  */
 #define NR_EVENT_CHANNELS (sizeof(xen_ulong_t) * sizeof(xen_ulong_t) * 64)
-- 
1.7.2.5


* [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (6 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 7/8] evtchn: add FIFO-based event channel ABI David Vrabel
@ 2013-08-09 18:08 ` David Vrabel
  2013-08-16 16:33   ` Wei Liu
  2013-08-23 10:33   ` Jan Beulich
  2013-08-12 13:06 ` [RFC PATCH 0/8] Xen: FIFO-based event channel ABI George Dunlap
  2013-08-16 16:55 ` Wei Liu
  9 siblings, 2 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-09 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Keir Fraser, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add the implementation for the FIFO-based event channel ABI.  The new
hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array,
EVTCHNOP_set_priority) and the required evtchn_ops (set_pending,
unmask, etc.).

The current implementation has two main limitations:

- EVTCHNOP_set_limit is not yet implemented, so any guest will be able
  to use up to 2^17 event channels (requiring 128 global mapping pages
  for a fully populated event array).

- The control block frames are not required to be shared with the
  vcpu_info structure.  This requires an additional global mapping
  page per VCPU.  This does make the guest implementation cleaner,
  though, so perhaps we do not need to fix this?

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/Makefile          |    1 +
 xen/common/event_channel.c   |   37 ++++
 xen/common/event_fifo.c      |  491 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/event.h      |    6 +-
 xen/include/xen/event_fifo.h |   56 +++++
 xen/include/xen/sched.h      |    4 +
 6 files changed, 593 insertions(+), 2 deletions(-)
 create mode 100644 xen/common/event_fifo.c
 create mode 100644 xen/include/xen/event_fifo.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index ef03eac..afd7d40 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -7,6 +7,7 @@ obj-y += domctl.o
 obj-y += domain.o
 obj-y += event_2l.o
 obj-y += event_channel.o
+obj-y += event_fifo.o
 obj-y += grant_table.o
 obj-y += irq.o
 obj-y += kernel.o
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 67dcdbc..a048107 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -26,6 +26,7 @@
 #include <xen/compat.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
+#include <xen/event_fifo.h>
 #include <asm/current.h>
 
 #include <public/xen.h>
@@ -1053,6 +1054,40 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case EVTCHNOP_init_control: {
+        struct evtchn_init_control init_control;
+        if ( copy_from_guest(&init_control, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_fifo_init_control(&init_control);
+        if ( !rc && __copy_to_guest(arg, &init_control, 1) )
+            rc = -EFAULT;
+        break;
+    }
+
+    case EVTCHNOP_expand_array: {
+        struct evtchn_expand_array expand_array;
+        if ( copy_from_guest(&expand_array, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_fifo_expand_array(&expand_array);
+        break;
+    }
+
+    case EVTCHNOP_set_priority: {
+        struct evtchn_set_priority set_priority;
+        if ( copy_from_guest(&set_priority, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_fifo_set_priority(&set_priority);
+        break;
+    }
+
+    case EVTCHNOP_set_limit: {
+        struct evtchn_set_limit set_limit;
+        if ( copy_from_guest(&set_limit, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_fifo_set_limit(&set_limit);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
@@ -1214,6 +1249,8 @@ void evtchn_destroy(struct domain *d)
         (void)__evtchn_close(d, i);
     }
 
+    evtchn_fifo_destroy(d);
+
     /* Free all event-channel buckets. */
     spin_lock(&d->event_lock);
     for ( i = 0; i < NR_EVTCHN_GROUPS; i++ )
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
new file mode 100644
index 0000000..a7b5dc9
--- /dev/null
+++ b/xen/common/event_fifo.c
@@ -0,0 +1,491 @@
+/*
+ * FIFO event channel management.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ * 
+ * This source code is licensed under the GNU General Public License,
+ * Version 2 or later.  See the file COPYING for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/event.h>
+#include <xen/event_fifo.h>
+#include <xen/paging.h>
+#include <xen/mm.h>
+
+#include <public/event_channel.h>
+
+static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
+                                                       unsigned port)
+{
+    unsigned p, w;
+
+    if ( unlikely(port >= d->evtchn_fifo->num_evtchns) )
+        return NULL;
+
+    p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+    w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+
+    return d->evtchn_fifo->event_array[p].virt + w;
+}
+
+static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
+{
+    event_word_t n, o, w;
+
+    w = *word;
+
+    do {
+        if ( !(w & (1 << EVTCHN_FIFO_LINKED)) )
+            return 0;
+        o = w;
+        n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
+    } while ( (w = cmpxchg(word, o, n)) != o );
+
+    return 1;
+}
+
+static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
+{
+    struct domain *d = v->domain;
+    unsigned port;
+    event_word_t *word;
+    struct evtchn_fifo_queue *q;
+    unsigned long flags;
+    bool_t was_pending;
+
+    port = evtchn->port;
+    word = evtchn_fifo_word_from_port(d, port);
+    if ( unlikely(!word) )
+        return;
+
+    /*
+     * No locking around getting the queue. This may race with
+     * changing the priority but we are allowed to signal the event
+     * once on the old priority.
+     */
+    q = evtchn->queue;
+
+    was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word);
+
+    /*
+     * Link the event if it is unmasked and not already linked.
+     */
+    if ( !test_bit(EVTCHN_FIFO_MASKED, word)
+         && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+    {
+        event_word_t *tail_word;
+        bool_t linked = 0;
+
+        spin_lock_irqsave(&q->lock, flags);
+
+        /*
+         * Atomically link the tail to port iff the tail is linked.
+         * If the tail is unlinked the queue is empty.
+         *
+         * If port is the same as tail, the queue is empty but q->tail
+         * will appear linked as we just set LINKED above.
+         *
+         * If the queue is empty (i.e., we haven't linked to the new
+         * event), head must be updated.
+         */
+        if ( port != q->tail )
+        {
+            tail_word = evtchn_fifo_word_from_port(d, q->tail);
+            linked = evtchn_fifo_set_link(tail_word, port);
+        }
+        if ( !linked )
+            write_atomic(q->head, port);
+        q->tail = port;
+
+        spin_unlock_irqrestore(&q->lock, flags);
+
+        if ( !test_and_set_bit(q->priority, &v->evtchn_fifo->control_block->ready) )
+            vcpu_mark_events_pending(v);
+    }
+
+    if ( !was_pending )
+        evtchn_check_pollers(d, port);
+}
+
+static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn)
+{
+    event_word_t *word;
+
+    word = evtchn_fifo_word_from_port(d, evtchn->port);
+    if ( unlikely(!word) )
+        return;
+
+    /*
+     * Just clear the P bit.
+     *
+     * No need to unlink as the guest will unlink and ignore
+     * non-pending events.
+     */
+    clear_bit(EVTCHN_FIFO_PENDING, word);
+}
+
+static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
+{
+    struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
+    event_word_t *word;
+
+    word = evtchn_fifo_word_from_port(d, evtchn->port);
+    if ( unlikely(!word) )
+        return;
+
+    clear_bit(EVTCHN_FIFO_MASKED, word);
+
+    /* Relink if pending. */
+    if ( test_bit(EVTCHN_FIFO_PENDING, word ) )
+        evtchn_fifo_set_pending(v, evtchn);
+}
+
+static bool_t evtchn_fifo_is_pending(struct domain *d,
+                                     const struct evtchn *evtchn)
+{
+    event_word_t *word;
+
+    word = evtchn_fifo_word_from_port(d, evtchn->port);
+    if ( unlikely(!word) )
+        return 0;
+
+    return test_bit(EVTCHN_FIFO_PENDING, word);
+}
+
+static bool_t evtchn_fifo_is_masked(struct domain *d,
+                                    const struct evtchn *evtchn)
+{
+    event_word_t *word;
+
+    word = evtchn_fifo_word_from_port(d, evtchn->port);
+    if ( unlikely(!word) )
+        return 1;
+
+    return test_bit(EVTCHN_FIFO_MASKED, word);
+}
+
+static void evtchn_fifo_bind_to_vcpu(struct domain *d, struct evtchn *evtchn,
+                                     struct vcpu *v)
+{
+    unsigned priority;
+
+    /* Keep the same priority if possible, otherwise use the
+       default. */
+    if ( evtchn->queue )
+        priority = evtchn->queue->priority;
+    else
+        priority = EVTCHN_FIFO_PRIORITY_DEFAULT;
+
+    evtchn->queue = &v->evtchn_fifo->queue[priority];
+}
+
+static const struct evtchn_port_ops evtchn_port_ops_fifo =
+{
+    .set_pending   = evtchn_fifo_set_pending,
+    .clear_pending = evtchn_fifo_clear_pending,
+    .unmask        = evtchn_fifo_unmask,
+    .is_pending    = evtchn_fifo_is_pending,
+    .is_masked     = evtchn_fifo_is_masked,
+    .bind_to_vcpu  = evtchn_fifo_bind_to_vcpu,
+};
+
+static int map_guest_page(struct domain *d, uint64_t gfn,
+                          struct page_info **page, void **virt)
+{
+    struct page_info *p;
+
+    p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !p )
+        return -EINVAL;
+
+    if ( !get_page_type(p, PGT_writable_page) )
+    {
+        put_page(p);
+        return -EINVAL;
+    }
+
+    *virt = map_domain_page_global(gfn);
+    if ( *virt == NULL )
+    {
+        put_page_and_type(p);
+        return -ENOMEM;
+    }
+    *page = p;
+    return 0;
+}
+
+static void unmap_guest_page(struct page_info *page, void *virt)
+{
+    if ( page == NULL )
+        return;
+
+    unmap_domain_page_global(virt);
+    put_page_and_type(page);
+}
+
+static void cleanup_control_block(struct vcpu *v)
+{
+    if ( v->evtchn_fifo )
+    {
+        unmap_guest_page(v->evtchn_fifo->cb_page, v->evtchn_fifo->control_block);
+        xfree(v->evtchn_fifo);
+        v->evtchn_fifo = NULL;
+    }
+}
+
+static void init_queue(struct vcpu *v, struct evtchn_fifo_queue *q, unsigned i)
+{
+    spin_lock_init(&q->lock);
+    q->priority = i;
+    q->head = &v->evtchn_fifo->control_block->head[i];
+}
+
+static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset)
+{
+    struct domain *d = v->domain;
+    struct evtchn_fifo_vcpu *efv;
+    struct page_info *page;
+    void *virt;
+    unsigned i;
+    int rc;
+
+    if ( v->evtchn_fifo )
+        return -EINVAL;
+
+    efv = xzalloc(struct evtchn_fifo_vcpu);
+    if ( efv == NULL )
+        return -ENOMEM;
+
+    rc = map_guest_page(d, gfn, &page, &virt);
+    if ( rc < 0 )
+    {
+        xfree(efv);
+        return rc;
+    }
+
+    v->evtchn_fifo = efv;
+
+    v->evtchn_fifo->cb_page       = page;
+    v->evtchn_fifo->control_block = virt + offset;
+
+    for ( i = 0; i <= EVTCHN_FIFO_PRIORITY_MIN; i++ )
+        init_queue(v, &v->evtchn_fifo->queue[i], i);
+ 
+    return 0;
+}
+
+/*
+ * Set up an event array with no pages.
+ */
+static int setup_event_array(struct domain *d)
+{
+    if ( d->evtchn_fifo )
+        return 0;
+
+    d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain);
+    if ( d->evtchn_fifo == NULL )
+        return -ENOMEM;
+
+    d->evtchn_fifo->num_evtchns = 0;
+
+    return 0;
+}
+
+/*
+ * Some ports may already be bound, bind them to the correct VCPU so
+ * they have a valid queue.
+ *
+ * Note: any events that are currently pending will not be resent and
+ * will be lost.
+ */
+static void rebind_all_ports(struct domain *d)
+{
+    unsigned port;
+
+    for ( port = 1; port < d->max_evtchns; port++ )
+    {
+        struct evtchn *evtchn;
+
+        if ( !port_is_valid(d, port) )
+            break;
+
+        evtchn = evtchn_from_port(d, port);
+        switch ( evtchn->state )
+        {
+        case ECS_INTERDOMAIN:
+        case ECS_PIRQ:
+        case ECS_VIRQ:
+            evtchn_port_bind_to_vcpu(d, evtchn, d->vcpu[evtchn->notify_vcpu_id]);
+            break;
+        default:
+            break;
+        }
+    }
+}
+
+static void cleanup_event_array(struct domain *d)
+{
+    unsigned i;
+
+    if ( d->evtchn_fifo == NULL )
+        return;
+
+    for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ )
+    {
+        unmap_guest_page(d->evtchn_fifo->event_array[i].page,
+                         d->evtchn_fifo->event_array[i].virt);
+    }
+    xfree(d->evtchn_fifo);
+}
+
+int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
+{
+    struct domain *d = current->domain;
+    uint32_t vcpu_id;
+    uint64_t gfn;
+    uint32_t offset;
+    struct vcpu *v;
+    int rc;
+
+    init_control->link_bits = EVTCHN_FIFO_LINK_BITS;
+
+    vcpu_id = init_control->vcpu;
+    gfn     = init_control->control_mfn;
+    offset  = init_control->offset;
+
+    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
+        return -ENOENT;
+    v = d->vcpu[vcpu_id];
+
+    /* Must not cross page boundary. */
+    if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) )
+        return -EINVAL;
+
+    /* Must be 8-byte aligned. */
+    if ( offset & (8 - 1) )
+        return -EINVAL;
+
+    spin_lock(&d->event_lock);
+
+    rc = setup_control_block(v, gfn, offset);
+
+    /* If this is the first control block, set up an empty event array
+       and switch to the FIFO port ops. */
+    if ( d->evtchn_fifo == NULL )
+    {
+        rc = setup_event_array(d);
+        if ( rc < 0 )
+            cleanup_control_block(v);
+        else
+        {
+            d->evtchn_port_ops = &evtchn_port_ops_fifo;
+            d->max_evtchns = 1 << EVTCHN_FIFO_LINK_BITS;
+            rebind_all_ports(d);
+        }
+    }
+
+    spin_unlock(&d->event_lock);
+
+    return rc;
+}
+
+static int add_page_to_event_array(struct domain *d, unsigned long gfn)
+{
+    struct page_info *page = NULL;
+    void *virt;
+    unsigned slot;
+    int rc;
+
+    slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+    if ( slot >= EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES )
+        return -ENOSPC;
+
+    rc = map_guest_page(d, gfn, &page, &virt);
+    if ( rc < 0 )
+        return rc;
+
+    d->evtchn_fifo->event_array[slot].page = page;
+    d->evtchn_fifo->event_array[slot].virt = virt;
+
+    d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+
+    return 0;
+}
+
+int evtchn_fifo_expand_array(struct evtchn_expand_array *expand_array)
+{
+    struct domain *d = current->domain;
+    int rc;
+
+    spin_lock(&d->event_lock);
+    rc = add_page_to_event_array(d, expand_array->array_mfn);
+    spin_unlock(&d->event_lock);
+
+    return rc;
+}
+
+int evtchn_fifo_set_priority(struct evtchn_set_priority *set_priority)
+{
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    uint32_t priority;
+    unsigned port;
+    struct evtchn *evtchn;
+
+    priority = set_priority->priority;
+    port     = set_priority->port;
+    
+    if ( priority > EVTCHN_FIFO_PRIORITY_MIN )
+        return -EINVAL;
+
+    spin_lock(&d->event_lock);
+
+    if ( !port_is_valid(d, port) )
+    {
+        spin_unlock(&d->event_lock);
+        return -EINVAL;
+    }
+
+    /*
+     * Switch to the new queue for future events. If the event is
+     * already pending or in the process of being linked it will be on
+     * the old queue -- this is fine.
+     */
+    evtchn = evtchn_from_port(d, port);
+    v = d->vcpu[evtchn->notify_vcpu_id];
+    evtchn->queue = &v->evtchn_fifo->queue[priority];
+
+    spin_unlock(&d->event_lock);
+
+    return 0;
+}
+
+int evtchn_fifo_set_limit(struct evtchn_set_limit *set_limit)
+{
+    /* FIXME: not supported yet. */
+    return -ENOSYS;
+}
+
+void evtchn_fifo_destroy(struct domain *d)
+{
+    struct vcpu *v;
+
+    for_each_vcpu( d, v )
+        cleanup_control_block(v);
+    cleanup_event_array(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index a795ae6..3a71956 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -127,12 +127,14 @@ struct evtchn_port_ops {
                          struct vcpu *vcpu);
 };
 
-static inline void evtchn_port_set_pending(struct vcpu *v, struct evtchn *evtchn)
+static inline void evtchn_port_set_pending(struct vcpu *v,
+                                           struct evtchn *evtchn)
 {
     v->domain->evtchn_port_ops->set_pending(v, evtchn);
 }
 
-static inline void evtchn_port_clear_pending(struct domain *d, struct evtchn *evtchn)
+static inline void evtchn_port_clear_pending(struct domain *d,
+                                             struct evtchn *evtchn)
 {
     d->evtchn_port_ops->clear_pending(d, evtchn);
 }
diff --git a/xen/include/xen/event_fifo.h b/xen/include/xen/event_fifo.h
new file mode 100644
index 0000000..2d7b7d0
--- /dev/null
+++ b/xen/include/xen/event_fifo.h
@@ -0,0 +1,56 @@
+/*
+ * FIFO-based event channel ABI.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2 or later.  See the file COPYING for more details.
+ */
+#ifndef __XEN_EVENT_FIFO_H__
+#define __XEN_EVENT_FIFO_H__
+
+struct evtchn_fifo_queue {
+    uint32_t *head; /* points into control block */
+    uint32_t tail;
+    spinlock_t lock;
+    uint8_t priority;
+};
+
+struct evtchn_fifo_vcpu {
+    struct page_info *cb_page;
+    struct evtchn_fifo_control_block *control_block;
+    struct evtchn_fifo_queue queue[EVTCHN_FIFO_MAX_QUEUES];
+};
+
+#define EVTCHN_FIFO_EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES \
+    ((1 << EVTCHN_FIFO_LINK_BITS) / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE)
+
+
+struct evtchn_fifo_array_page {
+    struct page_info *page;
+    event_word_t *virt;
+};
+
+struct evtchn_fifo_domain {
+    struct evtchn_fifo_array_page event_array[EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES];
+    unsigned num_evtchns;
+};
+
+int evtchn_fifo_init_control(struct evtchn_init_control *init_control);
+int evtchn_fifo_expand_array(struct evtchn_expand_array *expand_array);
+int evtchn_fifo_set_priority(struct evtchn_set_priority *set_priority);
+int evtchn_fifo_set_limit(struct evtchn_set_limit *set_limit);
+void evtchn_fifo_destroy(struct domain *domain);
+
+#endif /* __XEN_EVENT_FIFO_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index f95cf99..4c5cc20 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -96,6 +96,7 @@ struct evtchn
         } pirq;        /* state == ECS_PIRQ */
         u16 virq;      /* state == ECS_VIRQ */
     } u;
+    struct evtchn_fifo_queue *queue;
 #ifdef FLASK_ENABLE
     void *ssid;
 #endif
@@ -210,6 +211,8 @@ struct vcpu
     /* Guest-specified relocation of vcpu_info. */
     unsigned long vcpu_info_mfn;
 
+    struct evtchn_fifo_vcpu *evtchn_fifo;
+
     struct arch_vcpu arch;
 };
 
@@ -290,6 +293,7 @@ struct domain
     unsigned         max_evtchns;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
+    struct evtchn_fifo_domain *evtchn_fifo;
 
     struct grant_table *grant_table;
 
-- 
1.7.2.5


* Re: [RFC PATCH 0/8] Xen: FIFO-based event channel ABI
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (7 preceding siblings ...)
  2013-08-09 18:08 ` [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops David Vrabel
@ 2013-08-12 13:06 ` George Dunlap
  2013-08-12 13:49   ` David Vrabel
  2013-08-16 16:55 ` Wei Liu
  9 siblings, 1 reply; 28+ messages in thread
From: George Dunlap @ 2013-08-12 13:06 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Wei Liu, xen-devel

On Fri, Aug 9, 2013 at 7:08 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> This is an RFC of the FIFO-based event channel ABI described in this
> design document:
>

Do you happen to have a public git tree for the lazy? :-)

 -George


* Re: [RFC PATCH 0/8] Xen: FIFO-based event channel ABI
  2013-08-12 13:06 ` [RFC PATCH 0/8] Xen: FIFO-based event channel ABI George Dunlap
@ 2013-08-12 13:49   ` David Vrabel
  0 siblings, 0 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-12 13:49 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Wei Liu, xen-devel

On 12/08/13 14:06, George Dunlap wrote:
> On Fri, Aug 9, 2013 at 7:08 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>> This is an RFC of the FIFO-based event channel ABI described in this
>> design document:
>>
> 
> Do you happen to have a public git tree for the lazy? :-)

The orochi-v2 branch of:

git://xenbits.xen.org/people/dvrabel/xen.git

And the same for

git://xenbits.xen.org/people/dvrabel/linux.git

David


* Re: [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys
  2013-08-09 18:08 ` [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys David Vrabel
@ 2013-08-15 13:55   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 13:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Wei Liu, xen-devel

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> With different event channel ABIs, showing the 2-level selector word
> may be misleading (if a different ABI is in use).

I'm not sure removing potentially useful information is good
practice. How about you instead add an ABI specific dump
handler? Might be useful to also dump non-generic state
for your new implementation.

> When dumping domain information, printing the state of the VIRQ_DEBUG
> port is redundant -- this information is available via the 'e' key.

This part is clearly fine.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/8] evtchn: refactor low-level event channel port ops
  2013-08-09 18:08 ` [PATCH 2/8] evtchn: refactor low-level event channel port ops David Vrabel
@ 2013-08-15 14:05   ` Jan Beulich
  2013-09-06 14:25     ` David Vrabel
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 14:05 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> +static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
> +{
> +    struct domain *d = v->domain;
> +    unsigned port = evtchn->port;
> +
> +    /*
> +     * The following bit operations must happen in strict order.
> +     * NB. On x86, the atomic bit operations also act as memory barriers.
> +     * There is therefore sufficiently strict ordering for this architecture --
> +     * others may require explicit memory barriers.
> +     */
> +
> +    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
> +        return;
> +
> +    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
> +         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
> +                           &vcpu_info(v, evtchn_pending_sel)) )

Up to here this is indeed 2-level specific, but the rest of the
function isn't, and would therefore better go back into
generic code.

> +    {
> +        vcpu_mark_events_pending(v);
> +    }
> +
> +    evtchn_check_pollers(d, port);
> +}
>[...]
> +static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
> +{
> +    struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
> +    unsigned port = evtchn->port;
> +
> +    /*
> +     * These operations must happen in strict order. Based on
> +     * evtchn_2l_set_pending() above.
> +     */
> +    if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
> +         test_bit          (port, &shared_info(d, evtchn_pending)) &&
> +         !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
> +                            &vcpu_info(v, evtchn_pending_sel)) )

Similarly here.

> +    {
> +        vcpu_mark_events_pending(v);
> +    }
> +}
>[...]
> @@ -615,43 +616,10 @@ out:
>  
>  static void evtchn_set_pending(struct vcpu *v, int port)
>  {
> -    struct domain *d = v->domain;
> -    int vcpuid;
> -
> -    /*
> -     * The following bit operations must happen in strict order.
> -     * NB. On x86, the atomic bit operations also act as memory barriers.
> -     * There is therefore sufficiently strict ordering for this architecture --
> -     * others may require explicit memory barriers.
> -     */
> -
> -    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
> -        return;
> -
> -    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
> -         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
> -                           &vcpu_info(v, evtchn_pending_sel)) )
> -    {
> -        vcpu_mark_events_pending(v);
> -    }
> -    
> -    /* Check if some VCPU might be polling for this event. */
> -    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
> -        return;
> +    struct evtchn *evtchn;
>  
> -    /* Wake any interested (or potentially interested) pollers. */
> -    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
> -          vcpuid < d->max_vcpus;
> -          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
> -    {
> -        v = d->vcpu[vcpuid];
> -        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
> -             test_and_clear_bit(vcpuid, d->poll_mask) )
> -        {
> -            v->poll_evtchn = 0;
> -            vcpu_unblock(v);
> -        }
> -    }
> +    evtchn = evtchn_from_port(v->domain, port);
> +    evtchn_port_set_pending(v, evtchn);

I know it's a matter of taste, but a variable that's used
just once is usually introduced only when the code would
otherwise get much harder to read. Similarly further down.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels
  2013-08-09 18:08 ` [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels David Vrabel
@ 2013-08-15 14:09   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 14:09 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -73,7 +73,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
>  #define bucket_from_port(d,p) \
>      ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET])
>  #define port_is_valid(d,p)    \
> -    (((p) >= 0) && ((p) < MAX_EVTCHNS(d)) && \
> +    (((p) >= 0) && ((p) < d->max_evtchns) && \

Please parenthesize d.

Also I suppose this patch intends to eliminate all uses of
MAX_EVTCHNS() - why does it not remove the macro?

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/8] evtchn: dynamically allocate d->evtchn
  2013-08-09 18:08 ` [PATCH 5/8] evtchn: dynamically allocate d->evtchn David Vrabel
@ 2013-08-15 14:10   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 14:10 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> As we move to the extended evtchn ABI we need a bigger d->evtchn, which
> would bloat struct domain. So move this array out of struct domain and
> allocate a dedicated array for it.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] evtchn: alter internal object handling scheme
  2013-08-09 18:08 ` [PATCH 6/8] evtchn: alter internal object handling scheme David Vrabel
@ 2013-08-15 14:21   ` Jan Beulich
  2013-08-15 15:46     ` David Vrabel
  2013-08-16 16:55   ` Wei Liu
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 14:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Malcolm Crossley, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> +#define BUCKETS_PER_GROUP  (PAGE_SIZE/sizeof(struct evtchn *))
> +/* Round size of struct evtchn up to power of 2 size */
> +#define b2(x)   (   (x) | (   (x) >> 1) )
> +#define b4(x)   ( b2(x) | ( b2(x) >> 2) )
> +#define b8(x)   ( b4(x) | ( b4(x) >> 4) )
> +#define b16(x)  ( b8(x) | ( b8(x) >> 8) )
> +#define b32(x)  (b16(x) | (b16(x) >>16) )

The names of these are too generic.

> +#define next_power_of_2(x)      (b32(x-1) + 1)

x needs to be parenthesized.

> +/* Maximum number of event channels for any ABI. */
> +#define MAX_NR_EVTCHNS (max_t(unsigned, NR_EVENT_CHANNELS,  \
> +                              1 << EVTCHN_FIFO_LINK_BITS))

I don't recall EVTCHN_FIFO_LINK_BITS having got defined in earlier
patches, and it certainly doesn't get defined in this one.

> +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn)))
> +#define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
> +#define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)

So for the 2-level case this still results in a full page allocation for
the top level structure. Not too nice...

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/8] evtchn: add FIFO-based event channel ABI
  2013-08-09 18:08 ` [PATCH 7/8] evtchn: add FIFO-based event channel ABI David Vrabel
@ 2013-08-15 14:25   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-08-15 14:25 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> +struct evtchn_init_control {
> +    /* IN parameters. */
> +    uint64_t control_mfn;
> +    uint32_t offset;
> +    uint32_t vcpu;
> +    /* OUT parameters. */
> +    uint8_t link_bits;
> +};

This needs explicit padding at the end, or else its size for 32-bit
and 64-bit guests will differ.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] evtchn: alter internal object handling scheme
  2013-08-15 14:21   ` Jan Beulich
@ 2013-08-15 15:46     ` David Vrabel
  2013-08-16  7:14       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-15 15:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Malcolm Crossley, Keir Fraser, Wei Liu

On 15/08/13 15:21, Jan Beulich wrote:
>>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
>> +#define BUCKETS_PER_GROUP  (PAGE_SIZE/sizeof(struct evtchn *))
>> +/* Round size of struct evtchn up to power of 2 size */
>> +#define b2(x)   (   (x) | (   (x) >> 1) )
>> +#define b4(x)   ( b2(x) | ( b2(x) >> 2) )
>> +#define b8(x)   ( b4(x) | ( b4(x) >> 4) )
>> +#define b16(x)  ( b8(x) | ( b8(x) >> 8) )
>> +#define b32(x)  (b16(x) | (b16(x) >>16) )

>> +/* Maximum number of event channels for any ABI. */
>> +#define MAX_NR_EVTCHNS (max_t(unsigned, NR_EVENT_CHANNELS,  \
>> +                              1 << EVTCHN_FIFO_LINK_BITS))
> 
>> +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn)))
>> +#define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
>> +#define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
> 
> So for the 2-level case this still results in a full page allocation for
> the top level structure. Not too nice...

Without XSM enabled sizeof(struct evtchn) is 32 so:

EVTCHNS_PER_BUCKET = 128
BUCKETS_PER_GROUP = 512
NR_EVTCHN_GROUPS = 2

The minimal allocation is then:

1 bucket page, 1 group page and 16 bytes[*] for the group pointers.

Which I agree is 1 page more than previously.  Is saving 1 xenheap page
per domain worth adding extra complexity to the lookup?

We could drop patch 4 (dynamically allocate d->evtchns) and have a array
in struct domain since it is only a 2 element array.

David

[*] not sure what the minimum size for a xmalloc() allocation is.  Is it
actually 128 bytes?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] evtchn: alter internal object handling scheme
  2013-08-15 15:46     ` David Vrabel
@ 2013-08-16  7:14       ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-08-16  7:14 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Malcolm Crossley, Keir Fraser, Wei Liu

>>> On 15.08.13 at 17:46, David Vrabel <david.vrabel@citrix.com> wrote:
> On 15/08/13 15:21, Jan Beulich wrote:
>>>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
>>> +#define BUCKETS_PER_GROUP  (PAGE_SIZE/sizeof(struct evtchn *))
>>> +/* Round size of struct evtchn up to power of 2 size */
>>> +#define b2(x)   (   (x) | (   (x) >> 1) )
>>> +#define b4(x)   ( b2(x) | ( b2(x) >> 2) )
>>> +#define b8(x)   ( b4(x) | ( b4(x) >> 4) )
>>> +#define b16(x)  ( b8(x) | ( b8(x) >> 8) )
>>> +#define b32(x)  (b16(x) | (b16(x) >>16) )
> 
>>> +/* Maximum number of event channels for any ABI. */
>>> +#define MAX_NR_EVTCHNS (max_t(unsigned, NR_EVENT_CHANNELS,  \
>>> +                              1 << EVTCHN_FIFO_LINK_BITS))
>> 
>>> +#define EVTCHNS_PER_BUCKET (PAGE_SIZE / next_power_of_2(sizeof(struct evtchn)))
>>> +#define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
>>> +#define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
>> 
>> So for the 2-level case this still results in a full page allocation for
>> the top level structure. Not too nice...
> 
> Without XSM enabled sizeof(struct evtchn) is 32 so:
> 
> EVTCHNS_PER_BUCKET = 128
> BUCKETS_PER_GROUP = 512
> NR_EVTCHN_GROUPS = 2
> 
> The minimal allocation is then:
> 
> 1 bucket page, 1 group page and 16 bytes[*] for the group pointers.
> 
> Which I agree is 1 page more than previously.  Is saving 1 xenheap page
> per domain worth adding extra complexity to the lookup?

Depends on the amount of extra complexity.

> We could drop patch 4 (dynamically allocate d->evtchns) and have a array
> in struct domain since it is only a 2 element array.

Hmm, that's with the 17 bits you currently use? And would double
with each eventual increment? A 2-element array is certainly fine
in struct domain, but we ought to set some boundary (e.g. via
BUILD_BUG_ON()) to trigger implementation of dynamic allocation
then.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-09 18:08 ` [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops David Vrabel
@ 2013-08-16 16:33   ` Wei Liu
  2013-08-19 10:32     ` David Vrabel
  2013-08-23 10:33   ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Wei Liu @ 2013-08-16 16:33 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, Keir Fraser, xen-devel

On Fri, Aug 09, 2013 at 07:08:40PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Add the implementation for the FIFO-based event channel ABI.  The new
> hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array,
> EVTCHNOP_set_priority) and the required evtchn_ops (set_pending,
> unmask, etc.).
> 
> This current implementation has three main limitations:
> 
> - EVTCHNOP_set_limit is not yet implemented so any guest will be able
>   to use up to 2^17 event channels (requiring 128 global mapping pages
>   for a fully populated event array).
> 
> - The control block frames are not required to be shared with the
>   vcpu_info structure.  This requires an additional global mapping
>   page per VCPU.  It does make the guest implementation cleaner,
>   though, so perhaps we do not need to fix this?
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/common/Makefile          |    1 +
>  xen/common/event_channel.c   |   37 ++++
>  xen/common/event_fifo.c      |  491 ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/event.h      |    6 +-
>  xen/include/xen/event_fifo.h |   56 +++++
>  xen/include/xen/sched.h      |    4 +
>  6 files changed, 593 insertions(+), 2 deletions(-)
>  create mode 100644 xen/common/event_fifo.c
>  create mode 100644 xen/include/xen/event_fifo.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index ef03eac..afd7d40 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -7,6 +7,7 @@ obj-y += domctl.o
>  obj-y += domain.o
>  obj-y += event_2l.o
>  obj-y += event_channel.o
> +obj-y += event_fifo.o
>  obj-y += grant_table.o
>  obj-y += irq.o
>  obj-y += kernel.o
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index 67dcdbc..a048107 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -26,6 +26,7 @@
>  #include <xen/compat.h>
>  #include <xen/guest_access.h>
>  #include <xen/keyhandler.h>
> +#include <xen/event_fifo.h>
>  #include <asm/current.h>
>  
>  #include <public/xen.h>
> @@ -1053,6 +1054,40 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case EVTCHNOP_init_control: {
> +        struct evtchn_init_control init_control;
> +        if ( copy_from_guest(&init_control, arg, 1) != 0 )
> +            return -EFAULT;
> +        rc = evtchn_fifo_init_control(&init_control);
> +        if ( !rc && __copy_to_guest(arg, &init_control, 1) )
> +            rc = -EFAULT;
> +        break;
> +    }
> +
> +    case EVTCHNOP_expand_array: {
> +        struct evtchn_expand_array expand_array;
> +        if ( copy_from_guest(&expand_array, arg, 1) != 0 )
> +            return -EFAULT;
> +        rc = evtchn_fifo_expand_array(&expand_array);
> +        break;
> +    }
> +
> +    case EVTCHNOP_set_priority: {
> +        struct evtchn_set_priority set_priority;
> +        if ( copy_from_guest(&set_priority, arg, 1) != 0 )
> +            return -EFAULT;
> +        rc = evtchn_fifo_set_priority(&set_priority);
> +        break;
> +    }
> +
> +    case EVTCHNOP_set_limit: {
> +        struct evtchn_set_limit set_limit;
> +        if ( copy_from_guest(&set_limit, arg, 1) != 0 )
> +            return -EFAULT;
> +        rc = evtchn_fifo_set_limit(&set_limit);
> +        break;
> +    }
> +
>      default:
>          rc = -ENOSYS;
>          break;
> @@ -1214,6 +1249,8 @@ void evtchn_destroy(struct domain *d)
>          (void)__evtchn_close(d, i);
>      }
>  
> +    evtchn_fifo_destroy(d);
> +

Doesn't this need to be inside the event_lock critical region?

>      /* Free all event-channel buckets. */
>      spin_lock(&d->event_lock);
>      for ( i = 0; i < NR_EVTCHN_GROUPS; i++ )
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> new file mode 100644
> index 0000000..a7b5dc9
> --- /dev/null
> +++ b/xen/common/event_fifo.c
> @@ -0,0 +1,491 @@
> +/*
> + * FIFO event channel management.
> + *
> + * Copyright (C) 2013 Citrix Systems R&D Ltd.
> + * 
> + * This source code is licensed under the GNU General Public License,
> + * Version 2 or later.  See the file COPYING for more details.
> + */
> +
> +#include <xen/config.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/event.h>
> +#include <xen/event_fifo.h>
> +#include <xen/paging.h>
> +#include <xen/mm.h>
> +
> +#include <public/event_channel.h>
> +
> +static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
> +                                                       unsigned port)
> +{
> +    unsigned p, w;
> +
> +    if ( unlikely(port >= d->evtchn_fifo->num_evtchns) )
> +        return NULL;
> +
> +    p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
> +    w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
> +
> +    return d->evtchn_fifo->event_array[p].virt + w;
> +}
> +
> +static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
> +{
> +    event_word_t n, o, w;
> +
> +    w = *word;
> +
> +    do {
> +        if ( !(w & (1 << EVTCHN_FIFO_LINKED)) )
> +            return 0;
> +        o = w;
> +        n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
> +    } while ( (w = cmpxchg(word, o, n)) != o );
> +
> +    return 1;
> +}
> +
> +static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
> +{
> +    struct domain *d = v->domain;
> +    unsigned port;
> +    event_word_t *word;
> +    struct evtchn_fifo_queue *q;
> +    unsigned long flags;
> +    bool_t was_pending;
> +
> +    port = evtchn->port;
> +    word = evtchn_fifo_word_from_port(d, port);
> +    if ( unlikely(!word) )
> +        return;
> +
> +    /*
> +     * No locking around getting the queue. This may race with
> +     * changing the priority but we are allowed to signal the event
> +     * once on the old priority.
> +     */
> +    q = evtchn->queue;
> +
> +    was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word);
> +
> +    /*
> +     * Link the event if it is unmasked and not already linked.
> +     */
> +    if ( !test_bit(EVTCHN_FIFO_MASKED, word)
> +         && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
> +    {
> +        event_word_t *tail_word;
> +        bool_t linked = 0;
> +
> +        spin_lock_irqsave(&q->lock, flags);
> +
> +        /*
> +         * Atomically link the tail to port iff the tail is linked.
> +         * If the tail is unlinked the queue is empty.
> +         *
> +         * If port is the same as tail, the queue is empty but q->tail
> +         * will appear linked as we just set LINKED above.
> +         *
> +         * If the queue is empty (i.e., we haven't linked to the new
> +         * event), head must be updated.
> +         */
> +        if ( port != q->tail )
> +        {
> +            tail_word = evtchn_fifo_word_from_port(d, q->tail);
> +            linked = evtchn_fifo_set_link(tail_word, port);
> +        }
> +        if ( !linked )
> +            write_atomic(q->head, port);
> +        q->tail = port;
> +
> +        spin_unlock_irqrestore(&q->lock, flags);
> +
> +        if ( !test_and_set_bit(q->priority, &v->evtchn_fifo->control_block->ready) )

Line too long.

> +            vcpu_mark_events_pending(v);
> +    }
> +
> +    if ( !was_pending )
> +        evtchn_check_pollers(d, port);
> +}
> +
> +static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn)
> +{
> +    event_word_t *word;
> +
> +    word = evtchn_fifo_word_from_port(d, evtchn->port);
> +    if ( unlikely(!word) )
> +        return;
> +
> +    /*
> +     * Just clear the P bit.
> +     *
> +     * No need to unlink as the guest will unlink and ignore
> +     * non-pending events.
> +     */
> +    clear_bit(EVTCHN_FIFO_PENDING, word);
> +}
> +
> +static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
> +{
> +    struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
> +    event_word_t *word;
> +
> +    word = evtchn_fifo_word_from_port(d, evtchn->port);
> +    if ( unlikely(!word) )
> +        return;
> +
> +    clear_bit(EVTCHN_FIFO_MASKED, word);
> +
> +    /* Relink if pending. */
> +    if ( test_bit(EVTCHN_FIFO_PENDING, word ) )

Extra space after "word".

> +        evtchn_fifo_set_pending(v, evtchn);
> +}
> +
> +static bool_t evtchn_fifo_is_pending(struct domain *d,
> +                                     const struct evtchn *evtchn)
> +{
> +    event_word_t *word;
> +
> +    word = evtchn_fifo_word_from_port(d, evtchn->port);
> +    if ( unlikely(!word) )
> +        return 0;
> +
> +    return test_bit(EVTCHN_FIFO_PENDING, word);
> +}
> +
> +static bool_t evtchn_fifo_is_masked(struct domain *d,
> +                                    const struct evtchn *evtchn)
> +{
> +    event_word_t *word;
> +
> +    word = evtchn_fifo_word_from_port(d, evtchn->port);
> +    if ( unlikely(!word) )
> +        return 1;
> +
> +    return test_bit(EVTCHN_FIFO_MASKED, word);
> +}
> +
> +static void evtchn_fifo_bind_to_vcpu(struct domain *d, struct evtchn *evtchn,
> +                                     struct vcpu *v)
> +{
> +    unsigned priority;
> +
> +    /* Keep the same priority if possible, otherwise use the
> +       default. */

Comment style.

> +    if ( evtchn->queue )
> +        priority = evtchn->queue->priority;
> +    else
> +        priority = EVTCHN_FIFO_PRIORITY_DEFAULT;
> +
> +    evtchn->queue = &v->evtchn_fifo->queue[priority];
> +}
> +
> +static const struct evtchn_port_ops evtchn_port_ops_fifo =
> +{
> +    .set_pending   = evtchn_fifo_set_pending,
> +    .clear_pending = evtchn_fifo_clear_pending,
> +    .unmask        = evtchn_fifo_unmask,
> +    .is_pending    = evtchn_fifo_is_pending,
> +    .is_masked     = evtchn_fifo_is_masked,
> +    .bind_to_vcpu  = evtchn_fifo_bind_to_vcpu,
> +};
> +
> +static int map_guest_page(struct domain *d, uint64_t gfn,
> +                          struct page_info **page, void **virt)
> +{
> +    struct page_info *p;
> +
> +    p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !p )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(p, PGT_writable_page) )
> +    {
> +        put_page(p);
> +        return -EINVAL;
> +    }
> +
> +    *virt = map_domain_page_global(gfn);
> +    if ( *virt == NULL )
> +    {
> +        put_page_and_type(p);
> +        return -ENOMEM;
> +    }
> +    *page = p;
> +    return 0;
> +}
> +
> +static void unmap_guest_page(struct page_info *page, void *virt)
> +{
> +    if ( page == NULL )
> +        return;
> +
> +    unmap_domain_page_global(virt);
> +    put_page_and_type(page);
> +}
> +
> +static void cleanup_control_block(struct vcpu *v)
> +{
> +    if ( v->evtchn_fifo )
> +    {
> +        unmap_guest_page(v->evtchn_fifo->cb_page, v->evtchn_fifo->control_block);
> +        xfree(v->evtchn_fifo);
> +        v->evtchn_fifo = NULL;
> +    }
> +}
> +
> +static void init_queue(struct vcpu *v, struct evtchn_fifo_queue *q, unsigned i)
> +{
> +    spin_lock_init(&q->lock);
> +    q->priority = i;
> +    q->head = &v->evtchn_fifo->control_block->head[i];
> +}
> +
> +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset)
> +{
> +    struct domain *d = v->domain;
> +    struct evtchn_fifo_vcpu *efv;
> +    struct page_info *page;
> +    void *virt;
> +    unsigned i;
> +    int rc;
> +
> +    if ( v->evtchn_fifo )
> +        return -EINVAL;
> +
> +    efv = xzalloc(struct evtchn_fifo_vcpu);
> +    if ( efv == NULL )
> +        return -ENOMEM;
> +
> +    rc = map_guest_page(d, gfn, &page, &virt);
> +    if ( rc < 0 )
> +    {
> +        xfree(efv);
> +        return rc;
> +    }
> +
> +    v->evtchn_fifo = efv;
> +
> +    v->evtchn_fifo->cb_page       = page;
> +    v->evtchn_fifo->control_block = virt + offset;
> +
> +    for ( i = 0; i <= EVTCHN_FIFO_PRIORITY_MIN; i++ )
> +        init_queue(v, &v->evtchn_fifo->queue[i], i);
> + 
> +    return 0;
> +}
> +
> +/*
> + * Setup an event array with no pages.
> + */
> +static int setup_event_array(struct domain *d)
> +{
> +    if ( d->evtchn_fifo )
> +        return 0;
> +
> +    d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain);

struct evtchn_fifo_domain is very big because it contains event_array,
which has 1024 elements; however, most domains will not use all 2^17
ports. Could you add dynamic allocation for event_array?

> +    if ( d->evtchn_fifo == NULL )
> +        return -ENOMEM;
> +
> +    d->evtchn_fifo->num_evtchns = 0;
> +
> +    return 0;
> +}
> +
> +/*
> + * Some ports may already be bound, bind them to the correct VCPU so
> + * they have a valid queue.
> + *
> + * Note: any events that are currently pending will not be resent and
> + * will be lost.
> + */

Can this be fixed? I presume you can check the state of the port and
set it to pending in the new ABI when necessary? As you've got hold of
the event lock at this stage, it should be easy to synchronize the state.

> +static void rebind_all_ports(struct domain *d)
> +{
> +    unsigned port;
> +
> +    for ( port = 1; port < d->max_evtchns; port++ )
> +    {
> +        struct evtchn *evtchn;
> +
> +        if ( !port_is_valid(d, port) )
> +            break;
> +
> +        evtchn = evtchn_from_port(d, port);
> +        switch ( evtchn->state )
> +        {
> +        case ECS_INTERDOMAIN:
> +        case ECS_PIRQ:
> +        case ECS_VIRQ:
> +            evtchn_port_bind_to_vcpu(d, evtchn, d->vcpu[evtchn->notify_vcpu_id]);
> +            break;
> +        default:
> +            break;
> +        }
> +    }
> +}
> +
> +static void cleanup_event_array(struct domain *d)
> +{
> +    unsigned i;
> +
> +    if ( d->evtchn_fifo == NULL )
> +        return;
> +
> +    for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ )
> +    {
> +        unmap_guest_page(d->evtchn_fifo->event_array[i].page,
> +                         d->evtchn_fifo->event_array[i].virt);

Suggest resetting page and virt to NULL, just as you did in
cleanup_control_block.

> +    }
> +    xfree(d->evtchn_fifo);

Same for evtchn_fifo.

> +}
> +
> +int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
> +{
> +    struct domain *d = current->domain;
> +    uint32_t vcpu_id;
> +    uint64_t gfn;
> +    uint32_t offset;
> +    struct vcpu *v;
> +    int rc;
> +
> +    init_control->link_bits = EVTCHN_FIFO_LINK_BITS;
> +
> +    vcpu_id = init_control->vcpu;
> +    gfn     = init_control->control_mfn;
> +    offset  = init_control->offset;
> +
> +    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
> +        return -ENOENT;
> +    v = d->vcpu[vcpu_id];
> +
> +    /* Must not cross page boundary. */
> +    if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) )
> +        return -EINVAL;
> +
> +    /* Must be 8-bytes aligned. */
> +    if ( offset & (8 - 1) )
> +        return -EINVAL;
> +
> +    spin_lock(&d->event_lock);
> +
> +    rc = setup_control_block(v, gfn, offset);
> +
> +    /* If this is the first control block, setup an empty event array
> +       and switch to the fifo port ops. */

Comment style.

> +    if ( d->evtchn_fifo == NULL )
> +    {
> +        rc = setup_event_array(d);
> +        if ( rc < 0 )
> +            cleanup_control_block(v);
> +        else
> +        {
> +            d->evtchn_port_ops = &evtchn_port_ops_fifo;
> +            d->max_evtchns = 1 << EVTCHN_FIFO_LINK_BITS;
> +            rebind_all_ports(d);
> +        }
> +    }
> +
> +    spin_unlock(&d->event_lock);
> +
> +    return rc;
> +}
> +
> +static int add_page_to_event_array(struct domain *d, unsigned long gfn)
> +{
> +    struct page_info *page = NULL;
> +    void *virt;
> +    unsigned slot;
> +    int rc;
> +
> +    slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
> +    if ( slot >= EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES )
> +        return -ENOSPC;
> +
> +    rc = map_guest_page(d, gfn, &page, &virt);
> +    if ( rc < 0 )
> +        return rc;
> +
> +    d->evtchn_fifo->event_array[slot].page = page;
> +    d->evtchn_fifo->event_array[slot].virt = virt;
> +
> +    d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
> +
> +    return 0;
> +}
> +
> +int evtchn_fifo_expand_array(struct evtchn_expand_array *expand_array)
> +{
> +    struct domain *d = current->domain;
> +    int rc;
> +
> +    spin_lock(&d->event_lock);
> +    rc = add_page_to_event_array(d, expand_array->array_mfn);
> +    spin_unlock(&d->event_lock);
> +
> +    return rc;
> +}
> +
> +int evtchn_fifo_set_priority(struct evtchn_set_priority *set_priority)
> +{
> +    struct domain *d = current->domain;
> +    struct vcpu *v;
> +    uint32_t priority;
> +    unsigned port;
> +    struct evtchn *evtchn;
> +
> +    priority = set_priority->priority;
> +    port     = set_priority->port;
> +    
> +    if ( priority > EVTCHN_FIFO_PRIORITY_MIN )
> +        return -EINVAL;
> +
> +    spin_lock(&d->event_lock);
> +
> +    if ( !port_is_valid(d, port) )
> +    {
> +        spin_unlock(&d->event_lock);
> +        return -EINVAL;
> +    }
> +
> +    /*
> +     * Switch to the new queue for future events. If the event is
> +     * already pending or in the process of being linked it will be on
> +     * the old queue -- this is fine.
> +     */
> +    evtchn = evtchn_from_port(d, port);
> +    v = d->vcpu[evtchn->notify_vcpu_id];
> +    evtchn->queue = &v->evtchn_fifo->queue[priority];
> +
> +    spin_unlock(&d->event_lock);
> +
> +    return 0;
> +}
> +
> +int evtchn_fifo_set_limit(struct evtchn_set_limit *set_limit)
> +{
> +    /* FIXME: not supported yet. */
> +    return -ENOSYS;
> +}
> +
> +void evtchn_fifo_destroy(struct domain *d)
> +{
> +    struct vcpu *v;
> +
> +    for_each_vcpu( d, v )
> +        cleanup_control_block(v);
> +    cleanup_event_array(d);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
> index a795ae6..3a71956 100644
> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -127,12 +127,14 @@ struct evtchn_port_ops {
>                           struct vcpu *vcpu);
>  };
>  
> -static inline void evtchn_port_set_pending(struct vcpu *v, struct evtchn *evtchn)
> +static inline void evtchn_port_set_pending(struct vcpu *v,
> +                                           struct evtchn *evtchn)

This kind of adjustment should not be needed here, as the styling
problems are introduced in patch 2. Could you fix them there?

>  {
>      v->domain->evtchn_port_ops->set_pending(v, evtchn);
>  }
>  
> -static inline void evtchn_port_clear_pending(struct domain *d, struct evtchn *evtchn)
> +static inline void evtchn_port_clear_pending(struct domain *d,
> +                                             struct evtchn *evtchn)

Ditto.

>  {
>      d->evtchn_port_ops->clear_pending(d, evtchn);
>  }
> diff --git a/xen/include/xen/event_fifo.h b/xen/include/xen/event_fifo.h
> new file mode 100644
> index 0000000..2d7b7d0
> --- /dev/null
> +++ b/xen/include/xen/event_fifo.h
> @@ -0,0 +1,56 @@
> +/*
> + * FIFO-based event channel ABI.
> + *
> + * Copyright (C) 2013 Citrix Systems R&D Ltd.
> + *
> + * This source code is licensed under the GNU General Public License,
> + * Version 2 or later.  See the file COPYING for more details.
> + */
> +#ifndef __XEN_EVENT_FIFO_H__
> +#define __XEN_EVENT_FIFO_H__
> +
> +struct evtchn_fifo_queue {
> +    uint32_t *head; /* points into control block */
> +    uint32_t tail;
> +    spinlock_t lock;
> +    uint8_t priority;
> +};
> +
> +struct evtchn_fifo_vcpu {
> +    struct page_info *cb_page;
> +    struct evtchn_fifo_control_block *control_block;
> +    struct evtchn_fifo_queue queue[EVTCHN_FIFO_MAX_QUEUES];
> +};
> +
> +#define EVTCHN_FIFO_EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
> +#define EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES \
> +    ((1 << EVTCHN_FIFO_LINK_BITS) / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE)

Should probably define a macro for (1 << EVTCHN_FIFO_LINK_BITS)? I see
you repeat this snippet twice in other places.


Wei.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] evtchn: alter internal object handling scheme
  2013-08-09 18:08 ` [PATCH 6/8] evtchn: alter internal object handling scheme David Vrabel
  2013-08-15 14:21   ` Jan Beulich
@ 2013-08-16 16:55   ` Wei Liu
  1 sibling, 0 replies; 28+ messages in thread
From: Wei Liu @ 2013-08-16 16:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, Malcolm Crossley, Keir Fraser, xen-devel

On Fri, Aug 09, 2013 at 07:08:38PM +0100, David Vrabel wrote:
[...]
>  #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_XEN_ULONG)
>  #endif
> -#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
> -#define EVTCHNS_PER_BUCKET 128
> -#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
> +
> +#define BUCKETS_PER_GROUP  (PAGE_SIZE/sizeof(struct evtchn *))
> +/* Round size of struct evtchn up to power of 2 size */
> +#define b2(x)   (   (x) | (   (x) >> 1) )
> +#define b4(x)   ( b2(x) | ( b2(x) >> 2) )
> +#define b8(x)   ( b4(x) | ( b4(x) >> 4) )
> +#define b16(x)  ( b8(x) | ( b8(x) >> 8) )
> +#define b32(x)  (b16(x) | (b16(x) >>16) )
> +#define next_power_of_2(x)      (b32(x-1) + 1)
> +

These macros are generic; can they be moved to a generic header so that
other code can reuse them?


Wei.


* Re: [RFC PATCH 0/8] Xen: FIFO-based event channel ABI
  2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
                   ` (8 preceding siblings ...)
  2013-08-12 13:06 ` [RFC PATCH 0/8] Xen: FIFO-based event channel ABI George Dunlap
@ 2013-08-16 16:55 ` Wei Liu
  9 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2013-08-16 16:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, Keir Fraser, xen-devel

On Fri, Aug 09, 2013 at 07:08:32PM +0100, David Vrabel wrote:
> This is an RFC of the FIFO-based event channel ABI described in this
> design document:
> 
> http://xenbits.xen.org/people/dvrabel/event-channels-D.pdf
> 
> Changes in draft D include:
> 
> - Only Xen writes to the HEAD field in the control block
> 
> - Added a link_bits field to EVTCHNOP_init_control to return the
>   number of valid bits for the LINK field.
> 
> Patch 1-4 do some preparatory work for supporting alternate ABIs.
> 
> Patch 5-8 expand the number of evtchn objects a domain may have to

Patch 5-6. :-)

> changing how they are allocated.
> 
> Patch 7-8 add the ABI and the implementation.  Patch 8 list some of
> the known limitations with this RFC implementation.
> 
> Changes in v1:
> 
> - Updates for Draft D of the design.
> - 130,000+ event channels are now supported.
> - event_port.c -> event_2l.c and only contains 2l functions.
> - Addresses various review comments
>   - int -> unsigned in lots of places
>   - use write_atomic() to set HEAD
>   - removed MAX_EVTCHNS
>   - evtchn_ops are const.
> - Pack struct evtchns better to reduce memory needed.
> 
> David


* Re: [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-16 16:33   ` Wei Liu
@ 2013-08-19 10:32     ` David Vrabel
  2013-08-19 10:46       ` Wei Liu
  0 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-08-19 10:32 UTC (permalink / raw)
  To: Wei Liu; +Cc: Keir Fraser, David Vrabel, xen-devel

Can you trim your replies?  There's no need to quote the whole patch.

On 16/08/13 17:33, Wei Liu wrote:
>> +    evtchn_fifo_destroy(d);
>> +
> 
> Don't need to put this inside event_lock critical region?

No, but it does need to be reordered to be after freeing all the event
channel buckets.

>> +static int setup_event_array(struct domain *d)
>> +{
>> +    if ( d->evtchn_fifo )
>> +        return 0;
>> +
>> +    d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain);
> 
> struct evtchn_fifo_domain is very big because it contains event_array
> which has 1024 elements; however, most domains will not use all 2^17
> ports. Could you add dynamic allocation for event_array?

It has

   2^17 / (PAGE_SIZE / sizeof(event_word_t))
=  131072 / (4096 / 4)
=  128 entries.

So sizeof(struct evtchn_fifo_domain) == 2052 bytes (so 1 page).

Would realloc'ing this as the number of event channels grows actually
save a useful amount of memory?

>> +/*
>> + * Some ports may already be bound, bind them to the correct VCPU so
>> + * they have a valid queue.
>> + *
>> + * Note: any events that are currently pending will not be resent and
>> + * will be lost.
>> + */
> 
> Can this be fixed? I presume you can check the state of the port and set
> it to pending in the new ABI when necessary? As you've got hold of the
> event lock at this stage it should be easy to synchronize the state?

Why?  Guests should set up the ABI before binding any events if they
don't want to risk losing any.

>> +static void cleanup_event_array(struct domain *d)
>> +{
>> +    unsigned i;
>> +
>> +    if ( d->evtchn_fifo == NULL )
>> +        return;
>> +
>> +    for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ )
>> +    {
>> +        unmap_guest_page(d->evtchn_fifo->event_array[i].page,
>> +                         d->evtchn_fifo->event_array[i].virt);
> 
> Suggest reset page and virt to NULL / 0, just like you did for 
> cleanup_control_block.

Why? d->evtchn_fifo is about to be freed.

>> +    }
>> +    xfree(d->evtchn_fifo);
> 
> Same for evtchn_fifo.

Why? The domain is being destroyed.

David


* Re: [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-19 10:32     ` David Vrabel
@ 2013-08-19 10:46       ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2013-08-19 10:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Wei Liu, xen-devel

On Mon, Aug 19, 2013 at 11:32:29AM +0100, David Vrabel wrote:
> Can you trim your replies?  There's no need to quote the whole patch.
> 

Sorry, forgot to do that.

> On 16/08/13 17:33, Wei Liu wrote:
> >> +    evtchn_fifo_destroy(d);
> >> +
> > 
> > Don't need to put this inside event_lock critical region?
> 
> No, but it does need to be reordered to be after freeing all the event
> channel buckets.
> 

Yes, please reorder that.

> >> +static int setup_event_array(struct domain *d)
> >> +{
> >> +    if ( d->evtchn_fifo )
> >> +        return 0;
> >> +
> >> +    d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain);
> > 
> > struct evtchn_fifo_domain is very big because it contains event_array
> > which has 1024 elements; however, most domains will not use all 2^17
> > ports. Could you add dynamic allocation for event_array?
> 
> It has
> 
>    2^17 / (PAGE_SIZE / sizeof(event_word_t))
> =  131072 / (4096 / 4)
> =  128 entries.
> 

Ah yes, my calculation was wrong (just came back to Cambridge and still
had jet lag when I reviewed your patch, sorry).

> So sizeof(struct evtchn_fifo_domain) == 2052 bytes (so 1 page).
> 
> Would realloc'ing this as the number of event channels grows actually
> save a useful amount of memory?
> 

It depends on how the allocator works, but I think 1 page is not a big
deal. I raised that based on my wrong calculation, so my comment is
effectively void.

> >> +/*
> >> + * Some ports may already be bound, bind them to the correct VCPU so
> >> + * they have a valid queue.
> >> + *
> >> + * Note: any events that are currently pending will not be resent and
> >> + * will be lost.
> >> + */
> > 
> > Can this be fixed? I presume you can check the state of the port and set
> > it to pending in the new ABI when necessary? As you've got hold of the
> > event lock at this stage it should be easy to synchronize the state?
> 
> Why?  Guests should set up the ABI before binding any events if they
> don't want to risk losing any.
> 

Then can you please add the above line to the comment?

> >> +static void cleanup_event_array(struct domain *d)
> >> +{
> >> +    unsigned i;
> >> +
> >> +    if ( d->evtchn_fifo == NULL )
> >> +        return;
> >> +
> >> +    for ( i = 0; i < EVTCHN_FIFO_MAX_EVENT_ARRAY_PAGES; i++ )
> >> +    {
> >> +        unmap_guest_page(d->evtchn_fifo->event_array[i].page,
> >> +                         d->evtchn_fifo->event_array[i].virt);
> > 
> > Suggest reset page and virt to NULL / 0, just like you did for 
> > cleanup_control_block.
> 
> Why? d->evtchn_fifo is about to be freed.
> 
> >> +    }
> >> +    xfree(d->evtchn_fifo);
> > 
> > Same for evtchn_fifo.
> 
> Why? The domain is being destroyed.

I just prefer everything to be in a known state if possible. Just my
$0.02.


Wei.

> 
> David


* Re: [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-09 18:08 ` [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops David Vrabel
  2013-08-16 16:33   ` Wei Liu
@ 2013-08-23 10:33   ` Jan Beulich
  2013-08-23 11:00     ` David Vrabel
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2013-08-23 10:33 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
> +static void evtchn_fifo_bind_to_vcpu(struct domain *d, struct evtchn *evtchn,
> +                                     struct vcpu *v)
> +{
> +    unsigned priority;
> +
> +    /* Keep the same priority if possible, otherwise use the
> +       default. */
> +    if ( evtchn->queue )
> +        priority = evtchn->queue->priority;
> +    else
> +        priority = EVTCHN_FIFO_PRIORITY_DEFAULT;
> +
> +    evtchn->queue = &v->evtchn_fifo->queue[priority];

Why not simply

    if ( !evtchn->queue )
        evtchn->queue = &v->evtchn_fifo->queue[EVTCHN_FIFO_PRIORITY_DEFAULT];

?

> +static int setup_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset)

No fixed-size integer types please when they're not really needed
(here they could be unsigned long and unsigned int respectively).

> +static int setup_event_array(struct domain *d)
> +{
> +    if ( d->evtchn_fifo )
> +        return 0;
> +
> +    d->evtchn_fifo = xzalloc(struct evtchn_fifo_domain);
> +    if ( d->evtchn_fifo == NULL )
> +        return -ENOMEM;
> +
> +    d->evtchn_fifo->num_evtchns = 0;

So you just used xzalloc() and then set this field to zero again?

> +int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
> +{
> +    struct domain *d = current->domain;
> +    uint32_t vcpu_id;
> +    uint64_t gfn;
> +    uint32_t offset;
> +    struct vcpu *v;
> +    int rc;
> +
> +    init_control->link_bits = EVTCHN_FIFO_LINK_BITS;
> +
> +    vcpu_id = init_control->vcpu;
> +    gfn     = init_control->control_mfn;
> +    offset  = init_control->offset;
> +
> +    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
> +        return -ENOENT;
> +    v = d->vcpu[vcpu_id];
> +
> +    /* Must not cross page boundary. */
> +    if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) )
> +        return -EINVAL;
> +
> +    /* Must be 8-bytes aligned. */
> +    if ( offset & (8 - 1) )

This would better use __alignof() to document where the requirement
stems from.

> +int evtchn_fifo_expand_array(struct evtchn_expand_array *expand_array)
> +{
> +    struct domain *d = current->domain;
> +    int rc;
> +
> +    spin_lock(&d->event_lock);
> +    rc = add_page_to_event_array(d, expand_array->array_mfn);

Neither here, nor in its caller, nor in the called function does there
seem to be any protection against this being called without first
having used EVTCHNOP_init_control.

Same for evtchn_fifo_set_priority().

Further all but evtchn_fifo_init_control() appear to have no outputs,
and hence - for documentation purposes - constifying the parameters
of the respective handler functions would be appreciated.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -96,6 +96,7 @@ struct evtchn
>          } pirq;        /* state == ECS_PIRQ */
>          u16 virq;      /* state == ECS_VIRQ */
>      } u;
> +    struct evtchn_fifo_queue *queue;

I wonder whether you need to store the (8-byte) queue pointer
here at all: Why can't you just store the (1-byte) priority instead?
Can't the queue always be determined via notify_vcpu_id?

Jan


* Re: [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops
  2013-08-23 10:33   ` Jan Beulich
@ 2013-08-23 11:00     ` David Vrabel
  0 siblings, 0 replies; 28+ messages in thread
From: David Vrabel @ 2013-08-23 11:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Wei Liu

On 23/08/13 11:33, Jan Beulich wrote:
>>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
>> +static void evtchn_fifo_bind_to_vcpu(struct domain *d, struct evtchn *evtchn,
>> +                                     struct vcpu *v)
>> +{
>> +    unsigned priority;
>> +
>> +    /* Keep the same priority if possible, otherwise use the
>> +       default. */
>> +    if ( evtchn->queue )
>> +        priority = evtchn->queue->priority;
>> +    else
>> +        priority = EVTCHN_FIFO_PRIORITY_DEFAULT;
>> +
>> +    evtchn->queue = &v->evtchn_fifo->queue[priority];
> 
> Why not simply
> 
>     if ( !evtchn->queue )
>         evtchn->queue = &v->evtchn_fifo->queue[EVTCHN_FIFO_PRIORITY_DEFAULT];
> 
> ?

evtchn->queue might point to a queue on VCPU w, and we want it to point
to one (of the same priority) on VCPU v.

But as you said later on, maybe it would be better to just store the
priority in the struct evtchn.  This would avoid the need for this
function/hook entirely.

>> +int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
>> +{
[...]
>> +
>> +    /* Must be 8-bytes aligned. */
>> +    if ( offset & (8 - 1) )
> 
> This would better use __alignof() to document where the requirement
> stems from.

There's no specific requirement for it to be 8 byte aligned at this time
(the ready word is only 32-bits), this is to make it more future proof
if the structure is extended later.

David


* Re: [PATCH 2/8] evtchn: refactor low-level event channel port ops
  2013-08-15 14:05   ` Jan Beulich
@ 2013-09-06 14:25     ` David Vrabel
  2013-09-06 14:55       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: David Vrabel @ 2013-09-06 14:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Wei Liu

On 15/08/13 15:05, Jan Beulich wrote:
>>>> On 09.08.13 at 20:08, David Vrabel <david.vrabel@citrix.com> wrote:
>> +static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
>> +{
>> +    struct domain *d = v->domain;
>> +    unsigned port = evtchn->port;
>> +
>> +    /*
>> +     * The following bit operations must happen in strict order.
>> +     * NB. On x86, the atomic bit operations also act as memory barriers.
>> +     * There is therefore sufficiently strict ordering for this architecture --
>> +     * others may require explicit memory barriers.
>> +     */
>> +
>> +    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
>> +        return;
>> +
>> +    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
>> +         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
>> +                           &vcpu_info(v, evtchn_pending_sel)) )
> 
> Up to here this is indeed 2-level specific, but the rest of the
> function isn't, and would therefore better go back into
> generic code.

I think it is fine for the ABI-specific hooks to make calls to common
code, but I tried this anyway and I don't think it's an improvement.

The set_pending hook has to signal three different outcomes:

1. Do nothing.
2. Mark the VCPU pending.
3. Mark the VCPU pending and check pollers.

I tried a couple of ways of doing this but they all look ugly, with
extra branches and an interface that's less clear.

e.g.,

static unsigned evtchn_2l_set_pending(struct vcpu *v,
    struct evtchn *evtchn)
{
    struct domain *d = v->domain;
    unsigned port = evtchn->port;
    unsigned action = 0;

    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
        return action;

    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
                           &vcpu_info(v, evtchn_pending_sel)) )
    {
        action |= MARK_PENDING;
    }
    action |= CHECK_POLLERS;
    return action;
}
[...]
static void evtchn_set_pending(struct vcpu *v, int port)
{
    struct domain *d = v->domain;
    unsigned action;

    action = evtchn_port_set_pending(v, evtchn_from_port(d, port));
    if ( action & MARK_PENDING )
        vcpu_mark_pending(v);
    if ( action & CHECK_POLLERS )
        evtchn_check_pollers(d, port);
}

Which just looks bleah to me.

I also tried:

static bool_t evtchn_2l_set_pending(struct vcpu *v,
    struct evtchn *evtchn)
{
    struct domain *d = v->domain;
    unsigned port = evtchn->port;

    if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
        return 0;

    if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
         !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
                           &vcpu_info(v, evtchn_pending_sel)) )
    {
        vcpu_mark_events_pending(v);
    }
    return 1;
}
[...]
static void evtchn_set_pending(struct vcpu *v, int port)
{
    struct domain *d = v->domain;

    if (evtchn_port_set_pending(v, evtchn_from_port(d, port)))
        evtchn_check_pollers(d, port);
}

But this means we can't move the vcpu_mark_events_pending() out of the
unmask hook because the FIFO unmask calls set_pending which calls
vcpu_mark_events_pending().

Any other suggestions or is the original fine as-is?

David


* Re: [PATCH 2/8] evtchn: refactor low-level event channel port ops
  2013-09-06 14:25     ` David Vrabel
@ 2013-09-06 14:55       ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-09-06 14:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Wei Liu

>>> On 06.09.13 at 16:25, David Vrabel <david.vrabel@citrix.com> wrote:
> Any other suggestions or is the original fine as-is?

Yeah, I guess it's fine then as is.

Jan


end of thread

Thread overview: 28+ messages
2013-08-09 18:08 [RFC PATCH 0/8] Xen: FIFO-based event channel ABI David Vrabel
2013-08-09 18:08 ` [PATCH 1/8] debug: remove some event channel info from the 'i' and 'q' debug keys David Vrabel
2013-08-15 13:55   ` Jan Beulich
2013-08-09 18:08 ` [PATCH 2/8] evtchn: refactor low-level event channel port ops David Vrabel
2013-08-15 14:05   ` Jan Beulich
2013-09-06 14:25     ` David Vrabel
2013-09-06 14:55       ` Jan Beulich
2013-08-09 18:08 ` [PATCH 3/8] evtchn: add a hook to bind an event port to a VCPU David Vrabel
2013-08-09 18:08 ` [PATCH 4/8] evtchn: use a per-domain variable for the max number of event channels David Vrabel
2013-08-15 14:09   ` Jan Beulich
2013-08-09 18:08 ` [PATCH 5/8] evtchn: dynamically allocate d->evtchn David Vrabel
2013-08-15 14:10   ` Jan Beulich
2013-08-09 18:08 ` [PATCH 6/8] evtchn: alter internal object handling scheme David Vrabel
2013-08-15 14:21   ` Jan Beulich
2013-08-15 15:46     ` David Vrabel
2013-08-16  7:14       ` Jan Beulich
2013-08-16 16:55   ` Wei Liu
2013-08-09 18:08 ` [PATCH 7/8] evtchn: add FIFO-based event channel ABI David Vrabel
2013-08-15 14:25   ` Jan Beulich
2013-08-09 18:08 ` [PATCH 8/8] evtchn: add FIFO-based event channel hypercalls and port ops David Vrabel
2013-08-16 16:33   ` Wei Liu
2013-08-19 10:32     ` David Vrabel
2013-08-19 10:46       ` Wei Liu
2013-08-23 10:33   ` Jan Beulich
2013-08-23 11:00     ` David Vrabel
2013-08-12 13:06 ` [RFC PATCH 0/8] Xen: FIFO-based event channel ABI George Dunlap
2013-08-12 13:49   ` David Vrabel
2013-08-16 16:55 ` Wei Liu
