* [PATCHv3 0/2] Xen: FIFO-based event channel ABI fixes
@ 2013-11-11 16:03 David Vrabel
  2013-11-11 16:03 ` [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK David Vrabel
  2013-11-11 16:03 ` [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues David Vrabel
  0 siblings, 2 replies; 9+ messages in thread
From: David Vrabel @ 2013-11-11 16:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

This series addresses two design flaws in the FIFO-based event channel ABI.

1. Fix a potential DoS caused by an unbounded loop when setting LINK.

2. Fix queue corruption that may occur when events are moved between
queues. This patch adds a (last_vcpu_id, last_priority) tuple to
struct evtchn rather than a q field.

An updated design document is available from:

  http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf

The main change in this draft is the addition of a BUSY bit to
indicate that the guest must not clear MASKED.

v9 of the Linux patches will be posted shortly.

Changes since v2:

- Use a new BUSY bit to block guests from clearing MASKED; this is
  lower overhead than the previous solution (which required a
  hypercall).
- Fix another problem with moving events between queues.
- Add evtchn->last_vcpu_id and evtchn->last_priority instead of
  evtchn->q.  This keeps the structure 32 bytes long.

Changes since v1:

- Add MAINTAINERS patch
- Remove some unnecessary temporary pending state clears
- Add fix for DoS


* [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK
  2013-11-11 16:03 [PATCHv3 0/2] Xen: FIFO-based event channel ABI fixes David Vrabel
@ 2013-11-11 16:03 ` David Vrabel
  2013-11-11 16:38   ` Jan Beulich
  2013-11-11 16:03 ` [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues David Vrabel
  1 sibling, 1 reply; 9+ messages in thread
From: David Vrabel @ 2013-11-11 16:03 UTC (permalink / raw)
  To: xen-devel; +Cc: David Vrabel, Keir Fraser, Jan Beulich

From: David Vrabel <dvrabel@cantab.net>

A malicious or buggy guest can cause another domain to spin
indefinitely by repeatedly writing to an event word when the other
guest is trying to link a new event.  The cmpxchg() in
evtchn_fifo_set_link() will repeatedly fail and the loop may never
terminate.

Fixing this requires a change to the ABI which is documented in draft
H of the design.

  http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf

Since a well-behaved guest only makes a limited set of state changes,
the loop can terminate early if the guest makes an invalid state
transition.

The guest may:

- clear LINKED and LINK
- clear PENDING
- set MASKED
- clear MASKED

It is valid for the guest to mask and unmask an event at any time, so
we instead specify that it is not valid for a guest to clear MASKED
while Xen is trying to update LINK.  Xen indicates this to the guest
with an additional BUSY bit in the event word.  The guest must not
clear MASKED while BUSY is set and should spin until BUSY is cleared.

The remaining valid writes (clear LINKED, clear PENDING, set MASKED,
clear MASKED by Xen) limit the number of cmpxchg() failures to at
most 4.  A clear of LINKED also terminates the loop early, so the
loop can be bounded at 4 iterations.

If a buggy or malicious guest does cause the loop to exit with LINKED
set and LINK unset, then that guest will lose events.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/common/event_fifo.c            |   74 +++++++++++++++++++++++++++++------
 xen/include/public/event_channel.h |    1 +
 2 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 64dbfff..7e9aa64 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -34,19 +34,67 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
     return d->evtchn_fifo->event_array[p] + w;
 }
 
-static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
+
+static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
+{
+    event_word_t new, old;
+
+    if ( !(*w & (1 << EVTCHN_FIFO_LINKED)) )
+        return 0;
+
+    old = *w;
+    new = (old & ~EVTCHN_FIFO_LINK_MASK) | link;
+    *w = cmpxchg(word, old, new);
+    if ( *w == old )
+        return 1;
+
+    return -EAGAIN;
+}
+
+/*
+ * Atomically set the LINK field iff it is still LINKED.
+ *
+ * The guest is only permitted to make the following changes to a
+ * LINKED event.
+ *
+ * - set MASKED
+ * - clear MASKED
+ * - clear PENDING
+ * - clear LINKED (and LINK)
+ *
+ * We block unmasking by the guest by marking the tail word as BUSY,
+ * therefore, the cmpxchg() may fail at most 4 times.
+ */
+static bool_t evtchn_fifo_set_link(struct domain *d, event_word_t *word,
+                                   uint32_t link)
 {
-    event_word_t n, o, w;
+    event_word_t w;
+    unsigned int try;
+    int ret;
+
+    w = read_atomic(word);
 
-    w = *word;
+    ret = try_set_link(word, &w, link);
+    if ( ret >= 0 )
+        return ret;
 
-    do {
-        if ( !(w & (1 << EVTCHN_FIFO_LINKED)) )
-            return 0;
-        o = w;
-        n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
-    } while ( (w = cmpxchg(word, o, n)) != o );
+    /* Lock the word to prevent guest unmasking. */
+    set_bit(EVTCHN_FIFO_BUSY, word);
 
+    w = read_atomic(word);
+
+    for ( try = 0; try < 4; try++ )
+    {
+        ret = try_set_link(word, &w, link);
+        if ( ret >= 0 )
+        {
+            clear_bit(EVTCHN_FIFO_BUSY, word);
+            return ret;
+        }
+    }
+    gdprintk(XENLOG_WARNING, "domain %d, port %d not linked\n",
+             d->domain_id, link);
+    clear_bit(EVTCHN_FIFO_BUSY, word);
     return 1;
 }
 
@@ -105,7 +153,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         if ( port != q->tail )
         {
             tail_word = evtchn_fifo_word_from_port(d, q->tail);
-            linked = evtchn_fifo_set_link(tail_word, port);
+            linked = evtchn_fifo_set_link(d, tail_word, port);
         }
         if ( !linked )
             write_atomic(q->head, port);
@@ -202,11 +250,11 @@ static void evtchn_fifo_print_state(struct domain *d,
 
     word = evtchn_fifo_word_from_port(d, evtchn->port);
     if ( !word )
-        printk("?   ");
+        printk("?     ");
     else if ( test_bit(EVTCHN_FIFO_LINKED, word) )
-        printk("%-4u", *word & EVTCHN_FIFO_LINK_MASK);
+        printk("%c/%-4u", *word & EVTCHN_FIFO_BUSY ? 'B' : '.', *word & EVTCHN_FIFO_LINK_MASK);
     else
-        printk("-   ");
+        printk("%c/-   ", *word & EVTCHN_FIFO_BUSY ? 'B' : '.');
 }
 
 static const struct evtchn_port_ops evtchn_port_ops_fifo =
diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h
index 4a53484..b78ee32 100644
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -343,6 +343,7 @@ typedef uint32_t event_word_t;
 #define EVTCHN_FIFO_PENDING 31
 #define EVTCHN_FIFO_MASKED  30
 #define EVTCHN_FIFO_LINKED  29
+#define EVTCHN_FIFO_BUSY    28
 
 #define EVTCHN_FIFO_LINK_BITS 17
 #define EVTCHN_FIFO_LINK_MASK ((1 << EVTCHN_FIFO_LINK_BITS) - 1)
-- 
1.7.2.5


* [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues
  2013-11-11 16:03 [PATCHv3 0/2] Xen: FIFO-based event channel ABI fixes David Vrabel
  2013-11-11 16:03 ` [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK David Vrabel
@ 2013-11-11 16:03 ` David Vrabel
  2013-11-11 16:46   ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: David Vrabel @ 2013-11-11 16:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

An event may still be the tail of a queue even if the queue is now
empty (an 'old tail' event).  There is logic to handle the case when
this old tail event needs to be added to the now empty queue (by
checking for q->tail == port).

However, if the old tail event on queue A is moved to a different
queue B (by changing its VCPU or priority), the event may then be
linked onto queue B.  When another event is linked onto queue A it
will check the old tail, see that it is linked (but on queue B) and
overwrite the LINK field, corrupting both queues.

When an event is linked, save the vcpu id and priority of the queue
it is being linked onto.  Use this when linking an event to check
whether it is an unlinked old tail event, i.e., (a) it has moved
queues; (b) it is currently unlinked; and (c) it is the tail of the
old queue.  If it is an unlinked old tail event, the old queue is
empty and old_q->tail is invalidated to ensure that adding another
event to old_q will update HEAD.  The tail is invalidated by setting
it to 0, since event 0 is never linked.

The old_q->lock is held while setting LINKED to avoid a race with the
test of LINKED in evtchn_fifo_set_link().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_fifo.c |   42 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/xen/sched.h |    2 ++
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 7e9aa64..97b07f5 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -98,6 +98,46 @@ static bool_t evtchn_fifo_set_link(struct domain *d, event_word_t *word,
     return 1;
 }
 
+static bool_t test_and_set_linked(struct domain *d, struct evtchn *evtchn,
+                                  struct evtchn_fifo_queue *q,
+                                  event_word_t *word)
+{
+    struct vcpu *old_v;
+    struct evtchn_fifo_queue *old_q;
+    bool_t was_linked;
+    unsigned long flags;
+
+    old_v = d->vcpu[evtchn->last_vcpu_id];
+    old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority];
+
+    evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
+    evtchn->last_priority = evtchn->priority;
+
+    if ( q == old_q )
+        return test_and_set_bit(EVTCHN_FIFO_LINKED, word);
+
+    /*
+     * This event is now on a different queue.
+     *
+     * If the event is still linked in the old queue it won't be moved
+     * yet.
+     *
+     * If this event is unlinked /and/ it's the old queue's tail, the
+     * old queue is empty and its tail must be invalidated to prevent
+     * adding an event to the old queue from corrupting the new queue.
+     */
+    spin_lock_irqsave(&old_q->lock, flags);
+
+    was_linked = test_and_set_bit(EVTCHN_FIFO_LINKED, word);
+
+    if ( !was_linked && old_q->tail == evtchn->port )
+        old_q->tail = 0;
+
+    spin_unlock_irqrestore(&old_q->lock, flags);
+
+    return was_linked;
+}
+
 static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
 {
     struct domain *d = v->domain;
@@ -133,7 +173,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
      * Link the event if it unmasked and not already linked.
      */
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
-         && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+         && !test_and_set_linked(d, evtchn, q, word) )
     {
         event_word_t *tail_word;
         bool_t linked = 0;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2397537..3714c37 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -98,6 +98,8 @@ struct evtchn
     } u;
     u8 priority;
     u8 pending:1;
+    u16 last_vcpu_id;
+    u8 last_priority;
 #ifdef FLASK_ENABLE
     void *ssid;
 #endif
-- 
1.7.2.5


* Re: [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK
  2013-11-11 16:03 ` [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK David Vrabel
@ 2013-11-11 16:38   ` Jan Beulich
  2013-11-11 16:56     ` David Vrabel
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-11-11 16:38 UTC (permalink / raw)
  To: David Vrabel; +Cc: David Vrabel, xen-devel, Keir Fraser

>>> On 11.11.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
> +static bool_t evtchn_fifo_set_link(struct domain *d, event_word_t *word,

All you need d for is to get its domain_id, hence it could easily be a
pointer to const.

> +    w = read_atomic(word);
> +
> +    for ( try = 0; try < 4; try++ )
> +    {
> +        ret = try_set_link(word, &w, link);
> +        if ( ret >= 0 )
> +        {
> +            clear_bit(EVTCHN_FIFO_BUSY, word);

Considering that this is another atomic operation, wouldn't it
make sense to have the cmpxchg() at once clear the flag,
and hence you'd need to clear it here only when ret == 0
(which I understand isn't the common case)?

Jan


* Re: [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues
  2013-11-11 16:03 ` [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues David Vrabel
@ 2013-11-11 16:46   ` Jan Beulich
  2013-11-11 17:47     ` David Vrabel
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-11-11 16:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 11.11.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
> +static bool_t test_and_set_linked(struct domain *d, struct evtchn *evtchn,

Same comment here regarding constification of d.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -98,6 +98,8 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    u16 last_vcpu_id;
> +    u8 last_priority;

So this adds up to 5 bytes now, whereas you apparently could do
with 4. On ARM this means the structure is larger than needed; on
64-bit (ARM or x86) it means that further additions are going to
be less obvious. I'd therefore suggest

    } u;
    u16 last_vcpu_id;
    u8 priority:4;
    u8 last_priority:4;
    u8 pending:1;

Jan


* Re: [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK
  2013-11-11 16:38   ` Jan Beulich
@ 2013-11-11 16:56     ` David Vrabel
  2013-11-11 17:03       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: David Vrabel @ 2013-11-11 16:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: David Vrabel, xen-devel, Keir Fraser

On 11/11/13 16:38, Jan Beulich wrote:
>>>> On 11.11.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
> 
>> +    w = read_atomic(word);
>> +
>> +    for ( try = 0; try < 4; try++ )
>> +    {
>> +        ret = try_set_link(word, &w, link);
>> +        if ( ret >= 0 )
>> +        {
>> +            clear_bit(EVTCHN_FIFO_BUSY, word);
> 
> Considering that this is another atomic operation, wouldn't it
> make sense to have the cmpxchg() at once clear the flag,
> and hence you'd need to clear it here only when ret == 0
> (which I understand isn't the common case)?

The common case (I believe, but haven't measured it) is the first
try_set_link() call without BUSY set.

In the loop, I suspect the mostly likely write by the guest is clearing
LINKED, i.e., ret == 0.

Still, it seems easy enough to have:

    for ( try = 0; try < 4; try++ )
    {
        ret = try_set_link(word, &w, link);
        if ( ret >= 0 )
        {
            if ( ret == 0 )
                clear_bit(EVTCHN_FIFO_BUSY, word);
            return ret;
        }
    }

David


* Re: [PATCH 1/2] events/fifo: don't spin indefinitely when setting LINK
  2013-11-11 16:56     ` David Vrabel
@ 2013-11-11 17:03       ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-11-11 17:03 UTC (permalink / raw)
  To: David Vrabel; +Cc: David Vrabel, xen-devel, Keir Fraser

>>> On 11.11.13 at 17:56, David Vrabel <david.vrabel@citrix.com> wrote:
> Still, it seems easy enough to have:
> 
>     for ( try = 0; try < 4; try++ )
>     {
>         ret = try_set_link(word, &w, link);
>         if ( ret >= 0 )
>         {
>             if ( ret == 0 )
>                 clear_bit(EVTCHN_FIFO_BUSY, word);
>             return ret;

Except that I'd recommend implementing this with a switch
entertaining fall-through from case 0 to case 1. But it's your
code, so it's up to you anyway.

Jan


* Re: [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues
  2013-11-11 16:46   ` Jan Beulich
@ 2013-11-11 17:47     ` David Vrabel
  2013-11-12  8:14       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: David Vrabel @ 2013-11-11 17:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 11/11/13 16:46, Jan Beulich wrote:
>>>> On 11.11.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -98,6 +98,8 @@ struct evtchn
>>      } u;
>>      u8 priority;
>>      u8 pending:1;
>> +    u16 last_vcpu_id;
>> +    u8 last_priority;
> 
> So this adds up to 5 bytes now, whereas you apparently could do
> with 4. On ARM this means the structure is larger than needed; on
> 64-bit (ARM or x86) it means that further additions are going to
> be less obvious. I'd therefore suggest

ARM:    24 bytes (3 spare bytes), 128 evtchns per page.
x86_64: 32 bytes (3 spare bytes), 128 evtchns per page.

>     } u;
>     u16 last_vcpu_id;
>     u8 priority:4;
>     u8 last_priority:4;
>     u8 pending:1;

ARM:    24 bytes (4 spare bytes), 128 evtchns per page.
x86_64: 32 bytes (4 spare bytes), 128 evtchns per page.

As of ea963e094a (evtchn: allow many more evtchn objects to be allocated
per domain) the number of evtchns per page is a power of two so there is
no change to the number with either layout.

I was avoiding using bitfields for things other than single bits. Is the
extra spare byte preferable to the code size increase from dealing with
the bitfields?

On my x86_64 build text size increases by 1684425 - 1684393 = 32 bytes.

David


* Re: [PATCH 2/2] events/fifo: don't corrupt queues if an old tail moves queues
  2013-11-11 17:47     ` David Vrabel
@ 2013-11-12  8:14       ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-11-12  8:14 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 11.11.13 at 18:47, David Vrabel <david.vrabel@citrix.com> wrote:
> On 11/11/13 16:46, Jan Beulich wrote:
>>>>> On 11.11.13 at 17:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -98,6 +98,8 @@ struct evtchn
>>>      } u;
>>>      u8 priority;
>>>      u8 pending:1;
>>> +    u16 last_vcpu_id;
>>> +    u8 last_priority;
>> 
>> So this adds up to 5 bytes now, whereas you apparently could do
>> with 4. On ARM this means the structure is larger than needed; on
>> 64-bit (ARM or x86) it means that further additions are going to
>> be less obvious. I'd therefore suggest
> 
> ARM:    24 bytes (3 spare bytes), 128 evtchns per page.
> x86_64: 32 bytes (3 spare bytes), 128 evtchns per page.
> 
>>     } u;
>>     u16 last_vcpu_id;
>>     u8 priority:4;
>>     u8 last_priority:4;
>>     u8 pending:1;
> 
> ARM:    24 bytes (4 spare bytes), 128 evtchns per page.
> x86_64: 32 bytes (4 spare bytes), 128 evtchns per page.
> 
> As of ea963e094a (evtchn: allow many more evtchn objects to be allocated
> per domain) the number of evtchns per page is a power of two so there is
> no change to the number with either layout.

Oh, right.

> I was avoiding using bitfields for things other than single bits. Is the
> extra spare byte preferable to the code size increase from dealing with
> the bitfields?

Okay then without bitfields.

Jan

