From: Sander Eikelenboom <linux@eikelenboom.it>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: andrew.cooper3@citrix.com, malcolm.crossley@citrix.com,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
Date: Tue, 17 Mar 2015 23:16:28 +0100	[thread overview]
Message-ID: <1988745644.20150317231628@eikelenboom.it> (raw)
In-Reply-To: <20150317174454.GA16436@l.oracle.com>


Tuesday, March 17, 2015, 6:44:54 PM, you wrote:


>> >> Additionally I think it should be considered whether the bitmap
>> >> approach of interpreting ->state is the right one, and we don't
>> >> instead want a clean 3-state (idle, sched, run) model.
>> > 
>> > Could you elaborate a bit more please? As in three different unsigned int
>> > (or bool_t) values that indicate which state we are in?
>> 
>> An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially
>> if my comment above turns out to be wrong, you'd have no real
>> need for the SCHED and RUN flags to be set at the same time.

> I cobbled together what I believe is what you were thinking of.

> As you can see, to preserve the existing functionality - such as
> being able to schedule N interrupt injections
> for the N interrupts we might get - I modified '->masked'
> to be an atomic counter.

> The end result is that we can still live-lock. Unless we:
>  - Drop on the floor the injection of N interrupts and
>    just deliver at most one per VMX_EXIT (and not bother
>    with interrupts arriving while we are in the VMX handler).

>  - Alter the softirq code slightly - to have a variant
>    which will only iterate once over the pending softirq
>    bits per call. (So save a copy of the bitmap on the
>    stack when entering the softirq handler - and use that.
>    We could also xor it against the current one to catch any
>    non-duplicate bits being set that we should deal with.)

> Here is the patch; it compiles, but is not run-time tested.

> From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Tue, 17 Mar 2015 13:31:52 -0400
> Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap

> *TODO*:
>  - Writeup.
>  - Tests

Done, and unfortunately it doesn't fly.
Some devices seem to work fine, but others stop receiving interrupts shortly
after boot, e.g.:
 40:          3          0          0          0  xen-pirq-ioapic-level  cx25821[1]

I don't see any crashes or errors though, so it seems to lock up silently somewhere.

--
Sander

> Suggested-by: Jan Beulich <JBeulich@suse.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/drivers/passthrough/io.c | 140 ++++++++++++++++++++++++-------------------
>  xen/include/xen/hvm/irq.h    |   4 +-
>  2 files changed, 82 insertions(+), 62 deletions(-)

> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..663e104 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -30,42 +30,28 @@
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
>  
>  /*
> - * These two bit states help to safely schedule, deschedule, and wait until
> - * the softirq has finished.
> - *
> - * The semantics behind these two bits is as follow:
> - *  - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom).
> - *  - STATE_RUN - only softirq is allowed to set and clear it. If it has
> - *      been set hvm_dirq_assist will RUN with a saved value of the
> - *      'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was set.
> - *
> - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) ->
> - * STATE_SCHED(unset) -> STATE_RUN(unset).
> - *
> - * However the states can also diverge such as: STATE_SCHED(set) ->
> - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means
> - * the 'hvm_dirq_assist' never run and that the softirq did not do any
> - * ref-counting.
> - */
> -
> -enum {
> -    STATE_SCHED,
> -    STATE_RUN
> -};
> -
> -/*
>   * This can be called multiple times, but the softirq is only raised once.
> - * That is until the STATE_SCHED state has been cleared. The state can be
> - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
> - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
> + * That is, until the state is back at STATE_INIT. The state can be changed by:
> + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
> + * or by 'pt_pirq_softirq_reset' (which will try to reset the state before
>   * the softirq had a chance to run).
>   */
>  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
>  {
>      unsigned long flags;
>  
> -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> +    switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) )
> +    {
> +    case STATE_RUN:
> +    case STATE_SCHED:
> +        /*
> +         * The pirq_dpci->masked count has been incremented to let us know
> +         * how many injections we have left to do.
> +         */
>          return;
> +    case STATE_INIT:
> +        break;
> +    }
>  
>      get_knownalive_domain(pirq_dpci->dom);
>  
> @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
>   */
>  bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
>  {
> -    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
> +    if ( pirq_dpci->state != STATE_INIT )
>          return 1;
>  
>      /*
> @@ -109,22 +95,22 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
>  
>      ASSERT(spin_is_locked(&d->event_lock));
>  
> -    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
> +    switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_INIT) )
>      {
> -    case (1 << STATE_SCHED):
> +    case STATE_SCHED:
>          /*
> -         * We are going to try to de-schedule the softirq before it goes in
> -         * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
> +         * We are going to try to de-schedule the softirq before it goes to
> +         * running state. Whoever moves from sched MUST refcount the 'dom'.
>           */
>          put_domain(d);
>          /* fallthrough. */
> -    case (1 << STATE_RUN):
> -    case (1 << STATE_RUN) | (1 << STATE_SCHED):
> +    case STATE_RUN:
> +    case STATE_INIT:
>          /*
> -         * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
> +         * The reason it is OK to reset 'dom' in the init or running state is due
>           * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
> -         * in local variable before it sets STATE_RUN - and therefore will not
> -         * dereference '->dom' which would crash.
> +         * in local variable before it sets state to running - and therefore
> +         * will not dereference '->dom' which would crash.
>           */
>          pirq_dpci->dom = NULL;
>          break;
> @@ -135,7 +121,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
>       * or never initialized it). Note that we hold the lock that
>       * 'hvm_dirq_assist' could be spinning on.
>       */
> -    pirq_dpci->masked = 0;
> +    atomic_set(&pirq_dpci->masked, 0);
>  }
>  
>  bool_t pt_irq_need_timer(uint32_t flags)
> @@ -149,7 +135,8 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>      if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
>                                &pirq_dpci->flags) )
>      {
> -        pirq_dpci->masked = 0;
> +        ASSERT(atomic_read(&pirq_dpci->masked) <= 1);
> +        atomic_set(&pirq_dpci->masked, 0);
>          pirq_dpci->pending = 0;
>          pirq_guest_eoi(dpci_pirq(pirq_dpci));
>      }
> @@ -278,6 +265,8 @@ int pt_irq_create_bind(
>               * As such on every 'pt_irq_create_bind' call we MUST set it.
>               */
>              pirq_dpci->dom = d;
> +            atomic_set(&pirq_dpci->masked, 0);
> +            pirq_dpci->state = STATE_INIT;
>              /* bind after hvm_irq_dpci is setup to avoid race with irq handler*/
>              rc = pirq_guest_bind(d->vcpu[0], info, 0);
>              if ( rc == 0 && pt_irq_bind->u.msi.gtable )
> @@ -398,6 +387,8 @@ int pt_irq_create_bind(
>              if ( pt_irq_need_timer(pirq_dpci->flags) )
>                  init_timer(&pirq_dpci->timer, pt_irq_time_out, pirq_dpci, 0);
>              /* Deal with gsi for legacy devices */
> +            atomic_set(&pirq_dpci->masked, 0);
> +            pirq_dpci->state = STATE_INIT;
>              rc = pirq_guest_bind(d->vcpu[0], info, share);
>              if ( unlikely(rc) )
>              {
> @@ -576,6 +567,7 @@ bool_t pt_pirq_cleanup_check(struct hvm_pirq_dpci *dpci)
>      if ( !dpci->flags && !pt_pirq_softirq_active(dpci) )
>      {
>          dpci->dom = NULL;
> +        dpci->state = STATE_INIT;
>          return 1;
>      }
>      return 0;
> @@ -617,7 +609,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
>           !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
>          return 0;
>  
> -    pirq_dpci->masked = 1;
> +    atomic_inc(&pirq_dpci->masked);
>      raise_softirq_for(pirq_dpci);
>      return 1;
>  }
> @@ -672,12 +664,13 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
>      spin_unlock(&d->event_lock);
>  }
>  
> -static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> +static bool_t hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>  {
> +    int more;
>      ASSERT(d->arch.hvm_domain.irq.dpci);
>  
>      spin_lock(&d->event_lock);
> -    if ( test_and_clear_bool(pirq_dpci->masked) )
> +    while ( (more = !atomic_dec_and_test(&pirq_dpci->masked)) )
>      {
>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>          const struct dev_intx_gsi_link *digl;
> @@ -687,17 +680,13 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>              send_guest_pirq(d, pirq);
>  
>              if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
> -            {
> -                spin_unlock(&d->event_lock);
> -                return;
> -            }
> +                break;
>          }
>  
>          if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
>          {
>              vmsi_deliver_pirq(d, pirq_dpci);
> -            spin_unlock(&d->event_lock);
> -            return;
> +            break;
>          }
>  
>          list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
> @@ -710,8 +699,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>          {
>              /* for translated MSI to INTx interrupt, eoi as early as possible */
>              __msi_pirq_eoi(pirq_dpci);
> -            spin_unlock(&d->event_lock);
> -            return;
> +            break;
>          }
>  
>          /*
> @@ -725,6 +713,8 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>          set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);
>      }
>      spin_unlock(&d->event_lock);
> +
> +    return more;
>  }
>  
>  static void __hvm_dpci_eoi(struct domain *d,
> @@ -781,7 +771,7 @@ unlock:
>  }
>  
>  /*
> - * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to
> + * Note: 'pt_pirq_softirq_reset' can move the state to init before we get to
>   * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting.
>   */
>  static void dpci_softirq(void)
> @@ -797,23 +787,53 @@ static void dpci_softirq(void)
>      {
>          struct hvm_pirq_dpci *pirq_dpci;
>          struct domain *d;
> +        bool_t again = 0;
>  
>          pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
>          list_del(&pirq_dpci->softirq_list);
>  
>          d = pirq_dpci->dom;
>          smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> -        if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> -            BUG();
> -        /*
> -         * The one who clears STATE_SCHED MUST refcount the domain.
> -         */
> -        if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
> +
> +        switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_RUN) )
> +        {
> +        case STATE_INIT:
> +            /*  pt_pirq_softirq_reset cleared it. Let it do the ref-counting. */
> +            continue;
> +        case STATE_RUN:
> +            again = 1;
> +            /* Fall through. */
> +        case STATE_SCHED:
> +            break;
> +        }
> +        if ( !again )
>          {
> -            hvm_dirq_assist(d, pirq_dpci);
> +            /*
> +             * The one who changes sched to running MUST refcount the domain.
> +            */
> +            again = hvm_dirq_assist(d, pirq_dpci);
>              put_domain(d);
> +            switch ( cmpxchg(&pirq_dpci->state, STATE_RUN, STATE_INIT) )
> +            {
> +            case STATE_SCHED:
> +            case STATE_INIT:
> +                BUG();
> +            case STATE_RUN:
> +                break;
> +            }
> +        }
> +        if ( again )
> +        {
> +            unsigned long flags;
> +
> +            /* Put back on the list and retry. */
> +            local_irq_save(flags);
> +            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> +            local_irq_restore(flags);
> +
> +            raise_softirq(HVM_DPCI_SOFTIRQ);
> +            continue;
>          }
> -        clear_bit(STATE_RUN, &pirq_dpci->state);
>      }
>  }
>  
> diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
> index 3996f1f..48dbc51 100644
> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -93,8 +93,8 @@ struct hvm_irq_dpci {
>  /* Machine IRQ to guest device/intx mapping. */
>  struct hvm_pirq_dpci {
>      uint32_t flags;
> -    unsigned int state;
> -    bool_t masked;
> +    enum { STATE_INIT, STATE_SCHED, STATE_RUN } state;
> +    atomic_t masked;
>      uint16_t pending;
>      struct list_head digl_list;
>      struct domain *dom;

