From: "Jan Beulich" <JBeulich@suse.com>
To: konrad@kernel.org, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tianyu Lan <tianyu.lan@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 1/5] tasklet: Introduce per-cpu tasklet for softirq.
Date: Thu, 27 Oct 2016 10:24:55 -0600
Message-ID: <58124677020000780011A45E@prv-mh.provo.novell.com>
In-Reply-To: <1472153041-14220-2-git-send-email-konrad.wilk@oracle.com>

>>> On 25.08.16 at 21:23, <konrad.wilk@oracle.com> wrote:
> This implements a lockless per-cpu tasklet mechanism.

How does this correlate with the title? IOW, how is this new form of
tasklets "for" softirq? Perhaps part of the sentence above would be
a better fit for the title?

> The existing tasklet mechanism has a single global
> spinlock that is taken every time the global list
> is touched. And we use this lock quite a lot - when
> we call do_tasklet_work, which is called via a softirq
> and from the idle loop. We take the lock on any
> operation on the tasklet_list.
> 
> The problem we are facing is that there are quite a lot of
> tasklets scheduled. The most common one that is invoked is
> the one injecting the VIRQ_TIMER in the guest. Guests
> are not insane and don't set the one-shot or periodic
> clocks to be in sub 1ms intervals (causing said tasklet
> to be scheduled for such small intervals).

I don't follow how this "guests are not insane" relates to the issue
described here. Plus - what if there was an insane guest?

> This is especially a problem with guests that have a
> large number of VCPUs.
> 
> With this patch the problem disappears.
> 
> As such this patch introduces the code to set up
> softirq per-cpu tasklets and only modifies the PCI
> passthrough cases instead of doing it wholesale. This
> is done because:
>  - We want to easily bisect it if things break.
>  - We modify the code one section at a time to
>    make it easier to review this core code.

To me this looks to be contrary to the very first change I see:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1511,7 +1511,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
>      if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) /* teardown: hvm_funcs.vcpu_destroy */
>          goto fail3;
>  
> -    softirq_tasklet_init(
> +    percpu_tasklet_init(
>          &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
>          (void(*)(unsigned long))hvm_assert_evtchn_irq,
>          (unsigned long)v);

Isn't this unrelated to pass-through?

> --- a/xen/common/tasklet.c
> +++ b/xen/common/tasklet.c
> @@ -31,10 +31,30 @@ static DEFINE_PER_CPU(struct list_head, softirq_tasklet_list);
>  /* Protects all lists and tasklet structures. */
>  static DEFINE_SPINLOCK(tasklet_lock);
>  
> +static DEFINE_PER_CPU(struct list_head, softirq_list);

Please put this next to the other two. Which then makes it more
obvious that some naming changes might be helpful: we already
have softirq_tasklet_list. Maybe percpu_list or - since this already
is a per-CPU variable - local_list? Otherwise at least a comment
should be added to clarify the different purposes of the three lists.
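
I.e. (just to illustrate the naming/placement point) perhaps simply

    static DEFINE_PER_CPU(struct list_head, local_list);

right next to the existing two declarations.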

>  static void tasklet_enqueue(struct tasklet *t)
>  {
>      unsigned int cpu = t->scheduled_on;
>  
> +    if ( t->is_percpu )
> +    {
> +        unsigned long flags;
> +        struct list_head *list;
> +
> +        INIT_LIST_HEAD(&t->list);

Why? You list_add_tail() below unconditionally.

> +        BUG_ON( !t->is_softirq );

If this is a requirement, wouldn't the two boolean flags better be
folded into a tristate enum, with the if() sequence here converted
to a switch()?
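
Just to illustrate what I have in mind (rough, untested sketch;
"kind" being a hypothetical replacement for the two boolean flags):

enum tasklet_kind {
    TASKLET_KIND_NORMAL,  /* run from the idle vCPU, off the global list */
    TASKLET_KIND_SOFTIRQ, /* run in softirq context, off softirq_tasklet_list */
    TASKLET_KIND_PERCPU,  /* the new lockless per-CPU variant */
};

    switch ( t->kind )
    {
    case TASKLET_KIND_PERCPU:
        /* new per-CPU list handling */
        break;
    case TASKLET_KIND_SOFTIRQ:
        /* existing softirq list handling */
        break;
    case TASKLET_KIND_NORMAL:
        /* existing global list handling */
        break;
    }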

> +        BUG_ON( cpu != smp_processor_id() ); /* Not implemented yet. */
> +
> +        local_irq_save(flags);

Considering that the pre-existing two cases in this function don't do
any locking themselves, is the asymmetry to do the "locking" here a
good idea?

> +        list = &__get_cpu_var(softirq_list);
> +        list_add_tail(&t->list, list);
> +        raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> +
> +        local_irq_restore(flags);
> +        return;
> +    }
>      if ( t->is_softirq )

Instead of return, please use "else if" here (unless converting to
switch() anyway).

> @@ -56,16 +76,25 @@ void tasklet_schedule_on_cpu(struct tasklet *t, unsigned int cpu)
>  {
>      unsigned long flags;
>  
> -    spin_lock_irqsave(&tasklet_lock, flags);
> +    if ( !tasklets_initialised || t->is_dead )
> +        return;
>  
> -    if ( tasklets_initialised && !t->is_dead )
> +    if ( t->is_percpu )
>      {
> -        t->scheduled_on = cpu;
> -        if ( !t->is_running )
> +        if ( !test_and_set_bit(TASKLET_STATE_SCHED, &t->state) )
>          {
> -            list_del(&t->list);
> +            t->scheduled_on = cpu;
>              tasklet_enqueue(t);
>          }
> +        return;
> +    }
> +    spin_lock_irqsave(&tasklet_lock, flags);

Blank line above this one please.

> @@ -104,6 +133,66 @@ static void do_tasklet_work(unsigned int cpu, struct list_head *list)
>      }
>  }
>  
> +void do_tasklet_work_percpu(void)

static

> +{
> +    struct tasklet *t = NULL;
> +    struct list_head *head;
> +    bool_t poke = 0;

bool / false

> +    local_irq_disable();
> +    head = &__get_cpu_var(softirq_list);
> +
> +    if ( !list_empty(head) )
> +    {
> +        t = list_entry(head->next, struct tasklet, list);
> +
> +        if ( head->next == head->prev ) /* One singular item. Re-init head. */

Do we have no list_*() function for this?

> +            INIT_LIST_HEAD(&__get_cpu_var(softirq_list));
> +        else
> +        {
> +            /* Multiple items, update head to skip 't'. */
> +            struct list_head *list;
> +
> +            /* One item past 't'. */
> +            list = head->next->next;
> +
> +            BUG_ON(list == NULL);
> +
> +            /* And update head to skip 't'. Note that t->list.prev still
> +             * points to head, but we don't care as we only process one tasklet
> +             * and once done the tasklet list is re-init one way or another.
> +             */
> +            head->next = list;

Why can't you use list_del() for all of the above, including the
INIT_LIST_HEAD() in the if branch?
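
I.e. I'd expect the whole thing to collapse to something like
(untested):

    if ( !list_empty(head) )
    {
        t = list_entry(head->next, struct tasklet, list);
        /* Takes 't' off the list; 'head' stays consistent for the rest. */
        list_del(&t->list);
        poke = !list_empty(head);
    }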

> +            poke = 1;
> +        }
> +    }
> +    local_irq_enable();
> +
> +    if ( !t )
> +        return; /* Never saw it happen, but we might have a spurious case? */
> +
> +    if ( tasklet_trylock(t) )
> +    {
> +        if ( !test_and_clear_bit(TASKLET_STATE_SCHED, &t->state) )
> +                BUG();

Indentation.

> +        sync_local_execstate();
> +        t->func(t->data);
> +        tasklet_unlock(t);
> +        if ( poke )
> +            raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> +        /* We could reinit the t->list but tasklet_enqueue does it for us. */

I can't see why re-initing would be needed here or there, nor why
do_tasklet_work() does so.

> +        return;
> +    }
> +
> +    local_irq_disable();
> +
> +    INIT_LIST_HEAD(&t->list);
> +    list_add_tail(&t->list, &__get_cpu_var(softirq_list));
> +    smp_wmb();
> +    raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> +    local_irq_enable();

tasklet_enqueue()? The only difference appears to be the barrier
you have here but not there (and I don't think you need it).
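
I.e. presumably the whole tail could just be (sketch):

    /*
     * Failed to take the lock - put the tasklet back on the local list;
     * tasklet_enqueue() disables interrupts and raises the softirq itself.
     */
    tasklet_enqueue(t);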

> @@ -147,10 +236,29 @@ static void tasklet_softirq_action(void)
>      spin_unlock_irq(&tasklet_lock);
>  }
>  
> +/* Per CPU softirq context work. */
> +static void tasklet_softirq_percpu_action(void)
> +{
> +    do_tasklet_work_percpu();
> +}

Considering this is the only caller of the function - why do you need
two functions here?
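
With the function made static (as said above) the wrapper could then
simply go away and the handler be registered directly, e.g.

    open_softirq(TASKLET_SOFTIRQ_PERCPU, do_tasklet_work_percpu);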

>  void tasklet_kill(struct tasklet *t)
>  {
>      unsigned long flags;
>  
> +    if ( t->is_percpu )
> +    {
> +        while ( test_and_set_bit(TASKLET_STATE_SCHED, &t->state) )
> +        {
> +            do {
> +                process_pending_softirqs();
> +            } while ( test_bit(TASKLET_STATE_SCHED, &t->state) );
> +        }
> +        tasklet_unlock_wait(t);
> +        clear_bit(TASKLET_STATE_SCHED, &t->state);
> +        t->is_dead = 1;

true

> --- a/xen/include/xen/tasklet.h
> +++ b/xen/include/xen/tasklet.h
> @@ -17,21 +17,24 @@
>  struct tasklet
>  {
>      struct list_head list;
> +    unsigned long state;
>      int scheduled_on;
>      bool_t is_softirq;
>      bool_t is_running;
>      bool_t is_dead;
> +    bool_t is_percpu;
>      void (*func)(unsigned long);
>      unsigned long data;
>  };
>  
> -#define _DECLARE_TASKLET(name, func, data, softirq)                     \
> +#define _DECLARE_TASKLET(name, func, data, softirq, percpu)             \
>      struct tasklet name = {                                             \
> -        LIST_HEAD_INIT(name.list), -1, softirq, 0, 0, func, data }
> +        LIST_HEAD_INIT(name.list), 0, -1, softirq, 0, 0, percpu,        \
> +        func, data }

May I ask that you switch to designated member initializers if you
need to touch this anyway?
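
I.e. something along these lines (untested; note that the "func" and
"data" parameters would need renaming, or the designated member names
would themselves get macro-expanded):

#define _DECLARE_TASKLET(name, fn, d, softirq, percpu)                  \
    struct tasklet name = {                                             \
        .list = LIST_HEAD_INIT(name.list),                              \
        .scheduled_on = -1,                                             \
        .is_softirq = softirq,                                          \
        .is_percpu = percpu,                                            \
        .func = fn,                                                     \
        .data = d }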

> @@ -40,6 +43,54 @@ DECLARE_PER_CPU(unsigned long, tasklet_work_to_do);
>  #define TASKLET_enqueued   (1ul << _TASKLET_enqueued)
>  #define TASKLET_scheduled  (1ul << _TASKLET_scheduled)
>  
> +/* These fancy bit manipulations (bit 0 and bit 1) along with using a lock

Please avoid referring to "locked" operations outside of x86 code.
ITYM "atomic".

> + * operation allow us to have four stages in tasklet life-time.
> + *  a) 0x0: Completely empty (not scheduled nor running).
> + *  b) 0x1: Scheduled but not running. Used to guard in 'tasklet_schedule'
> + *     such that we will only schedule one. If it is scheduled and had never
> + *     run (hence never clearing STATE_SCHED bit), tasklet_kill will spin
> + *     forever on said tasklet. However 'tasklet_schedule' raises the
> + *     softirq associated with the per-cpu - so it will run, albeit there might
> + *     be a race (tasklet_kill spinning until the softirq handler runs).

And is there a guarantee the handler can actually run while
tasklet_kill() spins?

> + *  c) 0x2: it is running (only on one CPU) and can be scheduled on any
> + *     CPU. The bit 0 - scheduled is cleared at this stage allowing
> + *     'tasklet_schedule' to successfully schedule.
> + *  d) 0x3: scheduled and running - only possible if the running tasklet calls
> + *     tasklet_schedule (on same CPU) or the tasklet is scheduled from another
> + *     CPU while the tasklet is running on another CPU.
> + *
> + * The two bits play a vital role in assuring that the tasklet is scheduled
> + * once and runs only once. The steps are:
> + *
> + *  1) tasklet_schedule: STATE_SCHED bit set (0x1), added on the per cpu list.
> + *  2) tasklet_softirq_percpu_action picks one tasklet from the list. Schedules
> + *  itself later if there are more tasklets on it. Tries to set STATE_RUN bit
> + *  (0x3) - if it fails adds tasklet back to the per-cpu list. If it succeeds
> + *  clears the STATE_SCHED bit (0x2).

Why are these two steps? Can't you transition 1 -> 2 in one go
(using cmpxchg())?
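
E.g. (illustration only - the exact constants of course depend on the
state encoding you end up with):

    /* SCHED -> RUN in one atomic step. */
    if ( cmpxchg(&t->state, 1UL << TASKLET_STATE_SCHED,
                 1UL << TASKLET_STATE_RUN) == (1UL << TASKLET_STATE_SCHED) )
    {
        /* We own the tasklet: SCHED cleared and RUN set atomically. */
        t->func(t->data);
    }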

> Once tasklet completed, unsets STATE_RUN
> + *  (0x0 or 0x1 if tasklet called tasklet_schedule).
> + */
> +enum {
> +    TASKLET_STATE_SCHED, /* Bit 0 */
> +    TASKLET_STATE_RUN
> +};
> +
> +static inline int tasklet_trylock(struct tasklet *t)

bool and const

> +{
> +    return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
> +}
> +
> +static inline void tasklet_unlock(struct tasklet *t)
> +{
> +    barrier();
> +    clear_bit(TASKLET_STATE_RUN, &(t)->state);
> +}
> +static inline void tasklet_unlock_wait(struct tasklet *t)

const

> +{
> +    while (test_bit(TASKLET_STATE_RUN, &(t)->state))

Missing blanks.

> +    {
> +        barrier();
> +    }

Unnecessary braces.

Jan
