All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavan Kondeti <pkondeti@codeaurora.org>
To: williams@redhat.com, Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	bristot@redhat.com, jkacur@redhat.com, efault@gmx.de,
	hpa@zytor.com, torvalds@linux-foundation.org, swood@redhat.com
Cc: linux-tip-commits@vger.kernel.org,
	Pavan Kondeti <pkondeti@codeaurora.org>
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Date: Fri, 19 Jan 2018 14:53:05 +0530	[thread overview]
Message-ID: <CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com> (raw)
In-Reply-To: <tip-4bdced5c9a2922521e325896a7bbbf0132c94e56@git.kernel.org>

Hi Steven,

>  /* Called from hardirq context */
> -static void try_to_push_tasks(void *arg)
> +void rto_push_irq_work_func(struct irq_work *work)
>  {
> -       struct rt_rq *rt_rq = arg;
> -       struct rq *rq, *src_rq;
> -       int this_cpu;
> +       struct rq *rq;
>         int cpu;
>
> -       this_cpu = rt_rq->push_cpu;
> +       rq = this_rq();
>
> -       /* Paranoid check */
> -       BUG_ON(this_cpu != smp_processor_id());
> -
> -       rq = cpu_rq(this_cpu);
> -       src_rq = rq_of_rt_rq(rt_rq);
> -
> -again:
> +       /*
> +        * We do not need to grab the lock to check for has_pushable_tasks.
> +        * When it gets updated, a check is made if a push is possible.
> +        */
>         if (has_pushable_tasks(rq)) {
>                 raw_spin_lock(&rq->lock);
> -               push_rt_task(rq);
> +               push_rt_tasks(rq);
>                 raw_spin_unlock(&rq->lock);
>         }
>
> -       /* Pass the IPI to the next rt overloaded queue */
> -       raw_spin_lock(&rt_rq->push_lock);
> -       /*
> -        * If the source queue changed since the IPI went out,
> -        * we need to restart the search from that CPU again.
> -        */
> -       if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
> -               rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
> -               rt_rq->push_cpu = src_rq->cpu;
> -       }
> +       raw_spin_lock(&rq->rd->rto_lock);
>
> -       cpu = find_next_push_cpu(src_rq);
> +       /* Pass the IPI to the next rt overloaded queue */
> +       cpu = rto_next_cpu(rq);
>
> -       if (cpu >= nr_cpu_ids)
> -               rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
> -       raw_spin_unlock(&rt_rq->push_lock);
> +       raw_spin_unlock(&rq->rd->rto_lock);
>
> -       if (cpu >= nr_cpu_ids)
> +       if (cpu < 0)
>                 return;

I am seeing "spinlock already unlocked" BUG for rd->rto_lock on a 4.9
stable kernel based system. This issue is observed only after
inclusion of this patch. It appears to me that rq->rd can change
between spinlock is acquired and released in rto_push_irq_work_func()
IRQ work if hotplug is in progress. It was only reported couple of
times during long stress testing. The issue can be easily reproduced
if an artificial delay is introduced between lock and unlock of
rto_lock. The rq->rd is changed under rq->lock, so we can protect this
race with rq->lock. The below patch solved the problem. we are taking
rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
here. Please let me know your thoughts on this.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d863d39..478192b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
                raw_spin_unlock(&rq->lock);
        }

+       raw_spin_lock(&rq->lock);
        raw_spin_lock(&rq->rd->rto_lock);

        /* Pass the IPI to the next rt overloaded queue */
@@ -2291,11 +2292,10 @@ void rto_push_irq_work_func(struct irq_work *work)

        raw_spin_unlock(&rq->rd->rto_lock);

-       if (cpu < 0)
-               return;
-
        /* Try the next RT overloaded CPU */
-       irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+       if (cpu >= 0)
+               irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+       raw_spin_unlock(&rq->lock);
 }
 #endif /* HAVE_RT_PUSH_IPI */


Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project

  reply	other threads:[~2018-01-19  9:23 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 15:47 [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic Steven Rostedt
2017-05-04 14:41 ` Peter Zijlstra
2017-05-04 15:01   ` Steven Rostedt
2017-05-04 15:32 ` Peter Zijlstra
2017-05-04 17:25   ` Steven Rostedt
2017-05-04 18:42     ` Peter Zijlstra
2017-05-04 19:03       ` Steven Rostedt
2017-05-05  4:26         ` Mike Galbraith
2017-05-05  5:16           ` Mike Galbraith
2017-05-05 11:05         ` Peter Zijlstra
2017-05-05 12:02           ` Steven Rostedt
2017-05-05 17:39             ` Peter Zijlstra
2017-05-05 18:59               ` Steven Rostedt
2017-05-06  7:52                 ` Peter Zijlstra
2017-05-08  1:25 ` [lkp-robot] [sched/rt] bebe5811a9: hackbench.throughput -6% regression kernel test robot
2017-10-10 11:02 ` [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic tip-bot for Steven Rostedt (Red Hat)
2018-01-19  9:23   ` Pavan Kondeti [this message]
2018-01-19 15:03     ` Steven Rostedt
2018-01-19 15:44       ` Pavan Kondeti
2018-01-19 15:53         ` Steven Rostedt
2018-01-19 17:46       ` Pavan Kondeti
2018-01-19 18:11         ` Steven Rostedt
2018-01-19 18:12           ` Steven Rostedt
2018-01-19 18:57             ` Pavan Kondeti
2018-01-19 19:51               ` Steven Rostedt
2018-01-20  4:56                 ` Pavan Kondeti
2018-01-19 18:54           ` Pavan Kondeti
2018-02-06 11:54     ` [tip:sched/urgent] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() tip-bot for Steven Rostedt (VMware)
2018-02-07  4:14       ` Steven Rostedt
2018-02-06 11:54     ` [tip:sched/urgent] sched/rt: Up the root domain ref count when passing it around via IPIs tip-bot for Steven Rostedt (VMware)
2018-02-07  4:15       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com' \
    --to=pkondeti@codeaurora.org \
    --cc=bristot@redhat.com \
    --cc=efault@gmx.de \
    --cc=hpa@zytor.com \
    --cc=jkacur@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=swood@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.