Date: Fri, 19 Jan 2018 10:03:53 -0500
From: Steven Rostedt
To: Pavan Kondeti
Cc: williams@redhat.com, Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com, efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org, swood@redhat.com, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Message-ID: <20180119100353.7f9f5154@gandalf.local.home>
References: <20170424114732.1aac6dc4@gandalf.local.home>

On Fri, 19 Jan 2018 14:53:05 +0530
Pavan Kondeti wrote:

> I am seeing "spinlock already unlocked" BUG for rd->rto_lock on a 4.9
> stable kernel based system. This issue is observed only after
> inclusion of this patch. It appears to me that rq->rd can change
> between spinlock is acquired and released in rto_push_irq_work_func()
> IRQ work if hotplug is in progress. It was only reported couple of
> times during long stress testing. The issue can be easily reproduced
> if an artificial delay is introduced between lock and unlock of
> rto_lock. The rq->rd is changed under rq->lock, so we can protect this
> race with rq->lock. The below patch solved the problem. we are taking
> rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
> here. Please let me know your thoughts on this.

Ah, so rq->rd can change. Interesting.

>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index d863d39..478192b 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
>                 raw_spin_unlock(&rq->lock);
>         }
>
> +       raw_spin_lock(&rq->lock);

What about just saving the rd then?

	struct root_domain *rd;

	rd = READ_ONCE(rq->rd);

then use that. Then we don't need to worry about it changing.

-- Steve

>         raw_spin_lock(&rq->rd->rto_lock);
>
>         /* Pass the IPI to the next rt overloaded queue */
> @@ -2291,11 +2292,10 @@ void rto_push_irq_work_func(struct irq_work *work)
>
>         raw_spin_unlock(&rq->rd->rto_lock);
>
> -       if (cpu < 0)
> -               return;
> -
>         /* Try the next RT overloaded CPU */
> -       irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> +       if (cpu >= 0)
> +               irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> +       raw_spin_unlock(&rq->lock);
>  }
>  #endif /* HAVE_RT_PUSH_IPI */
>
>
> Thanks,
> Pavan
>
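To make the race and the suggested snapshot concrete, here is a minimal userspace sketch in C. It is not kernel code. The toy lock, the cut-down struct root_domain and struct rq, the READ_ONCE() stand-in macro, and the functions push_work_racy() and push_work_snapshot() are all hypothetical, and the "hotplug" is simulated by reassigning rq->rd between lock and unlock. The sketch only shows that locking and unlocking through one saved pointer keeps the pair matched. It says nothing about whether the old root domain stays valid, which the messages above do not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(): a single volatile load.
 * Relies on the GCC/Clang __typeof__ extension. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical toy lock that only tracks whether it is held. */
struct toy_lock {
	bool held;
};

/* Cut-down versions of the kernel structures; only the fields used here. */
struct root_domain {
	struct toy_lock rto_lock;
};

struct rq {
	struct root_domain *rd;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

/* Returns false when the lock was not held, the analogue of the
 * "spinlock already unlocked" report. */
static bool toy_lock_release(struct toy_lock *l)
{
	if (!l->held)
		return false;
	l->held = false;
	return true;
}

/* Dereferences rq->rd twice; new_rd simulates hotplug swapping the
 * root domain between the lock and the unlock. */
static bool push_work_racy(struct rq *rq, struct root_domain *new_rd)
{
	toy_lock_acquire(&rq->rd->rto_lock);
	rq->rd = new_rd;                           /* simulated hotplug */
	return toy_lock_release(&rq->rd->rto_lock); /* wrong domain */
}

/* Saves rq->rd once and uses the saved pointer for both operations. */
static bool push_work_snapshot(struct rq *rq, struct root_domain *new_rd)
{
	struct root_domain *rd = READ_ONCE(rq->rd);

	toy_lock_acquire(&rd->rto_lock);
	rq->rd = new_rd;                           /* simulated hotplug */
	return toy_lock_release(&rd->rto_lock);    /* same domain as the lock */
}
```

In the racy variant the unlock lands on the new domain's lock, which was never taken, while the old domain's lock stays held. In the snapshot variant both operations hit the same lock even though rq->rd has moved on.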