All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Stephen Boyd <sboyd@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"cpufreq@vger.kernel.org" <cpufreq@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Kukjin Kim <kgene.kim@samsung.com>
Subject: Re: mutex warning in cpufreq + RFC patch
Date: Sat, 31 Aug 2013 02:55:57 +0200	[thread overview]
Message-ID: <2143062.PTWIfM6mhl@vostro.rjw.lan> (raw)
In-Reply-To: <20130831003641.GE19754@codeaurora.org>

On Friday, August 30, 2013 05:36:41 PM Stephen Boyd wrote:
> On 08/29, Viresh Kumar wrote:
> > On 28 August 2013 22:22, Stephen Boyd <sboyd@codeaurora.org> wrote:
> > >
> > > I've applied these patches on top of v3.10
> > >
> > > f51e1eb63d9c28cec188337ee656a13be6980cfd (cpufreq: Fix cpufreq regression after suspend/resume
> > > aae760ed21cd690fe8a6db9f3a177ad55d7e12ab (cpufreq: Revert commit a66b2e to fix suspend/resume regression)
> > > e8d05276f236ee6435e78411f62be9714e0b9377 (cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression)
> > > 2a99859932281ed6c2ecdd988855f8f6838f6743 (cpufreq: Fix cpufreq driver module refcount balance after suspend/resume)
> > > 419e172145cf6c51d436a8bf4afcd17511f0ff79 (cpufreq: don't leave stale policy pointer in cdbs->cur_policy)
> > > 95731ebb114c5f0c028459388560fc2a72fe5049 (cpufreq: Fix governor start/stop race condition)
> > >
> > > That second to last one causes a NULL pointer exception after the mutex
> > > warning above because the limits case does
> > >
> > >     if (policy->max < cpu_cdbs->cur_policy->cur)
> > >
> > > and that dereferences a NULL cur_policy pointer.
> > 
> > I have seen something similar and the error checking patch that
> > I mentioned earlier came as solution to that only..
> 
> Yes that patch may reduce the chance of the race condition but I
> don't believe it removes it entirely. I believe this bug still
> exists in linux-next. Consider the scenario where CPU1 is going
> down.
> 
> __cpufreq_remove_dev()
>  ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
>   __cpufreq_governor()
>    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
>     cpufreq_governor_dbs()
>      case CPUFREQ_GOV_STOP:
>       mutex_destroy(&cpu_cdbs->timer_mutex)
>       cpu_cdbs->cur_policy = NULL;
>   <PREEMPT>
> store()
>  __cpufreq_set_policy()
>   ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>    __cpufreq_governor()
>     policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
>      case CPUFREQ_GOV_LIMITS:
>       mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
>        if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
> 
> Once we stop the governor I don't see how another thread can't
> race in and get all the way down into the GOV_LIMITS case. Even
> if we wanted to lock out that thread with some mutex or semaphore
> it will have to continue running eventually and so we really need
> to wait until all the sysfs files are gone before we stop the
> governor (in the case of the last cpu for the policy) or we need
> to stop and start the governor while holding the policy semaphore
> to prevent a race.
> 
> > 
> > > Are there any fixes that I'm missing? I see that some things are
> > > changing in linux-next but they don't look like fixes, more like
> > > optimizations.
> > 
> > Getting patches over 3.10 would be tricky.. You are two kernel
> > version back and that's not going to help much.. There are too many
> > patches in between linux-next and 3.10..
> >
> > 
> > I really can't tell you which specific ones to include, as I am lost in them :)
> 
> That's a problem. 3.10 is the next long term stable kernel and so we need to
> backport any fixes to 3.10 for the next two years. Hopefully these bugs I'm
> finding in the 3.10 stable kernel's cpufreq code aren't known issues on
> 3.11/next.

No, they aren't.

Well, that's the main reason why I've been pushing back against more churn in
the cpuidle subsystem recently.  I think we went too far with changes that
were not entirely understood and now we're seeing the fallout.

It would be great if you could identify the 3.11 changes that fix problems
you're seeing in 3.10.y (doing a reverse bisect of drivers/cpufreq/ changes
might help, since you have reproducers it seems).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

  reply	other threads:[~2013-08-31  0:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-28  2:57 mutex warning in cpufreq + RFC patch Stephen Boyd
2013-08-28  2:57 ` Stephen Boyd
2013-08-28  6:58 ` Viresh Kumar
2013-08-28  6:58   ` Viresh Kumar
2013-08-28 16:52   ` Stephen Boyd
2013-08-29  8:37     ` Viresh Kumar
2013-08-29  8:39       ` Viresh Kumar
2013-08-31  0:36       ` Stephen Boyd
2013-08-31  0:55         ` Rafael J. Wysocki [this message]
2013-08-31  0:59           ` Rafael J. Wysocki
2013-09-01  6:24         ` Viresh Kumar
2013-09-01 13:22           ` Rafael J. Wysocki
2013-09-01 16:21             ` Viresh Kumar
2013-09-03 13:18               ` Srivatsa S. Bhat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2143062.PTWIfM6mhl@vostro.rjw.lan \
    --to=rjw@sisk.pl \
    --cc=cpufreq@vger.kernel.org \
    --cc=kgene.kim@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=sboyd@codeaurora.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.