From: Viresh Kumar <viresh.kumar@linaro.org>
To: Sudeep Holla <sudeep.holla@arm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] cpufreq: flush any pending policy update work scheduled before freeing
Date: Fri, 18 Oct 2019 11:32:47 +0530
Message-ID: <20191018060247.g5asfuh3kncoj7kl@vireshk-i7>
In-Reply-To: <20191018055533.GC31836@e107533-lin.cambridge.arm.com>

On 18-10-19, 06:55, Sudeep Holla wrote:
> On Thu, Oct 17, 2019 at 11:26:54PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Oct 17, 2019 at 9:36 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 6:35 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > >
> > > > dev_pm_qos_remove_request ends calling {max,min}_freq_req QoS notifiers
> > > > which schedule policy update work. It may end up racing with the freeing
> > > > the policy and unregistering the driver.
> > > >
> > > > One possible race is as below where the cpufreq_driver is unregistered
> > > > but the scheduled work gets executed at later stage when cpufreq_driver
> > > > is NULL(i.e. after freeing the policy and driver)
> > > >
> > > > Unable to handle kernel NULL pointer dereference at virtual address 0000001c
> > > > pgd = (ptrval)
> > > > [0000001c] *pgd=80000080204003, *pmd=00000000
> > > > Internal error: Oops: 206 [#1] SMP THUMB2
> > > > Modules linked in:
> > > > CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86
> > > > Hardware name: ARM-Versatile Express
> > > > Workqueue: events handle_update
> > > > PC is at cpufreq_set_policy+0x58/0x228
> > > > LR is at dev_pm_qos_read_value+0x77/0xac
> > > > Control: 70c5387d  Table: 80203000  DAC: fffffffd
> > > > Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval))
> > > >         (cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48)
> > > >         (refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38)
> > > >         (handle_update) from (process_one_work+0x16d/0x3cc)
> > > >         (process_one_work) from (worker_thread+0xff/0x414)
> > > >         (worker_thread) from (kthread+0xff/0x100)
> > > >         (kthread) from (ret_from_fork+0x11/0x28)
> > > >
> > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > > ---
> > > >  drivers/cpufreq/cpufreq.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > Hi Rafael, Viresh,
> > > >
> > > > This fixed the boot issue I reported[1] on TC2 with bL switcher enabled.
> > > > I have based this patch on -rc3 and not on top of your patches. This
> > > > only fixes the boot issue but I hit the other crashes while continuously
> > > > switching on and off the bL switcher that register/unregister the driver
> > > > Your patch series fixes them. I can based this on top of those if you
> > > > prefer.
> > > >
> > > > Regards,
> > > > Sudeep
> > > >
> > > > [1] https://lore.kernel.org/linux-pm/20191015155735.GA29105@bogus/
> > > >
> > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > index c52d6fa32aac..b703c29a84be 100644
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -1278,6 +1278,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> > > >         }
> > > >
> > > >         dev_pm_qos_remove_request(policy->min_freq_req);
> > > > +       /* flush the pending policy->update work before freeing the policy */
> > > > +       if (work_pending(&policy->update))
> > >
> > > Isn't this racy?
> > >
> > > It still may be running if the pending bit is clear and we still need
> > > to wait for it then, don't we?
> > >
> > > Why don't you do an unconditional flush_work() here?
> > 
> > You may as well do a cancel_work_sync() here, because whether or not
> > the last update of the policy happens before it goes away is a matter
> > of timing in any case
> 
> In fact that's the first thing I tried to fix the issue I was seeing.
> But I then thought it would be better to complete the update as the PM
> QoS were getting updated back to DEFAULT values for the device. Even
> this works.
> 
> What is your preference ? flush_work or cancel_work_sync ? I will
> update accordingly. I may need to do some more testing with
> cancel_work_sync as I just checked that quickly to confirm the race.

As I said in the other email, this work wasn't scheduled as a result of
the cpufreq core removing the QoS request, so it must have come from a
thermal or similar event. In that case, doing flush_work() before we
remove the cpufreq driver is probably better. Rafael's timing-related
comment makes sense as well, but now that we have received the work
before the policy is removed, I would rather complete the work and
quit.
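
To make the difference between the options discussed above concrete,
here is a minimal userspace sketch. It is a toy model, not kernel code:
every toy_* name is hypothetical, and it only assumes the documented
behaviour that the pending bit is cleared before the handler starts
running, that a flush lets outstanding work complete, and that a
cancel-and-wait drops queued work and waits out a running handler.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_work {
	bool pending;	/* queued, handler not started yet */
	bool running;	/* handler started, not finished yet */
	int runs;	/* number of completed handler invocations */
};

/* Queue the work; queueing an already pending item changes nothing. */
static void toy_schedule_work(struct toy_work *w)
{
	w->pending = true;
}

/* The worker picks the item up: pending is cleared BEFORE the handler runs. */
static void toy_worker_begin(struct toy_work *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	w->running = true;
}

/* The worker completes the handler. */
static void toy_worker_finish(struct toy_work *w)
{
	if (!w->running)
		return;
	w->runs++;
	w->running = false;
}

static bool toy_work_pending(const struct toy_work *w)
{
	return w->pending;
}

/* Models a flush: a running handler and any queued instance both complete. */
static void toy_flush_work(struct toy_work *w)
{
	toy_worker_finish(w);
	toy_worker_begin(w);
	toy_worker_finish(w);
}

/*
 * Models cancel-and-wait: queued work is dropped, a running handler is
 * waited for. Returns true if a queued instance was dropped.
 */
static bool toy_cancel_work_sync(struct toy_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	toy_worker_finish(w);
	return was_pending;
}

/* The pattern questioned in the review: wait only if the pending bit is set. */
static void toy_conditional_flush(struct toy_work *w)
{
	if (toy_work_pending(w))
		toy_flush_work(w);
}

/* The owning object may go away only when no work can still touch it. */
static bool toy_safe_to_free(const struct toy_work *w)
{
	return !w->pending && !w->running;
}

/* Stand-in for freeing the policy; trips if work is still outstanding. */
static void toy_policy_free(struct toy_work *w)
{
	assert(toy_safe_to_free(w));
}
```

In this model, calling toy_conditional_flush() after toy_worker_begin()
does nothing, because the pending bit is already clear while the handler
is still running, so toy_policy_free() would trip its assertion. Both
toy_flush_work() and toy_cancel_work_sync() make the free safe; the
only difference is whether a queued update is applied or dropped first.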

-- 
viresh

