From: Sudeep Holla <sudeep.holla@arm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] cpufreq: flush any pending policy update work scheduled before freeing
Date: Mon, 21 Oct 2019 11:33:29 +0100
Message-ID: <20191021103329.GB21581@bogus>
In-Reply-To: <CAJZ5v0hnvahJ6bu6SEVvavb1kRe=X0wP_JUS26h_d019u5z7PA@mail.gmail.com>

On Mon, Oct 21, 2019 at 02:14:51AM +0200, Rafael J. Wysocki wrote:
> On Fri, Oct 18, 2019 at 1:06 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:37:51PM +0200, Rafael J. Wysocki wrote:
> > > On Friday, October 18, 2019 12:19:24 PM CEST Sudeep Holla wrote:
> > > > On Fri, Oct 18, 2019 at 11:32:47AM +0530, Viresh Kumar wrote:
> > > > > On 18-10-19, 06:55, Sudeep Holla wrote:
> > > > > > On Thu, Oct 17, 2019 at 11:26:54PM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Thu, Oct 17, 2019 at 9:36 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 6:35 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > > > > > > >
> > > > > > > > > > dev_pm_qos_remove_request ends up calling the {max,min}_freq_req QoS
> > > > > > > > > > notifiers, which schedule policy update work. That work may end up
> > > > > > > > > > racing with freeing the policy and unregistering the driver.
> > > > > > > > >
> > > > > > > > > > One possible race is shown below, where the cpufreq_driver is
> > > > > > > > > > unregistered but the scheduled work gets executed at a later stage,
> > > > > > > > > > when cpufreq_driver is NULL (i.e. after freeing the policy and driver):
> > > > > > > > >
> > > > > > > > > Unable to handle kernel NULL pointer dereference at virtual address 0000001c
> > > > > > > > > pgd = (ptrval)
> > > > > > > > > [0000001c] *pgd=80000080204003, *pmd=00000000
> > > > > > > > > Internal error: Oops: 206 [#1] SMP THUMB2
> > > > > > > > > Modules linked in:
> > > > > > > > > CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86
> > > > > > > > > Hardware name: ARM-Versatile Express
> > > > > > > > > Workqueue: events handle_update
> > > > > > > > > PC is at cpufreq_set_policy+0x58/0x228
> > > > > > > > > LR is at dev_pm_qos_read_value+0x77/0xac
> > > > > > > > > Control: 70c5387d  Table: 80203000  DAC: fffffffd
> > > > > > > > > Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval))
> > > > > > > > >         (cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48)
> > > > > > > > >         (refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38)
> > > > > > > > >         (handle_update) from (process_one_work+0x16d/0x3cc)
> > > > > > > > >         (process_one_work) from (worker_thread+0xff/0x414)
> > > > > > > > >         (worker_thread) from (kthread+0xff/0x100)
> > > > > > > > >         (kthread) from (ret_from_fork+0x11/0x28)
> > > > > > > > >
> > > > > > > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > > > > > > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/cpufreq/cpufreq.c | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > >
> > > > > > > > > Hi Rafael, Viresh,
> > > > > > > > >
> > > > > > > > > > This fixed the boot issue I reported[1] on TC2 with bL switcher enabled.
> > > > > > > > > > I have based this patch on -rc3 and not on top of your patches. This
> > > > > > > > > > only fixes the boot issue, but I hit the other crashes while continuously
> > > > > > > > > > switching the bL switcher on and off, which registers/unregisters the
> > > > > > > > > > driver. Your patch series fixes them. I can base this on top of those
> > > > > > > > > > if you prefer.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Sudeep
> > > > > > > > >
> > > > > > > > > [1] https://lore.kernel.org/linux-pm/20191015155735.GA29105@bogus/
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > > > > > > index c52d6fa32aac..b703c29a84be 100644
> > > > > > > > > --- a/drivers/cpufreq/cpufreq.c
> > > > > > > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > > > > > > @@ -1278,6 +1278,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         dev_pm_qos_remove_request(policy->min_freq_req);
> > > > > > > > > +       /* flush the pending policy->update work before freeing the policy */
> > > > > > > > > +       if (work_pending(&policy->update))
> > > > > > > >
> > > > > > > > Isn't this racy?
> > > > > > > >
> > > > > > > > It still may be running if the pending bit is clear and we still need
> > > > > > > > to wait for it then, don't we?
> > > > > > > >
> > > > > > > > Why don't you do an unconditional flush_work() here?
> > > > > > >
> > > > > > > You may as well do a cancel_work_sync() here, because whether or not
> > > > > > > the last update of the policy happens before it goes away is a matter
> > > > > > > of timing in any case
> > > > > >
> > > > > > In fact that's the first thing I tried to fix the issue I was seeing.
> > > > > > But I then thought it would be better to complete the update as the PM
> > > > > > QoS were getting updated back to DEFAULT values for the device. Even
> > > > > > this works.
> > > > > >
> > > > > > What is your preference ? flush_work or cancel_work_sync ? I will
> > > > > > update accordingly. I may need to do some more testing with
> > > > > > cancel_work_sync as I just checked that quickly to confirm the race.
> > > > >
> > > > > As I said in the other email, this work didn't come as a result of
> > > > > removal of the qos request from cpufreq core and so must have come
> > > > > from other thermal or similar events.
> > > >
> > > > I don't think so. For sure not because of any thermal events. I didn't
> > > > have log handy and hence had to wait till I was next to hardware.
> > > >
> > > > This is log:
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request max before
> > > >  cpufreq: cpufreq_notifier_max: schedule_work(&policy->update)
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request max after
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request min before
> > > >  cpufreq: cpufreq_notifier_min: schedule_work(&policy->update)
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request min after
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request max before
> > > >  cpufreq: cpufreq_notifier_max: schedule_work(&policy->update)
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request max after
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request min before
> > > >  cpufreq: cpufreq_notifier_min: schedule_work(&policy->update)
> > > >  cpufreq: cpufreq_policy_free: dev_pm_qos_remove_request min after
> > > >
> > > > So if I move the call above, it still crashes as the work is getting
> > > > scheduled later.
> > >
> > > OK, please cancel the work after dropping the last request.
> > >
> > > We still need to understand what is going on here, but the crash needs to be
> > > prevented from occurring in the first place IMO.
> > >
> > Callstack is:
> >
> > (cpufreq_notifier_max)
> > (notifier_call_chain)
> > (blocking_notifier_call_chain)
> > (pm_qos_update_target)
> > (freq_qos_apply)
> > (freq_qos_remove_request)
> > (cpufreq_policy_free)
> > (subsys_interface_unregister)
> > (cpufreq_unregister_driver)
>
> That may be due to a bug in one of my patches (it's adding one of the
> notifiers to a wrong list).
>

Ah, that explains it. I was wondering what had changed: it works now, but
it did not when I tried earlier, and I had to keep cancel_work_sync()
after dev_pm_qos_remove_request().

> Please re-test with the current linux-next branch that I've just pushed.

Yes, I did that, and it now works fine even if I move the cancel_work_sync()
call earlier, to just after freq_qos_remove_notifier().

If you/Viresh prefer the call to cancel_work_sync() to be moved up, that
should be fine now. I have sent the delta for reference in the other reply.
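
To illustrate why the position of the cancel only mattered while the
notifier was still being called, here is a rough userspace model of the
teardown ordering. It is only a sketch, not the cpufreq code: every
model_* name is made up for illustration, update_pending stands in for
work_pending(&policy->update), and the assumption is that removing a QoS
request queues the update work only while a notifier is still registered.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only; none of these names exist in the kernel.
 * update_pending stands in for work_pending(&policy->update).
 */
struct model_policy {
	bool notifier_registered;
	bool update_pending;
};

/* Models dropping a QoS request: a registered notifier queues the work. */
void model_remove_request(struct model_policy *p)
{
	if (p->notifier_registered)
		p->update_pending = true;
}

/* Models cancel_work_sync(): once it returns, nothing is pending. */
void model_cancel_work(struct model_policy *p)
{
	p->update_pending = false;
}

/*
 * Runs the teardown sequence and reports whether work is still pending
 * at the point where the policy would be freed.
 *
 * notifier_removal_works: the notifier removal really detached the notifier
 * cancel_last: cancel after dropping the requests instead of before
 */
bool model_teardown(bool notifier_removal_works, bool cancel_last)
{
	struct model_policy p = {
		.notifier_registered = true,
		.update_pending = false,
	};

	if (notifier_removal_works)
		p.notifier_registered = false;
	if (!cancel_last)
		model_cancel_work(&p);

	model_remove_request(&p);	/* max_freq_req */
	model_remove_request(&p);	/* min_freq_req */

	if (cancel_last)
		model_cancel_work(&p);

	return p.update_pending;
}

/* Checks all four orderings of the model. */
void model_self_check(void)
{
	/* Notifier still called, cancel done early: work outlives the policy. */
	assert(model_teardown(false, false));
	/* Notifier still called, cancel done last: nothing left pending. */
	assert(!model_teardown(false, true));
	/* Notifier really removed: either position of the cancel is fine. */
	assert(!model_teardown(true, false));
	assert(!model_teardown(true, true));
}
```

In this model, only the combination "notifier still called" plus "cancel
before dropping the requests" leaves work pending, which matches what I
saw before and after the fix in linux-next.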

--
Regards,
Sudeep
