linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wei Wang <wei.vince.wang@gmail.com>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] thermal: Fix deadlock in thermal thermal_zone_device_check
Date: Tue, 3 Dec 2019 09:55:43 -0800	[thread overview]
Message-ID: <CAMFybE4aSo7AK20T0Hkjo-R-BkjaYtqz-FMF8pS_p_GeLg5tJA@mail.gmail.com> (raw)
In-Reply-To: <57162b7c-cf34-2b55-a17c-40d96a0ab105@linaro.org>

On Tue, Dec 3, 2019 at 9:15 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 12/11/2019 21:42, Wei Wang wrote:
> > commit [2] changed cancel_delayed_work to cancel_delayed_work_sync to
> > avoid a use-after-free issue. However, cancel_delayed_work_sync could be
> > called insides the WQ causing deadlock [1].
> >
> > [1]
> > [54109.642398] c0   1162 kworker/u17:1   D    0 11030      2 0x00000000
> > [54109.642437] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
> > [54109.642447] c0   1162 Call trace:
> > [54109.642456] c0   1162  __switch_to+0x138/0x158
> > [54109.642467] c0   1162  __schedule+0xba4/0x1434
> > [54109.642480] c0   1162  schedule_timeout+0xa0/0xb28
> > [54109.642492] c0   1162  wait_for_common+0x138/0x2e8
> > [54109.642511] c0   1162  flush_work+0x348/0x40c
> > [54109.642522] c0   1162  __cancel_work_timer+0x180/0x218
> > [54109.642544] c0   1162  handle_thermal_trip+0x2c4/0x5a4
> > [54109.642553] c0   1162  thermal_zone_device_update+0x1b4/0x25c
> > [54109.642563] c0   1162  thermal_zone_device_check+0x18/0x24
> > [54109.642574] c0   1162  process_one_work+0x3cc/0x69c
> > [54109.642583] c0   1162  worker_thread+0x49c/0x7c0
> > [54109.642593] c0   1162  kthread+0x17c/0x1b0
> > [54109.642602] c0   1162  ret_from_fork+0x10/0x18
> > [54109.643051] c0   1162 kworker/u17:2   D    0 16245      2 0x00000000
> > [54109.643067] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
> > [54109.643077] c0   1162 Call trace:
> > [54109.643085] c0   1162  __switch_to+0x138/0x158
> > [54109.643095] c0   1162  __schedule+0xba4/0x1434
> > [54109.643104] c0   1162  schedule_timeout+0xa0/0xb28
> > [54109.643114] c0   1162  wait_for_common+0x138/0x2e8
> > [54109.643122] c0   1162  flush_work+0x348/0x40c
> > [54109.643131] c0   1162  __cancel_work_timer+0x180/0x218
> > [54109.643141] c0   1162  handle_thermal_trip+0x2c4/0x5a4
> > [54109.643150] c0   1162  thermal_zone_device_update+0x1b4/0x25c
> > [54109.643159] c0   1162  thermal_zone_device_check+0x18/0x24
> > [54109.643167] c0   1162  process_one_work+0x3cc/0x69c
> > [54109.643177] c0   1162  worker_thread+0x49c/0x7c0
> > [54109.643186] c0   1162  kthread+0x17c/0x1b0
> > [54109.643195] c0   1162  ret_from_fork+0x10/0x18
> > [54109.644500] c0   1162 cat             D    0  7766      1 0x00000001
> > [54109.644515] c0   1162 Call trace:
> > [54109.644524] c0   1162  __switch_to+0x138/0x158
> > [54109.644536] c0   1162  __schedule+0xba4/0x1434
> > [54109.644546] c0   1162  schedule_preempt_disabled+0x80/0xb0
> > [54109.644555] c0   1162  __mutex_lock+0x3a8/0x7f0
> > [54109.644563] c0   1162  __mutex_lock_slowpath+0x14/0x20
> > [54109.644575] c0   1162  thermal_zone_get_temp+0x84/0x360
> > [54109.644586] c0   1162  temp_show+0x30/0x78
> > [54109.644609] c0   1162  dev_attr_show+0x5c/0xf0
> > [54109.644628] c0   1162  sysfs_kf_seq_show+0xcc/0x1a4
> > [54109.644636] c0   1162  kernfs_seq_show+0x48/0x88
> > [54109.644656] c0   1162  seq_read+0x1f4/0x73c
> > [54109.644664] c0   1162  kernfs_fop_read+0x84/0x318
> > [54109.644683] c0   1162  __vfs_read+0x50/0x1bc
> > [54109.644692] c0   1162  vfs_read+0xa4/0x140
> > [54109.644701] c0   1162  SyS_read+0xbc/0x144
> > [54109.644708] c0   1162  el0_svc_naked+0x34/0x38
> > [54109.845800] c0   1162 D 720.000s 1->7766->7766 cat [panic]
> >
> > Fixes commit 1851799e1d29 ("thermal: Fix use-after-free when
> > unregistering thermal zone device") [2]
>
> It does not fix the problem actually. It is preferable to revert the
> commit 1851799e1d29.
>
The problem as you mentioned is "This one (WQ) blocks because
cancel_delayed_work_sync is owning the lock which in turn waits for
the work to end"
AFAICT, thermal_zone_device_unregister should not take tz->lock and
either be called in WQ context.

>
> The cancel delayed work sync takes the mutex while a cat
> /sys/class/thermal/thermal_zone0/temp happens which takes the mutex also
> and at the same moment a thermal_zone_device_update() call is done in
> the workqueue context. This one blocks because cancel_delayed_work_sync
> is owning the lock which in turn waits for the work to end.
>
> IMO, that deserves a deeper investigation with the mutex. Probably the
> fix would be to refcount the thermal zone device and see if we can get
> rid of the mutex in some places. If the refcounting is used, the
> function thermal_zone_device_unregister() will be called with the
> guarantee nobody will use it and the mutex can be removed.
>
> > Signed-off-by: Wei Wang <wvw@google.com>
> > ---
> >  drivers/thermal/thermal_core.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index d4481cc8958f..c28271817e43 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -304,7 +304,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
> >                                &tz->poll_queue,
> >                                msecs_to_jiffies(delay));
> >       else
> > -             cancel_delayed_work_sync(&tz->poll_queue);
> > +             cancel_delayed_work(&tz->poll_queue);
> >  }
> >
> >  static void monitor_thermal_zone(struct thermal_zone_device *tz)
> > @@ -1414,7 +1414,7 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
> >
> >       mutex_unlock(&thermal_list_lock);
> >
> > -     thermal_zone_device_set_polling(tz, 0);
> > +     cancel_delayed_work_sync(&tz->poll_queue);
> >
> >       thermal_set_governor(tz, NULL);
> >
> >
>
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

      reply	other threads:[~2019-12-03 17:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-12 20:31 [PATCH] Fix deadlock in thermal thermal_zone_device_check Wei Wang
2019-11-12 20:42 ` [PATCH v2] thermal: " Wei Wang
2019-12-03 17:15   ` Daniel Lezcano
2019-12-03 17:55     ` Wei Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMFybE4aSo7AK20T0Hkjo-R-BkjaYtqz-FMF8pS_p_GeLg5tJA@mail.gmail.com \
    --to=wei.vince.wang@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).