linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 ] PM-runtime: fix deadlock with ktime
@ 2019-01-30 11:16 Vincent Guittot
  2019-01-30 12:14 ` Ulf Hansson
  2019-01-30 13:06 ` Rafael J. Wysocki
  0 siblings, 2 replies; 17+ messages in thread
From: Vincent Guittot @ 2019-01-30 11:16 UTC (permalink / raw)
  To: linux-pm, linux-kernel, linux-arm-kernel, linux-omap, rjw,
	ulf.hansson, biju.das
  Cc: geert, linux-renesas-soc, Vincent Guittot

A deadlock has been seen when swicthing clocksources which use PM runtime.
The call path is:
change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
        ...
        sh_cmt_clocksource_enable
            ...
            rpm_resume
                pm_runtime_mark_last_busy
                    ktime_get
                        do
                            read_seqcount_begin
                        while read_seqcount_retry
    ....
    write_seqcount_end

Although we should be safe because we haven't yet changed the clocksource
at that time, we can't because of seqcount protection.

Use ktime_get_mono_fast_ns() instead which is lock safe for such case

With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
monotonic across an update and as a result can goes backward. According to
update_fast_timekeeper() description: "In the worst case, this can
result is a slightly wrong timestamp (a few nanoseconds)". For
PM runtime autosuspend, this means only that the suspend decision can
be slightly sub optimal.

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

- v2: Updated commit message to explain the impact of using 
  ktime_get_mono_fast_ns()

 drivers/base/power/runtime.c | 10 +++++-----
 include/linux/pm_runtime.h   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 457be03..708a13f 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
 {
 	int autosuspend_delay;
 	u64 last_busy, expires = 0;
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = ktime_get_mono_fast_ns();
 
 	if (!dev->power.use_autosuspend)
 		goto out;
@@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 	 * If 'expires' is after the current time, we've been called
 	 * too early.
 	 */
-	if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
+	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
 		dev->power.timer_expires = 0;
 		rpm_suspend(dev, dev->power.timer_autosuspends ?
 		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
@@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 int pm_schedule_suspend(struct device *dev, unsigned int delay)
 {
 	unsigned long flags;
-	ktime_t expires;
+	u64 expires;
 	int retval;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
@@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
 	/* Other scheduled or pending requests need to be canceled. */
 	pm_runtime_cancel_pending(dev);
 
-	expires = ktime_add(ktime_get(), ms_to_ktime(delay));
-	dev->power.timer_expires = ktime_to_ns(expires);
+	expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);
+	dev->power.timer_expires = expires;
 	dev->power.timer_autosuspends = 0;
 	hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
 
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 54af4ee..fed5be7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
 
 static inline void pm_runtime_mark_last_busy(struct device *dev)
 {
-	WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
+	WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
 }
 
 static inline bool pm_runtime_is_irq_safe(struct device *dev)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 11:16 [PATCH v2 ] PM-runtime: fix deadlock with ktime Vincent Guittot
@ 2019-01-30 12:14 ` Ulf Hansson
  2019-02-05 13:24   ` Geert Uytterhoeven
  2019-01-30 13:06 ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: Ulf Hansson @ 2019-01-30 12:14 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Linux PM, Linux Kernel Mailing List, Linux ARM, linux-omap,
	Rafael J. Wysocki, Biju Das, Geert Uytterhoeven, Linux-Renesas

On Wed, 30 Jan 2019 at 12:16, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> A deadlock has been seen when swicthing clocksources which use PM runtime.
> The call path is:
> change_clocksource
>     ...
>     write_seqcount_begin
>     ...
>     timekeeping_update
>         ...
>         sh_cmt_clocksource_enable
>             ...
>             rpm_resume

Hmm, isn't this already a problem in genpd then?

In genpd's ->runtime_resume() callback (genpd_runtime_resume()) we
call ktime_get() to measure the latency of running the callback.

Or, perhaps irq_safe_dev_in_no_sleep_domain() returns true for the
genpd + device in question, which means the ktime thingy is skipped.

Geert?


>                 pm_runtime_mark_last_busy
>                     ktime_get
>                         do
>                             read_seqcount_begin
>                         while read_seqcount_retry
>     ....
>     write_seqcount_end
>
> Although we should be safe because we haven't yet changed the clocksource
> at that time, we can't because of seqcount protection.
>
> Use ktime_get_mono_fast_ns() instead which is lock safe for such case
>
> With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> monotonic across an update and as a result can goes backward. According to
> update_fast_timekeeper() description: "In the worst case, this can
> result is a slightly wrong timestamp (a few nanoseconds)". For
> PM runtime autosuspend, this means only that the suspend decision can
> be slightly sub optimal.
>
> Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> Reported-by: Biju Das <biju.das@bp.renesas.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe

> ---
>
> - v2: Updated commit message to explain the impact of using
>   ktime_get_mono_fast_ns()
>
>  drivers/base/power/runtime.c | 10 +++++-----
>  include/linux/pm_runtime.h   |  2 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 457be03..708a13f 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
>  {
>         int autosuspend_delay;
>         u64 last_busy, expires = 0;
> -       u64 now = ktime_to_ns(ktime_get());
> +       u64 now = ktime_get_mono_fast_ns();
>
>         if (!dev->power.use_autosuspend)
>                 goto out;
> @@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
>          * If 'expires' is after the current time, we've been called
>          * too early.
>          */
> -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
>                 dev->power.timer_expires = 0;
>                 rpm_suspend(dev, dev->power.timer_autosuspends ?
>                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
> @@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
>  int pm_schedule_suspend(struct device *dev, unsigned int delay)
>  {
>         unsigned long flags;
> -       ktime_t expires;
> +       u64 expires;
>         int retval;
>
>         spin_lock_irqsave(&dev->power.lock, flags);
> @@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
>         /* Other scheduled or pending requests need to be canceled. */
>         pm_runtime_cancel_pending(dev);
>
> -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> -       dev->power.timer_expires = ktime_to_ns(expires);
> +       expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);
> +       dev->power.timer_expires = expires;
>         dev->power.timer_autosuspends = 0;
>         hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
>
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index 54af4ee..fed5be7 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
>
>  static inline void pm_runtime_mark_last_busy(struct device *dev)
>  {
> -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
>  }
>
>  static inline bool pm_runtime_is_irq_safe(struct device *dev)
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 11:16 [PATCH v2 ] PM-runtime: fix deadlock with ktime Vincent Guittot
  2019-01-30 12:14 ` Ulf Hansson
@ 2019-01-30 13:06 ` Rafael J. Wysocki
  2019-01-30 13:18   ` Vincent Guittot
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2019-01-30 13:06 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Linux PM, Linux Kernel Mailing List, Linux ARM,
	Linux OMAP Mailing List, Rafael J. Wysocki, Ulf Hansson,
	Biju Das, Geert Uytterhoeven, Linux-Renesas, Ladislav Michl

On Wed, Jan 30, 2019 at 12:16 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> A deadlock has been seen when swicthing clocksources which use PM runtime.
> The call path is:
> change_clocksource
>     ...
>     write_seqcount_begin
>     ...
>     timekeeping_update
>         ...
>         sh_cmt_clocksource_enable
>             ...
>             rpm_resume
>                 pm_runtime_mark_last_busy
>                     ktime_get
>                         do
>                             read_seqcount_begin
>                         while read_seqcount_retry
>     ....
>     write_seqcount_end
>
> Although we should be safe because we haven't yet changed the clocksource
> at that time, we can't because of seqcount protection.
>
> Use ktime_get_mono_fast_ns() instead which is lock safe for such case
>
> With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> monotonic across an update and as a result can goes backward. According to
> update_fast_timekeeper() description: "In the worst case, this can
> result is a slightly wrong timestamp (a few nanoseconds)". For
> PM runtime autosuspend, this means only that the suspend decision can
> be slightly sub optimal.
>
> Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> Reported-by: Biju Das <biju.das@bp.renesas.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

I've queued this one up as a fix for 5.0, but unfortunately it clashes
with the patch from Ladislav Michl at
https://patchwork.kernel.org/patch/10755477/ which has been dropped
now.

Can you or Ladislav please rebase that patch on top of this one and repost?

> ---
>
> - v2: Updated commit message to explain the impact of using
>   ktime_get_mono_fast_ns()
>
>  drivers/base/power/runtime.c | 10 +++++-----
>  include/linux/pm_runtime.h   |  2 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 457be03..708a13f 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
>  {
>         int autosuspend_delay;
>         u64 last_busy, expires = 0;
> -       u64 now = ktime_to_ns(ktime_get());
> +       u64 now = ktime_get_mono_fast_ns();
>
>         if (!dev->power.use_autosuspend)
>                 goto out;
> @@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
>          * If 'expires' is after the current time, we've been called
>          * too early.
>          */
> -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
>                 dev->power.timer_expires = 0;
>                 rpm_suspend(dev, dev->power.timer_autosuspends ?
>                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
> @@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
>  int pm_schedule_suspend(struct device *dev, unsigned int delay)
>  {
>         unsigned long flags;
> -       ktime_t expires;
> +       u64 expires;
>         int retval;
>
>         spin_lock_irqsave(&dev->power.lock, flags);
> @@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
>         /* Other scheduled or pending requests need to be canceled. */
>         pm_runtime_cancel_pending(dev);
>
> -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> -       dev->power.timer_expires = ktime_to_ns(expires);
> +       expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);
> +       dev->power.timer_expires = expires;
>         dev->power.timer_autosuspends = 0;
>         hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
>
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index 54af4ee..fed5be7 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
>
>  static inline void pm_runtime_mark_last_busy(struct device *dev)
>  {
> -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
>  }
>
>  static inline bool pm_runtime_is_irq_safe(struct device *dev)
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 13:06 ` Rafael J. Wysocki
@ 2019-01-30 13:18   ` Vincent Guittot
  2019-01-30 19:39     ` Ladislav Michl
  2019-01-30 17:26   ` [PATCH v3] " Vincent Guittot
  2019-01-30 21:32   ` [PATCH v2 ] " Ladislav Michl
  2 siblings, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2019-01-30 13:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Linux Kernel Mailing List, Linux ARM,
	Linux OMAP Mailing List, Rafael J. Wysocki, Ulf Hansson,
	Biju Das, Geert Uytterhoeven, Linux-Renesas, Ladislav Michl

On Wed, 30 Jan 2019 at 14:06, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 30, 2019 at 12:16 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
> >                 pm_runtime_mark_last_busy
> >                     ktime_get
> >                         do
> >                             read_seqcount_begin
> >                         while read_seqcount_retry
> >     ....
> >     write_seqcount_end
> >
> > Although we should be safe because we haven't yet changed the clocksource
> > at that time, we can't because of seqcount protection.
> >
> > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> >
> > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > monotonic across an update and as a result can goes backward. According to
> > update_fast_timekeeper() description: "In the worst case, this can
> > result is a slightly wrong timestamp (a few nanoseconds)". For
> > PM runtime autosuspend, this means only that the suspend decision can
> > be slightly sub optimal.
> >
> > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> I've queued this one up as a fix for 5.0, but unfortunately it clashes
> with the patch from Ladislav Michl at
> https://patchwork.kernel.org/patch/10755477/ which has been dropped
> now.

Thanks for adding Ladislav in this thread.
I'm sorry I forgot to add him in the loop.

>
> Can you or Ladislav please rebase that patch on top of this one and repost?

Ladislav,

Let me know if you prefer to rebase and repost your patch of if you
want me to do.

Regards,
Vincent

>
> > ---
> >
> > - v2: Updated commit message to explain the impact of using
> >   ktime_get_mono_fast_ns()
> >
> >  drivers/base/power/runtime.c | 10 +++++-----
> >  include/linux/pm_runtime.h   |  2 +-
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 457be03..708a13f 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
> >  {
> >         int autosuspend_delay;
> >         u64 last_busy, expires = 0;
> > -       u64 now = ktime_to_ns(ktime_get());
> > +       u64 now = ktime_get_mono_fast_ns();
> >
> >         if (!dev->power.use_autosuspend)
> >                 goto out;
> > @@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> >          * If 'expires' is after the current time, we've been called
> >          * too early.
> >          */
> > -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> > +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
> >                 dev->power.timer_expires = 0;
> >                 rpm_suspend(dev, dev->power.timer_autosuspends ?
> >                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
> > @@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> >  int pm_schedule_suspend(struct device *dev, unsigned int delay)
> >  {
> >         unsigned long flags;
> > -       ktime_t expires;
> > +       u64 expires;
> >         int retval;
> >
> >         spin_lock_irqsave(&dev->power.lock, flags);
> > @@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
> >         /* Other scheduled or pending requests need to be canceled. */
> >         pm_runtime_cancel_pending(dev);
> >
> > -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> > -       dev->power.timer_expires = ktime_to_ns(expires);
> > +       expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);
> > +       dev->power.timer_expires = expires;
> >         dev->power.timer_autosuspends = 0;
> >         hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
> >
> > diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> > index 54af4ee..fed5be7 100644
> > --- a/include/linux/pm_runtime.h
> > +++ b/include/linux/pm_runtime.h
> > @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
> >
> >  static inline void pm_runtime_mark_last_busy(struct device *dev)
> >  {
> > -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> > +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
> >  }
> >
> >  static inline bool pm_runtime_is_irq_safe(struct device *dev)
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-01-30 13:06 ` Rafael J. Wysocki
  2019-01-30 13:18   ` Vincent Guittot
@ 2019-01-30 17:26   ` Vincent Guittot
  2019-01-30 21:53     ` Rafael J. Wysocki
  2019-01-30 21:32   ` [PATCH v2 ] " Ladislav Michl
  2 siblings, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2019-01-30 17:26 UTC (permalink / raw)
  To: linux-pm, linux-kernel, linux-arm-kernel, linux-omap, rjw,
	ulf.hansson, biju.das
  Cc: geert, linux-renesas-soc, Vincent Guittot

A deadlock has been seen when swicthing clocksources which use PM runtime.
The call path is:
change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
        ...
        sh_cmt_clocksource_enable
            ...
            rpm_resume
                pm_runtime_mark_last_busy
                    ktime_get
                        do
                            read_seqcount_begin
                        while read_seqcount_retry
    ....
    write_seqcount_end

Although we should be safe because we haven't yet changed the clocksource
at that time, we can't because of seqcount protection.

Use ktime_get_mono_fast_ns() instead which is lock safe for such case

With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
monotonic across an update and as a result can goes backward. According to
update_fast_timekeeper() description: "In the worst case, this can
result is a slightly wrong timestamp (a few nanoseconds)". For
PM runtime autosuspend, this means only that the suspend decision can
be slightly sub optimal.

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

Hi Rafael,

Sorry, I sent the version with the typo mistake that generated the compilation error
reported by kbuild-test-robot

This version doesn't have the typo.

Regards,
Vincent

- v3: fix typo error that appears while updating commit message
- v2: Updated commit message to explain the impact of using 
  ktime_get_mono_fast_ns()

 drivers/base/power/runtime.c | 10 +++++-----
 include/linux/pm_runtime.h   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 457be03..0ea2139 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
 {
 	int autosuspend_delay;
 	u64 last_busy, expires = 0;
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = ktime_get_mono_fast_ns();
 
 	if (!dev->power.use_autosuspend)
 		goto out;
@@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 	 * If 'expires' is after the current time, we've been called
 	 * too early.
 	 */
-	if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
+	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
 		dev->power.timer_expires = 0;
 		rpm_suspend(dev, dev->power.timer_autosuspends ?
 		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
@@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 int pm_schedule_suspend(struct device *dev, unsigned int delay)
 {
 	unsigned long flags;
-	ktime_t expires;
+	u64 expires;
 	int retval;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
@@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
 	/* Other scheduled or pending requests need to be canceled. */
 	pm_runtime_cancel_pending(dev);
 
-	expires = ktime_add(ktime_get(), ms_to_ktime(delay));
-	dev->power.timer_expires = ktime_to_ns(expires);
+	expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC;
+	dev->power.timer_expires = expires;
 	dev->power.timer_autosuspends = 0;
 	hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
 
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 54af4ee..fed5be7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
 
 static inline void pm_runtime_mark_last_busy(struct device *dev)
 {
-	WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
+	WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
 }
 
 static inline bool pm_runtime_is_irq_safe(struct device *dev)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 13:18   ` Vincent Guittot
@ 2019-01-30 19:39     ` Ladislav Michl
  0 siblings, 0 replies; 17+ messages in thread
From: Ladislav Michl @ 2019-01-30 19:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Biju Das, Geert Uytterhoeven, Linux-Renesas

On Wed, Jan 30, 2019 at 02:18:49PM +0100, Vincent Guittot wrote:
> On Wed, 30 Jan 2019 at 14:06, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Jan 30, 2019 at 12:16 PM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > > The call path is:
> > > change_clocksource
> > >     ...
> > >     write_seqcount_begin
> > >     ...
> > >     timekeeping_update
> > >         ...
> > >         sh_cmt_clocksource_enable
> > >             ...
> > >             rpm_resume
> > >                 pm_runtime_mark_last_busy
> > >                     ktime_get
> > >                         do
> > >                             read_seqcount_begin
> > >                         while read_seqcount_retry
> > >     ....
> > >     write_seqcount_end
> > >
> > > Although we should be safe because we haven't yet changed the clocksource
> > > at that time, we can't because of seqcount protection.
> > >
> > > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> > >
> > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > monotonic across an update and as a result can goes backward. According to
> > > update_fast_timekeeper() description: "In the worst case, this can
> > > result is a slightly wrong timestamp (a few nanoseconds)". For
> > > PM runtime autosuspend, this means only that the suspend decision can
> > > be slightly sub optimal.
> > >
> > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
> > I've queued this one up as a fix for 5.0, but unfortunately it clashes
> > with the patch from Ladislav Michl at
> > https://patchwork.kernel.org/patch/10755477/ which has been dropped
> > now.
> 
> Thanks for adding Ladislav in this thread.
> I'm sorry I forgot to add him in the loop.
> 
> >
> > Can you or Ladislav please rebase that patch on top of this one and repost?
> 
> Ladislav,
> 
> Let me know if you prefer to rebase and repost your patch of if you
> want me to do.

I'll rebase it on top of Rafael's bleeding-edge branch.

Best regards,
	ladis

> Regards,
> Vincent
> 
> >
> > > ---
> > >
> > > - v2: Updated commit message to explain the impact of using
> > >   ktime_get_mono_fast_ns()
> > >
> > >  drivers/base/power/runtime.c | 10 +++++-----
> > >  include/linux/pm_runtime.h   |  2 +-
> > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > > index 457be03..708a13f 100644
> > > --- a/drivers/base/power/runtime.c
> > > +++ b/drivers/base/power/runtime.c
> > > @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
> > >  {
> > >         int autosuspend_delay;
> > >         u64 last_busy, expires = 0;
> > > -       u64 now = ktime_to_ns(ktime_get());
> > > +       u64 now = ktime_get_mono_fast_ns();
> > >
> > >         if (!dev->power.use_autosuspend)
> > >                 goto out;
> > > @@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> > >          * If 'expires' is after the current time, we've been called
> > >          * too early.
> > >          */
> > > -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> > > +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
> > >                 dev->power.timer_expires = 0;
> > >                 rpm_suspend(dev, dev->power.timer_autosuspends ?
> > >                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
> > > @@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> > >  int pm_schedule_suspend(struct device *dev, unsigned int delay)
> > >  {
> > >         unsigned long flags;
> > > -       ktime_t expires;
> > > +       u64 expires;
> > >         int retval;
> > >
> > >         spin_lock_irqsave(&dev->power.lock, flags);
> > > @@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
> > >         /* Other scheduled or pending requests need to be canceled. */
> > >         pm_runtime_cancel_pending(dev);
> > >
> > > -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> > > -       dev->power.timer_expires = ktime_to_ns(expires);
> > > +       expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);
> > > +       dev->power.timer_expires = expires;
> > >         dev->power.timer_autosuspends = 0;
> > >         hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
> > >
> > > diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> > > index 54af4ee..fed5be7 100644
> > > --- a/include/linux/pm_runtime.h
> > > +++ b/include/linux/pm_runtime.h
> > > @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
> > >
> > >  static inline void pm_runtime_mark_last_busy(struct device *dev)
> > >  {
> > > -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> > > +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
> > >  }
> > >
> > >  static inline bool pm_runtime_is_irq_safe(struct device *dev)
> > > --
> > > 2.7.4
> > >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 13:06 ` Rafael J. Wysocki
  2019-01-30 13:18   ` Vincent Guittot
  2019-01-30 17:26   ` [PATCH v3] " Vincent Guittot
@ 2019-01-30 21:32   ` Ladislav Michl
  2 siblings, 0 replies; 17+ messages in thread
From: Ladislav Michl @ 2019-01-30 21:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vincent Guittot, Linux PM, Linux Kernel Mailing List, Linux ARM,
	Linux OMAP Mailing List, Rafael J. Wysocki, Ulf Hansson,
	Biju Das, Geert Uytterhoeven, Linux-Renesas

On Wed, Jan 30, 2019 at 02:06:07PM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 30, 2019 at 12:16 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
> >                 pm_runtime_mark_last_busy
> >                     ktime_get
> >                         do
> >                             read_seqcount_begin
> >                         while read_seqcount_retry
> >     ....
> >     write_seqcount_end
> >
> > Although we should be safe because we haven't yet changed the clocksource
> > at that time, we can't because of seqcount protection.
> >
> > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> >
> > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > monotonic across an update and as a result can goes backward. According to
> > update_fast_timekeeper() description: "In the worst case, this can
> > result is a slightly wrong timestamp (a few nanoseconds)". For
> > PM runtime autosuspend, this means only that the suspend decision can
> > be slightly sub optimal.
> >
> > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> 
> I've queued this one up as a fix for 5.0, but unfortunately it clashes
> with the patch from Ladislav Michl at
> https://patchwork.kernel.org/patch/10755477/ which has been dropped
> now.
> 
> Can you or Ladislav please rebase that patch on top of this one and repost?
> 
> > ---
> >
> > - v2: Updated commit message to explain the impact of using
> >   ktime_get_mono_fast_ns()
> >
> >  drivers/base/power/runtime.c | 10 +++++-----
> >  include/linux/pm_runtime.h   |  2 +-
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 457be03..708a13f 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
> >  {
> >         int autosuspend_delay;
> >         u64 last_busy, expires = 0;
> > -       u64 now = ktime_to_ns(ktime_get());
> > +       u64 now = ktime_get_mono_fast_ns();
> >
> >         if (!dev->power.use_autosuspend)
> >                 goto out;
> > @@ -909,7 +909,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> >          * If 'expires' is after the current time, we've been called
> >          * too early.
> >          */
> > -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> > +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
> >                 dev->power.timer_expires = 0;
> >                 rpm_suspend(dev, dev->power.timer_autosuspends ?
> >                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
> > @@ -928,7 +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
> >  int pm_schedule_suspend(struct device *dev, unsigned int delay)
> >  {
> >         unsigned long flags;
> > -       ktime_t expires;
> > +       u64 expires;
> >         int retval;
> >
> >         spin_lock_irqsave(&dev->power.lock, flags);
> > @@ -945,8 +945,8 @@ int pm_schedule_suspend(struct device *dev, unsigned int delay)
> >         /* Other scheduled or pending requests need to be canceled. */
> >         pm_runtime_cancel_pending(dev);
> >
> > -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> > -       dev->power.timer_expires = ktime_to_ns(expires);
> > +       expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC);

Just FYI, this one does not compile                                        ^
I'm going to send updated optimization patch anyway as fixing this one will not
break it :)

> > +       dev->power.timer_expires = expires;
> >         dev->power.timer_autosuspends = 0;
> >         hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
> >
> > diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> > index 54af4ee..fed5be7 100644
> > --- a/include/linux/pm_runtime.h
> > +++ b/include/linux/pm_runtime.h
> > @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct device *dev)
> >
> >  static inline void pm_runtime_mark_last_busy(struct device *dev)
> >  {
> > -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> > +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
> >  }
> >
> >  static inline bool pm_runtime_is_irq_safe(struct device *dev)
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-01-30 17:26   ` [PATCH v3] " Vincent Guittot
@ 2019-01-30 21:53     ` Rafael J. Wysocki
  2019-02-01 15:02       ` Biju Das
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2019-01-30 21:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Linux PM, Linux Kernel Mailing List, Linux ARM,
	Linux OMAP Mailing List, Rafael J. Wysocki, Ulf Hansson,
	Biju Das, Geert Uytterhoeven, Linux-Renesas

On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> A deadlock has been seen when swicthing clocksources which use PM runtime.
> The call path is:
> change_clocksource
>     ...
>     write_seqcount_begin
>     ...
>     timekeeping_update
>         ...
>         sh_cmt_clocksource_enable
>             ...
>             rpm_resume
>                 pm_runtime_mark_last_busy
>                     ktime_get
>                         do
>                             read_seqcount_begin
>                         while read_seqcount_retry
>     ....
>     write_seqcount_end
>
> Although we should be safe because we haven't yet changed the clocksource
> at that time, we can't because of seqcount protection.
>
> Use ktime_get_mono_fast_ns() instead which is lock safe for such case
>
> With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> monotonic across an update and as a result can goes backward. According to
> update_fast_timekeeper() description: "In the worst case, this can
> result is a slightly wrong timestamp (a few nanoseconds)". For
> PM runtime autosuspend, this means only that the suspend decision can
> be slightly sub optimal.
>
> Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> Reported-by: Biju Das <biju.das@bp.renesas.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>
> Hi Rafael,
>
> Sorry, I sent the version with the typo mistake that generated the compilation error
> reported by kbuild-test-robot
>
> This version doesn't have the typo.

OK, I've applied this one, thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-01-30 21:53     ` Rafael J. Wysocki
@ 2019-02-01 15:02       ` Biju Das
  2019-02-01 15:28         ` Vincent Guittot
  0 siblings, 1 reply; 17+ messages in thread
From: Biju Das @ 2019-02-01 15:02 UTC (permalink / raw)
  To: Rafael J. Wysocki, Vincent Guittot
  Cc: Linux PM, Linux Kernel Mailing List, Linux ARM,
	Linux OMAP Mailing List, Rafael J. Wysocki, Ulf Hansson,
	Geert Uytterhoeven, Linux-Renesas

Hi Vincent,

I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.

Am I missing any patch?

root@ek874:/# echo e61e0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
[  193.869423]
[  193.870963] ============================================
[  193.876292] WARNING: possible recursive locking detected
[  193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted
[  193.887737] --------------------------------------------
[  193.893066] migration/0/11 is trying to acquire lock:
[  193.898136] (____ptrval____) (tk_core.seq){----}, at: update_pm_runtime_accounting+0x14/0x68
[  193.906632]
[  193.906632] but task is already holding lock:
[  193.912483] (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
[  193.919828]
[  193.919828] other info that might help us debug this:
[  193.926377]  Possible unsafe locking scenario:
[  193.926377]
[  193.932314]        CPU0
[  193.934765]        ----
[  193.937216]   lock(tk_core.seq);
[  193.940453]   lock(tk_core.seq);
[  193.943691]
[  193.943691]  *** DEADLOCK ***
[  193.943691]
[  193.949634]  May be due to missing lock nesting notation
[  193.949634]
[  193.956446] 3 locks held by migration/0/11:
[  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at: change_clocksource+0x2c/0x118
[  193.969125]  #1: (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
[  193.976903]  #2: (____ptrval____) (&(&dev->power.lock)->rlock){....}, at: __pm_runtime_resume+0x40/0x98
[  193.986339]
[  193.986339] stack backtrace:
[  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.0.0-rc4-next-20190201-00007-g731346f #3
[  193.999707] Hardware name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT)
[  194.008089] Call trace:
[  194.010553]  dump_backtrace+0x0/0x178
[  194.014227]  show_stack+0x14/0x20
[  194.017562]  dump_stack+0xb0/0xec
[  194.020895]  __lock_acquire+0xfb4/0x1c08
[  194.024832]  lock_acquire+0xd0/0x268
[  194.028420]  ktime_get+0x5c/0x108
[  194.031747]  update_pm_runtime_accounting+0x14/0x68
[  194.036643]  rpm_resume+0x4ec/0x698
[  194.040144]  __pm_runtime_resume+0x50/0x98
[  194.044264]  sh_tmu_enable.part.1+0x24/0x50
[  194.048462]  sh_tmu_clocksource_enable+0x48/0x70
[  194.053097]  change_clocksource+0x84/0x118
[  194.057208]  multi_cpu_stop+0x8c/0x140
[  194.060970]  cpu_stopper_thread+0xac/0x120
[  194.065087]  smpboot_thread_fn+0x1ac/0x2c8
[  194.069198]  kthread+0x128/0x130
[  194.072439]  ret_from_fork+0x10/0x18


Regards,
Biju

> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: 30 January 2019 21:53
> To: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>; Linux ARM <linux-arm-
> kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> Hansson <ulf.hansson@linaro.org>; Biju Das <biju.das@bp.renesas.com>;
> Geert Uytterhoeven <geert@linux-m68k.org>; Linux-Renesas <linux-
> renesas-soc@vger.kernel.org>
> Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
>
> On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > A deadlock has been seen when swicthing clocksources which use PM
> runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
> >                 pm_runtime_mark_last_busy
> >                     ktime_get
> >                         do
> >                             read_seqcount_begin
> >                         while read_seqcount_retry
> >     ....
> >     write_seqcount_end
> >
> > Although we should be safe because we haven't yet changed the
> > clocksource at that time, we can't because of seqcount protection.
> >
> > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> >
> > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > monotonic across an update and as a result can goes backward.
> > According to
> > update_fast_timekeeper() description: "In the worst case, this can
> > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > runtime autosuspend, this means only that the suspend decision can be
> > slightly sub optimal.
> >
> > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > hrtimers")
> > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >
> > Hi Rafael,
> >
> > Sorry, I sent the version with the typo mistake that generated the
> > compilation error reported by kbuild-test-robot
> >
> > This version doesn't have the typo.
>
> OK, I've applied this one, thanks!



Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:02       ` Biju Das
@ 2019-02-01 15:28         ` Vincent Guittot
  2019-02-01 15:44           ` Biju Das
  2019-02-01 15:45           ` Vincent Guittot
  0 siblings, 2 replies; 17+ messages in thread
From: Vincent Guittot @ 2019-02-01 15:28 UTC (permalink / raw)
  To: Biju Das
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
>
> Hi Vincent,
>
> I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.
>
> Am I missing any patch?

No you don't miss anything.
I think that it's the opposite.

Modification in time accounting in PM runtime has been queued but it
has not moved (yet) to ktime_get_mono_fast_ns()

Can you try to revert c669560be6c8 ("PM-runtime: Replace jiffies-based
accounting with ktime-based accounting") ?

>
> root@ek874:/# echo e61e0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
> [  193.869423]
> [  193.870963] ============================================
> [  193.876292] WARNING: possible recursive locking detected
> [  193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted
> [  193.887737] --------------------------------------------
> [  193.893066] migration/0/11 is trying to acquire lock:
> [  193.898136] (____ptrval____) (tk_core.seq){----}, at: update_pm_runtime_accounting+0x14/0x68
> [  193.906632]
> [  193.906632] but task is already holding lock:
> [  193.912483] (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
> [  193.919828]
> [  193.919828] other info that might help us debug this:
> [  193.926377]  Possible unsafe locking scenario:
> [  193.926377]
> [  193.932314]        CPU0
> [  193.934765]        ----
> [  193.937216]   lock(tk_core.seq);
> [  193.940453]   lock(tk_core.seq);
> [  193.943691]
> [  193.943691]  *** DEADLOCK ***
> [  193.943691]
> [  193.949634]  May be due to missing lock nesting notation
> [  193.949634]
> [  193.956446] 3 locks held by migration/0/11:
> [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at: change_clocksource+0x2c/0x118
> [  193.969125]  #1: (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
> [  193.976903]  #2: (____ptrval____) (&(&dev->power.lock)->rlock){....}, at: __pm_runtime_resume+0x40/0x98
> [  193.986339]
> [  193.986339] stack backtrace:
> [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.0.0-rc4-next-20190201-00007-g731346f #3
> [  193.999707] Hardware name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT)
> [  194.008089] Call trace:
> [  194.010553]  dump_backtrace+0x0/0x178
> [  194.014227]  show_stack+0x14/0x20
> [  194.017562]  dump_stack+0xb0/0xec
> [  194.020895]  __lock_acquire+0xfb4/0x1c08
> [  194.024832]  lock_acquire+0xd0/0x268
> [  194.028420]  ktime_get+0x5c/0x108
> [  194.031747]  update_pm_runtime_accounting+0x14/0x68
> [  194.036643]  rpm_resume+0x4ec/0x698
> [  194.040144]  __pm_runtime_resume+0x50/0x98
> [  194.044264]  sh_tmu_enable.part.1+0x24/0x50
> [  194.048462]  sh_tmu_clocksource_enable+0x48/0x70
> [  194.053097]  change_clocksource+0x84/0x118
> [  194.057208]  multi_cpu_stop+0x8c/0x140
> [  194.060970]  cpu_stopper_thread+0xac/0x120
> [  194.065087]  smpboot_thread_fn+0x1ac/0x2c8
> [  194.069198]  kthread+0x128/0x130
> [  194.072439]  ret_from_fork+0x10/0x18
>
>
> Regards,
> Biju
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: 30 January 2019 21:53
> > To: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> > kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > Hansson <ulf.hansson@linaro.org>; Biju Das <biju.das@bp.renesas.com>;
> > Geert Uytterhoeven <geert@linux-m68k.org>; Linux-Renesas <linux-
> > renesas-soc@vger.kernel.org>
> > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> >
> > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > A deadlock has been seen when swicthing clocksources which use PM
> > runtime.
> > > The call path is:
> > > change_clocksource
> > >     ...
> > >     write_seqcount_begin
> > >     ...
> > >     timekeeping_update
> > >         ...
> > >         sh_cmt_clocksource_enable
> > >             ...
> > >             rpm_resume
> > >                 pm_runtime_mark_last_busy
> > >                     ktime_get
> > >                         do
> > >                             read_seqcount_begin
> > >                         while read_seqcount_retry
> > >     ....
> > >     write_seqcount_end
> > >
> > > Although we should be safe because we haven't yet changed the
> > > clocksource at that time, we can't because of seqcount protection.
> > >
> > > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> > >
> > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > monotonic across an update and as a result can goes backward.
> > > According to
> > > update_fast_timekeeper() description: "In the worst case, this can
> > > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > > runtime autosuspend, this means only that the suspend decision can be
> > > slightly sub optimal.
> > >
> > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > > hrtimers")
> > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > ---
> > >
> > > Hi Rafael,
> > >
> > > Sorry, I sent the version with the typo mistake that generated the
> > > compilation error reported by kbuild-test-robot
> > >
> > > This version doesn't have the typo.
> >
> > OK, I've applied this one, thanks!
>
>
>
> Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:28         ` Vincent Guittot
@ 2019-02-01 15:44           ` Biju Das
  2019-02-01 15:48             ` Vincent Guittot
  2019-02-01 15:45           ` Vincent Guittot
  1 sibling, 1 reply; 17+ messages in thread
From: Biju Das @ 2019-02-01 15:44 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

Hi Vincent,

Thanks for the feedback. Instead of reverting.  I just modified the patch like this and it fixed the issue.

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 4eaf166..145a182 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -66,7 +66,7 @@ static int rpm_suspend(struct device *dev, int rpmflags);
  */
 void update_pm_runtime_accounting(struct device *dev)
 {
-       u64 now = ktime_to_ns(ktime_get());
+       u64 now = ktime_get_mono_fast_ns();
        u64 delta;

regards,
Biju

> -----Original Message-----
> From: linux-renesas-soc-owner@vger.kernel.org <linux-renesas-soc-
> owner@vger.kernel.org> On Behalf Of Vincent Guittot
> Sent: 01 February 2019 15:29
> To: Biju Das <biju.das@bp.renesas.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Linux PM <linux-
> pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>; Linux ARM <linux-arm-
> kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> Hansson <ulf.hansson@linaro.org>; Geert Uytterhoeven <geert@linux-
> m68k.org>; Linux-Renesas <linux-renesas-soc@vger.kernel.org>
> Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
>
> On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
> >
> > Hi Vincent,
> >
> > I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.
> >
> > Am I missing any patch?
>
> No you don't miss anything.
> I think that it's the opposite.
>
> Modification in time accounting in PM runtime has been queued but it has
> not moved (yet) to ktime_get_mono_fast_ns()
>
> Can you try to revert c669560be6c8 ("PM-runtime: Replace jiffies-based
> accounting with ktime-based accounting") ?
>
> >
> > root@ek874:/# echo e61e0000.timer >
> > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > [  193.869423]
> > [  193.870963] ============================================
> > [  193.876292] WARNING: possible recursive locking detected [
> > 193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted [
> > 193.887737] --------------------------------------------
> > [  193.893066] migration/0/11 is trying to acquire lock:
> > [  193.898136] (____ptrval____) (tk_core.seq){----}, at:
> > update_pm_runtime_accounting+0x14/0x68
> > [  193.906632]
> > [  193.906632] but task is already holding lock:
> > [  193.912483] (____ptrval____) (tk_core.seq){----}, at:
> > multi_cpu_stop+0x8c/0x140 [  193.919828] [  193.919828] other info
> > that might help us debug this:
> > [  193.926377]  Possible unsafe locking scenario:
> > [  193.926377]
> > [  193.932314]        CPU0
> > [  193.934765]        ----
> > [  193.937216]   lock(tk_core.seq);
> > [  193.940453]   lock(tk_core.seq);
> > [  193.943691]
> > [  193.943691]  *** DEADLOCK ***
> > [  193.943691]
> > [  193.949634]  May be due to missing lock nesting notation [
> > 193.949634] [  193.956446] 3 locks held by migration/0/11:
> > [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at:
> > change_clocksource+0x2c/0x118 [  193.969125]  #1: (____ptrval____)
> > (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140 [  193.976903]  #2:
> > (____ptrval____) (&(&dev->power.lock)->rlock){....}, at:
> > __pm_runtime_resume+0x40/0x98 [  193.986339] [  193.986339] stack
> backtrace:
> > [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted
> > 5.0.0-rc4-next-20190201-00007-g731346f #3 [  193.999707] Hardware
> > name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT) [
> 194.008089] Call trace:
> > [  194.010553]  dump_backtrace+0x0/0x178 [  194.014227]
> > show_stack+0x14/0x20 [  194.017562]  dump_stack+0xb0/0xec [
> > 194.020895]  __lock_acquire+0xfb4/0x1c08 [  194.024832]
> > lock_acquire+0xd0/0x268 [  194.028420]  ktime_get+0x5c/0x108 [
> > 194.031747]  update_pm_runtime_accounting+0x14/0x68
> > [  194.036643]  rpm_resume+0x4ec/0x698 [  194.040144]
> > __pm_runtime_resume+0x50/0x98 [  194.044264]
> > sh_tmu_enable.part.1+0x24/0x50 [  194.048462]
> > sh_tmu_clocksource_enable+0x48/0x70
> > [  194.053097]  change_clocksource+0x84/0x118 [  194.057208]
> > multi_cpu_stop+0x8c/0x140 [  194.060970]
> > cpu_stopper_thread+0xac/0x120 [  194.065087]
> > smpboot_thread_fn+0x1ac/0x2c8 [  194.069198]  kthread+0x128/0x130 [
> > 194.072439]  ret_from_fork+0x10/0x18
> >
> >
> > Regards,
> > Biju
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: 30 January 2019 21:53
> > > To: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List
> > > <linux- kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > Hansson <ulf.hansson@linaro.org>; Biju Das
> > > <biju.das@bp.renesas.com>; Geert Uytterhoeven
> > > <geert@linux-m68k.org>; Linux-Renesas <linux-
> > > renesas-soc@vger.kernel.org>
> > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > >
> > > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > A deadlock has been seen when swicthing clocksources which use PM
> > > runtime.
> > > > The call path is:
> > > > change_clocksource
> > > >     ...
> > > >     write_seqcount_begin
> > > >     ...
> > > >     timekeeping_update
> > > >         ...
> > > >         sh_cmt_clocksource_enable
> > > >             ...
> > > >             rpm_resume
> > > >                 pm_runtime_mark_last_busy
> > > >                     ktime_get
> > > >                         do
> > > >                             read_seqcount_begin
> > > >                         while read_seqcount_retry
> > > >     ....
> > > >     write_seqcount_end
> > > >
> > > > Although we should be safe because we haven't yet changed the
> > > > clocksource at that time, we can't because of seqcount protection.
> > > >
> > > > Use ktime_get_mono_fast_ns() instead which is lock safe for such
> > > > case
> > > >
> > > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > > monotonic across an update and as a result can goes backward.
> > > > According to
> > > > update_fast_timekeeper() description: "In the worst case, this can
> > > > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > > > runtime autosuspend, this means only that the suspend decision can
> > > > be slightly sub optimal.
> > > >
> > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > > > hrtimers")
> > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > ---
> > > >
> > > > Hi Rafael,
> > > >
> > > > Sorry, I sent the version with the typo mistake that generated the
> > > > compilation error reported by kbuild-test-robot
> > > >
> > > > This version doesn't have the typo.
> > >
> > > OK, I've applied this one, thanks!
> >
> >
> >
> > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne
> End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under
> Registered No. 04586709.



Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:28         ` Vincent Guittot
  2019-02-01 15:44           ` Biju Das
@ 2019-02-01 15:45           ` Vincent Guittot
  1 sibling, 0 replies; 17+ messages in thread
From: Vincent Guittot @ 2019-02-01 15:45 UTC (permalink / raw)
  To: Biju Das
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

Le Friday 01 Feb 2019 à 16:28:54 (+0100), Vincent Guittot a écrit :
> On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
> >
> > Hi Vincent,
> >
> > I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.
> >
> > Am I missing any patch?
> 
> No you don't miss anything.
> I think that it's the opposite.
> 
> Modification in time accounting in PM runtime has been queued but it
> has not moved (yet) to ktime_get_mono_fast_ns()
> 
> Can you try to revert c669560be6c8 ("PM-runtime: Replace jiffies-based
> accounting with ktime-based accounting") ?

Or instead you can apply :

---
 drivers/base/power/runtime.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 4eaf166..1c40e2a 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -66,7 +66,7 @@ static int rpm_suspend(struct device *dev, int rpmflags);
  */
 void update_pm_runtime_accounting(struct device *dev)
 {
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = ktime_get_mono_fast_ns();
 	u64 delta;
 
 	delta = now - dev->power.accounting_timestamp;
@@ -1315,7 +1315,7 @@ void pm_runtime_enable(struct device *dev)
 
 		/* About to enable runtime pm, set accounting_timestamp to now */
 		if (!dev->power.disable_depth)
-			dev->power.accounting_timestamp = ktime_to_ns(ktime_get());
+			dev->power.accounting_timestamp = ktime_get_mono_fast_ns();
 	} else {
 		dev_warn(dev, "Unbalanced %s!\n", __func__);
 	}
-- 
2.7.4


>
> 
> >
> > root@ek874:/# echo e61e0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > [  193.869423]
> > [  193.870963] ============================================
> > [  193.876292] WARNING: possible recursive locking detected
> > [  193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted
> > [  193.887737] --------------------------------------------
> > [  193.893066] migration/0/11 is trying to acquire lock:
> > [  193.898136] (____ptrval____) (tk_core.seq){----}, at: update_pm_runtime_accounting+0x14/0x68
> > [  193.906632]
> > [  193.906632] but task is already holding lock:
> > [  193.912483] (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
> > [  193.919828]
> > [  193.919828] other info that might help us debug this:
> > [  193.926377]  Possible unsafe locking scenario:
> > [  193.926377]
> > [  193.932314]        CPU0
> > [  193.934765]        ----
> > [  193.937216]   lock(tk_core.seq);
> > [  193.940453]   lock(tk_core.seq);
> > [  193.943691]
> > [  193.943691]  *** DEADLOCK ***
> > [  193.943691]
> > [  193.949634]  May be due to missing lock nesting notation
> > [  193.949634]
> > [  193.956446] 3 locks held by migration/0/11:
> > [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at: change_clocksource+0x2c/0x118
> > [  193.969125]  #1: (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
> > [  193.976903]  #2: (____ptrval____) (&(&dev->power.lock)->rlock){....}, at: __pm_runtime_resume+0x40/0x98
> > [  193.986339]
> > [  193.986339] stack backtrace:
> > [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.0.0-rc4-next-20190201-00007-g731346f #3
> > [  193.999707] Hardware name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT)
> > [  194.008089] Call trace:
> > [  194.010553]  dump_backtrace+0x0/0x178
> > [  194.014227]  show_stack+0x14/0x20
> > [  194.017562]  dump_stack+0xb0/0xec
> > [  194.020895]  __lock_acquire+0xfb4/0x1c08
> > [  194.024832]  lock_acquire+0xd0/0x268
> > [  194.028420]  ktime_get+0x5c/0x108
> > [  194.031747]  update_pm_runtime_accounting+0x14/0x68
> > [  194.036643]  rpm_resume+0x4ec/0x698
> > [  194.040144]  __pm_runtime_resume+0x50/0x98
> > [  194.044264]  sh_tmu_enable.part.1+0x24/0x50
> > [  194.048462]  sh_tmu_clocksource_enable+0x48/0x70
> > [  194.053097]  change_clocksource+0x84/0x118
> > [  194.057208]  multi_cpu_stop+0x8c/0x140
> > [  194.060970]  cpu_stopper_thread+0xac/0x120
> > [  194.065087]  smpboot_thread_fn+0x1ac/0x2c8
> > [  194.069198]  kthread+0x128/0x130
> > [  194.072439]  ret_from_fork+0x10/0x18
> >
> >
> > Regards,
> > Biju
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: 30 January 2019 21:53
> > > To: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> > > kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > Hansson <ulf.hansson@linaro.org>; Biju Das <biju.das@bp.renesas.com>;
> > > Geert Uytterhoeven <geert@linux-m68k.org>; Linux-Renesas <linux-
> > > renesas-soc@vger.kernel.org>
> > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > >
> > > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > A deadlock has been seen when swicthing clocksources which use PM
> > > runtime.
> > > > The call path is:
> > > > change_clocksource
> > > >     ...
> > > >     write_seqcount_begin
> > > >     ...
> > > >     timekeeping_update
> > > >         ...
> > > >         sh_cmt_clocksource_enable
> > > >             ...
> > > >             rpm_resume
> > > >                 pm_runtime_mark_last_busy
> > > >                     ktime_get
> > > >                         do
> > > >                             read_seqcount_begin
> > > >                         while read_seqcount_retry
> > > >     ....
> > > >     write_seqcount_end
> > > >
> > > > Although we should be safe because we haven't yet changed the
> > > > clocksource at that time, we can't because of seqcount protection.
> > > >
> > > > Use ktime_get_mono_fast_ns() instead which is lock safe for such case
> > > >
> > > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > > monotonic across an update and as a result can goes backward.
> > > > According to
> > > > update_fast_timekeeper() description: "In the worst case, this can
> > > > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > > > runtime autosuspend, this means only that the suspend decision can be
> > > > slightly sub optimal.
> > > >
> > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > > > hrtimers")
> > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > ---
> > > >
> > > > Hi Rafael,
> > > >
> > > > Sorry, I sent the version with the typo mistake that generated the
> > > > compilation error reported by kbuild-test-robot
> > > >
> > > > This version doesn't have the typo.
> > >
> > > OK, I've applied this one, thanks!
> >
> >
> >
> > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:44           ` Biju Das
@ 2019-02-01 15:48             ` Vincent Guittot
  2019-02-01 15:52               ` Vincent Guittot
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2019-02-01 15:48 UTC (permalink / raw)
  To: Biju Das
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

On Fri, 1 Feb 2019 at 16:44, Biju Das <biju.das@bp.renesas.com> wrote:
>
> Hi Vincent,
>
> Thanks for the feedback. Instead of reverting.  I just modified the patch like this and it fixed the issue.
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 4eaf166..145a182 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -66,7 +66,7 @@ static int rpm_suspend(struct device *dev, int rpmflags);
>   */
>  void update_pm_runtime_accounting(struct device *dev)
>  {
> -       u64 now = ktime_to_ns(ktime_get());
> +       u64 now = ktime_get_mono_fast_ns();
>         u64 delta;

yes.
This probably also need to be changed in another place
(pm_runtime_enable) to be safe
Our email have crossed. I just sent something similar



>
> regards,
> Biju
>
> > -----Original Message-----
> > From: linux-renesas-soc-owner@vger.kernel.org <linux-renesas-soc-
> > owner@vger.kernel.org> On Behalf Of Vincent Guittot
> > Sent: 01 February 2019 15:29
> > To: Biju Das <biju.das@bp.renesas.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Linux PM <linux-
> > pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> > kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > Hansson <ulf.hansson@linaro.org>; Geert Uytterhoeven <geert@linux-
> > m68k.org>; Linux-Renesas <linux-renesas-soc@vger.kernel.org>
> > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> >
> > On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
> > >
> > > Hi Vincent,
> > >
> > > I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.
> > >
> > > Am I missing any patch?
> >
> > No you don't miss anything.
> > I think that it's the opposite.
> >
> > Modification in time accounting in PM runtime has been queued but it has
> > not moved (yet) to ktime_get_mono_fast_ns()
> >
> > Can you try to revert c669560be6c8 ("PM-runtime: Replace jiffies-based
> > accounting with ktime-based accounting") ?
> >
> > >
> > > root@ek874:/# echo e61e0000.timer >
> > > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > [  193.869423]
> > > [  193.870963] ============================================
> > > [  193.876292] WARNING: possible recursive locking detected [
> > > 193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted [
> > > 193.887737] --------------------------------------------
> > > [  193.893066] migration/0/11 is trying to acquire lock:
> > > [  193.898136] (____ptrval____) (tk_core.seq){----}, at:
> > > update_pm_runtime_accounting+0x14/0x68
> > > [  193.906632]
> > > [  193.906632] but task is already holding lock:
> > > [  193.912483] (____ptrval____) (tk_core.seq){----}, at:
> > > multi_cpu_stop+0x8c/0x140 [  193.919828] [  193.919828] other info
> > > that might help us debug this:
> > > [  193.926377]  Possible unsafe locking scenario:
> > > [  193.926377]
> > > [  193.932314]        CPU0
> > > [  193.934765]        ----
> > > [  193.937216]   lock(tk_core.seq);
> > > [  193.940453]   lock(tk_core.seq);
> > > [  193.943691]
> > > [  193.943691]  *** DEADLOCK ***
> > > [  193.943691]
> > > [  193.949634]  May be due to missing lock nesting notation [
> > > 193.949634] [  193.956446] 3 locks held by migration/0/11:
> > > [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at:
> > > change_clocksource+0x2c/0x118 [  193.969125]  #1: (____ptrval____)
> > > (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140 [  193.976903]  #2:
> > > (____ptrval____) (&(&dev->power.lock)->rlock){....}, at:
> > > __pm_runtime_resume+0x40/0x98 [  193.986339] [  193.986339] stack
> > backtrace:
> > > [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted
> > > 5.0.0-rc4-next-20190201-00007-g731346f #3 [  193.999707] Hardware
> > > name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT) [
> > 194.008089] Call trace:
> > > [  194.010553]  dump_backtrace+0x0/0x178 [  194.014227]
> > > show_stack+0x14/0x20 [  194.017562]  dump_stack+0xb0/0xec [
> > > 194.020895]  __lock_acquire+0xfb4/0x1c08 [  194.024832]
> > > lock_acquire+0xd0/0x268 [  194.028420]  ktime_get+0x5c/0x108 [
> > > 194.031747]  update_pm_runtime_accounting+0x14/0x68
> > > [  194.036643]  rpm_resume+0x4ec/0x698 [  194.040144]
> > > __pm_runtime_resume+0x50/0x98 [  194.044264]
> > > sh_tmu_enable.part.1+0x24/0x50 [  194.048462]
> > > sh_tmu_clocksource_enable+0x48/0x70
> > > [  194.053097]  change_clocksource+0x84/0x118 [  194.057208]
> > > multi_cpu_stop+0x8c/0x140 [  194.060970]
> > > cpu_stopper_thread+0xac/0x120 [  194.065087]
> > > smpboot_thread_fn+0x1ac/0x2c8 [  194.069198]  kthread+0x128/0x130 [
> > > 194.072439]  ret_from_fork+0x10/0x18
> > >
> > >
> > > Regards,
> > > Biju
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > Sent: 30 January 2019 21:53
> > > > To: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List
> > > > <linux- kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > > Hansson <ulf.hansson@linaro.org>; Biju Das
> > > > <biju.das@bp.renesas.com>; Geert Uytterhoeven
> > > > <geert@linux-m68k.org>; Linux-Renesas <linux-
> > > > renesas-soc@vger.kernel.org>
> > > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > > >
> > > > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > > > <vincent.guittot@linaro.org> wrote:
> > > > >
> > > > > A deadlock has been seen when swicthing clocksources which use PM
> > > > runtime.
> > > > > The call path is:
> > > > > change_clocksource
> > > > >     ...
> > > > >     write_seqcount_begin
> > > > >     ...
> > > > >     timekeeping_update
> > > > >         ...
> > > > >         sh_cmt_clocksource_enable
> > > > >             ...
> > > > >             rpm_resume
> > > > >                 pm_runtime_mark_last_busy
> > > > >                     ktime_get
> > > > >                         do
> > > > >                             read_seqcount_begin
> > > > >                         while read_seqcount_retry
> > > > >     ....
> > > > >     write_seqcount_end
> > > > >
> > > > > Although we should be safe because we haven't yet changed the
> > > > > clocksource at that time, we can't because of seqcount protection.
> > > > >
> > > > > Use ktime_get_mono_fast_ns() instead which is lock safe for such
> > > > > case
> > > > >
> > > > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > > > monotonic across an update and as a result can goes backward.
> > > > > According to
> > > > > update_fast_timekeeper() description: "In the worst case, this can
> > > > > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > > > > runtime autosuspend, this means only that the suspend decision can
> > > > > be slightly sub optimal.
> > > > >
> > > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > > > > hrtimers")
> > > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > ---
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > Sorry, I sent the version with the typo mistake that generated the
> > > > > compilation error reported by kbuild-test-robot
> > > > >
> > > > > This version doesn't have the typo.
> > > >
> > > > OK, I've applied this one, thanks!
> > >
> > >
> > >
> > > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne
> > End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under
> > Registered No. 04586709.
>
>
>
> Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:48             ` Vincent Guittot
@ 2019-02-01 15:52               ` Vincent Guittot
  2019-02-04  9:29                 ` Biju Das
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2019-02-01 15:52 UTC (permalink / raw)
  To: Biju Das
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

On Fri, 1 Feb 2019 at 16:48, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Fri, 1 Feb 2019 at 16:44, Biju Das <biju.das@bp.renesas.com> wrote:
> >
> > Hi Vincent,
> >
> > Thanks for the feedback. Instead of reverting.  I just modified the patch like this and it fixed the issue.
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 4eaf166..145a182 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -66,7 +66,7 @@ static int rpm_suspend(struct device *dev, int rpmflags);
> >   */
> >  void update_pm_runtime_accounting(struct device *dev)
> >  {
> > -       u64 now = ktime_to_ns(ktime_get());
> > +       u64 now = ktime_get_mono_fast_ns();
> >         u64 delta;
>
> yes.
> This probably also need to be changed in another place
> (pm_runtime_enable) to be safe
> Our email have crossed. I just sent something similar

In fact after more thinking, it's probably better to revert because we
are not sure to be monotonic with ktime_get_mono_fast_ns() which means
that the time account can be negative but this not yet taken into
account. It's part of a clean up patch that was on hold while working
on previous problem

>
>
>
> >
> > regards,
> > Biju
> >
> > > -----Original Message-----
> > > From: linux-renesas-soc-owner@vger.kernel.org <linux-renesas-soc-
> > > owner@vger.kernel.org> On Behalf Of Vincent Guittot
> > > Sent: 01 February 2019 15:29
> > > To: Biju Das <biju.das@bp.renesas.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Linux PM <linux-
> > > pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> > > kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > Hansson <ulf.hansson@linaro.org>; Geert Uytterhoeven <geert@linux-
> > > m68k.org>; Linux-Renesas <linux-renesas-soc@vger.kernel.org>
> > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > >
> > > On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
> > > >
> > > > Hi Vincent,
> > > >
> > > > I have rebased my kernel to "next-20190201".  Still I am seeing dead lock.
> > > >
> > > > Am I missing any patch?
> > >
> > > No you don't miss anything.
> > > I think that it's the opposite.
> > >
> > > Modification in time accounting in PM runtime has been queued but it has
> > > not moved (yet) to ktime_get_mono_fast_ns()
> > >
> > > Can you try to revert c669560be6c8 ("PM-runtime: Replace jiffies-based
> > > accounting with ktime-based accounting") ?
> > >
> > > >
> > > > root@ek874:/# echo e61e0000.timer >
> > > > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > > [  193.869423]
> > > > [  193.870963] ============================================
> > > > [  193.876292] WARNING: possible recursive locking detected [
> > > > 193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not tainted [
> > > > 193.887737] --------------------------------------------
> > > > [  193.893066] migration/0/11 is trying to acquire lock:
> > > > [  193.898136] (____ptrval____) (tk_core.seq){----}, at:
> > > > update_pm_runtime_accounting+0x14/0x68
> > > > [  193.906632]
> > > > [  193.906632] but task is already holding lock:
> > > > [  193.912483] (____ptrval____) (tk_core.seq){----}, at:
> > > > multi_cpu_stop+0x8c/0x140 [  193.919828] [  193.919828] other info
> > > > that might help us debug this:
> > > > [  193.926377]  Possible unsafe locking scenario:
> > > > [  193.926377]
> > > > [  193.932314]        CPU0
> > > > [  193.934765]        ----
> > > > [  193.937216]   lock(tk_core.seq);
> > > > [  193.940453]   lock(tk_core.seq);
> > > > [  193.943691]
> > > > [  193.943691]  *** DEADLOCK ***
> > > > [  193.943691]
> > > > [  193.949634]  May be due to missing lock nesting notation [
> > > > 193.949634] [  193.956446] 3 locks held by migration/0/11:
> > > > [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at:
> > > > change_clocksource+0x2c/0x118 [  193.969125]  #1: (____ptrval____)
> > > > (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140 [  193.976903]  #2:
> > > > (____ptrval____) (&(&dev->power.lock)->rlock){....}, at:
> > > > __pm_runtime_resume+0x40/0x98 [  193.986339] [  193.986339] stack
> > > backtrace:
> > > > [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted
> > > > 5.0.0-rc4-next-20190201-00007-g731346f #3 [  193.999707] Hardware
> > > > name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 + CAT875) (DT) [
> > > 194.008089] Call trace:
> > > > [  194.010553]  dump_backtrace+0x0/0x178 [  194.014227]
> > > > show_stack+0x14/0x20 [  194.017562]  dump_stack+0xb0/0xec [
> > > > 194.020895]  __lock_acquire+0xfb4/0x1c08 [  194.024832]
> > > > lock_acquire+0xd0/0x268 [  194.028420]  ktime_get+0x5c/0x108 [
> > > > 194.031747]  update_pm_runtime_accounting+0x14/0x68
> > > > [  194.036643]  rpm_resume+0x4ec/0x698 [  194.040144]
> > > > __pm_runtime_resume+0x50/0x98 [  194.044264]
> > > > sh_tmu_enable.part.1+0x24/0x50 [  194.048462]
> > > > sh_tmu_clocksource_enable+0x48/0x70
> > > > [  194.053097]  change_clocksource+0x84/0x118 [  194.057208]
> > > > multi_cpu_stop+0x8c/0x140 [  194.060970]
> > > > cpu_stopper_thread+0xac/0x120 [  194.065087]
> > > > smpboot_thread_fn+0x1ac/0x2c8 [  194.069198]  kthread+0x128/0x130 [
> > > > 194.072439]  ret_from_fork+0x10/0x18
> > > >
> > > >
> > > > Regards,
> > > > Biju
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Sent: 30 January 2019 21:53
> > > > > To: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing List
> > > > > <linux- kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > > > Hansson <ulf.hansson@linaro.org>; Biju Das
> > > > > <biju.das@bp.renesas.com>; Geert Uytterhoeven
> > > > > <geert@linux-m68k.org>; Linux-Renesas <linux-
> > > > > renesas-soc@vger.kernel.org>
> > > > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > > > >
> > > > > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > > > > <vincent.guittot@linaro.org> wrote:
> > > > > >
> > > > > > A deadlock has been seen when swicthing clocksources which use PM
> > > > > runtime.
> > > > > > The call path is:
> > > > > > change_clocksource
> > > > > >     ...
> > > > > >     write_seqcount_begin
> > > > > >     ...
> > > > > >     timekeeping_update
> > > > > >         ...
> > > > > >         sh_cmt_clocksource_enable
> > > > > >             ...
> > > > > >             rpm_resume
> > > > > >                 pm_runtime_mark_last_busy
> > > > > >                     ktime_get
> > > > > >                         do
> > > > > >                             read_seqcount_begin
> > > > > >                         while read_seqcount_retry
> > > > > >     ....
> > > > > >     write_seqcount_end
> > > > > >
> > > > > > Although we should be safe because we haven't yet changed the
> > > > > > clocksource at that time, we can't because of seqcount protection.
> > > > > >
> > > > > > Use ktime_get_mono_fast_ns() instead which is lock safe for such
> > > > > > case
> > > > > >
> > > > > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
> > > > > > monotonic across an update and as a result can goes backward.
> > > > > > According to
> > > > > > update_fast_timekeeper() description: "In the worst case, this can
> > > > > > result is a slightly wrong timestamp (a few nanoseconds)". For PM
> > > > > > runtime autosuspend, this means only that the suspend decision can
> > > > > > be slightly sub optimal.
> > > > > >
> > > > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
> > > > > > hrtimers")
> > > > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > > ---
> > > > > >
> > > > > > Hi Rafael,
> > > > > >
> > > > > > Sorry, I sent the version with the typo mistake that generated the
> > > > > > compilation error reported by kbuild-test-robot
> > > > > >
> > > > > > This version doesn't have the typo.
> > > > >
> > > > > OK, I've applied this one, thanks!
> > > >
> > > >
> > > >
> > > > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne
> > > End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under
> > > Registered No. 04586709.
> >
> >
> >
> > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3] PM-runtime: fix deadlock with ktime
  2019-02-01 15:52               ` Vincent Guittot
@ 2019-02-04  9:29                 ` Biju Das
  0 siblings, 0 replies; 17+ messages in thread
From: Biju Das @ 2019-02-04  9:29 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rafael J. Wysocki, Linux PM, Linux Kernel Mailing List,
	Linux ARM, Linux OMAP Mailing List, Rafael J. Wysocki,
	Ulf Hansson, Geert Uytterhoeven, Linux-Renesas

Hi Vincent,

Thanks for the feedback.

> Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
>
> On Fri, 1 Feb 2019 at 16:48, Vincent Guittot <vincent.guittot@linaro.org>
> wrote:
> >
> > On Fri, 1 Feb 2019 at 16:44, Biju Das <biju.das@bp.renesas.com> wrote:
> > >
> > > Hi Vincent,
> > >
> > > Thanks for the feedback. Instead of reverting.  I just modified the patch
> like this and it fixed the issue.
> > >
> > > diff --git a/drivers/base/power/runtime.c
> > > b/drivers/base/power/runtime.c index 4eaf166..145a182 100644
> > > --- a/drivers/base/power/runtime.c
> > > +++ b/drivers/base/power/runtime.c
> > > @@ -66,7 +66,7 @@ static int rpm_suspend(struct device *dev, int
> rpmflags);
> > >   */
> > >  void update_pm_runtime_accounting(struct device *dev)  {
> > > -       u64 now = ktime_to_ns(ktime_get());
> > > +       u64 now = ktime_get_mono_fast_ns();
> > >         u64 delta;
> >
> > yes.
> > This probably also need to be changed in another place
> > (pm_runtime_enable) to be safe
> > Our email have crossed. I just sent something similar
>
> In fact after more thinking, it's probably better to revert because we are not
> sure to be monotonic with ktime_get_mono_fast_ns() which means that the
> time account can be negative but this not yet taken into account. It's part of a
> clean up patch that was on hold while working on previous problem

I confirm there is no deadlock, after reverting the commit c669560be6c8 ("PM-runtime: Replace
 jiffies-based accounting with ktime-based accounting")

Regards,
Biju

> > >
> > > > -----Original Message-----
> > > > From: linux-renesas-soc-owner@vger.kernel.org <linux-renesas-soc-
> > > > owner@vger.kernel.org> On Behalf Of Vincent Guittot
> > > > Sent: 01 February 2019 15:29
> > > > To: Biju Das <biju.das@bp.renesas.com>
> > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Linux PM <linux-
> > > > pm@vger.kernel.org>; Linux Kernel Mailing List <linux-
> > > > kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Ulf
> > > > Hansson <ulf.hansson@linaro.org>; Geert Uytterhoeven <geert@linux-
> > > > m68k.org>; Linux-Renesas <linux-renesas-soc@vger.kernel.org>
> > > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > > >
> > > > On Fri, 1 Feb 2019 at 16:02, Biju Das <biju.das@bp.renesas.com> wrote:
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > I have rebased my kernel to "next-20190201".  Still I am seeing dead
> lock.
> > > > >
> > > > > Am I missing any patch?
> > > >
> > > > No you don't miss anything.
> > > > I think that it's the opposite.
> > > >
> > > > Modification in time accounting in PM runtime has been queued but
> > > > it has not moved (yet) to ktime_get_mono_fast_ns()
> > > >
> > > > Can you try to revert c669560be6c8 ("PM-runtime: Replace
> > > > jiffies-based accounting with ktime-based accounting") ?
> > > >
> > > > >
> > > > > root@ek874:/# echo e61e0000.timer >
> > > > > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > > > [  193.869423]
> > > > > [  193.870963]
> ============================================
> > > > > [  193.876292] WARNING: possible recursive locking detected [
> > > > > 193.881625] 5.0.0-rc4-next-20190201-00007-g731346f #3 Not
> > > > > tainted [ 193.887737]
> > > > > --------------------------------------------
> > > > > [  193.893066] migration/0/11 is trying to acquire lock:
> > > > > [  193.898136] (____ptrval____) (tk_core.seq){----}, at:
> > > > > update_pm_runtime_accounting+0x14/0x68
> > > > > [  193.906632]
> > > > > [  193.906632] but task is already holding lock:
> > > > > [  193.912483] (____ptrval____) (tk_core.seq){----}, at:
> > > > > multi_cpu_stop+0x8c/0x140 [  193.919828] [  193.919828] other
> > > > > info that might help us debug this:
> > > > > [  193.926377]  Possible unsafe locking scenario:
> > > > > [  193.926377]
> > > > > [  193.932314]        CPU0
> > > > > [  193.934765]        ----
> > > > > [  193.937216]   lock(tk_core.seq);
> > > > > [  193.940453]   lock(tk_core.seq);
> > > > > [  193.943691]
> > > > > [  193.943691]  *** DEADLOCK *** [  193.943691] [  193.949634]
> > > > > May be due to missing lock nesting notation [ 193.949634] [
> > > > > 193.956446] 3 locks held by migration/0/11:
> > > > > [  193.960642]  #0: (____ptrval____) (timekeeper_lock){-.-.}, at:
> > > > > change_clocksource+0x2c/0x118 [  193.969125]  #1:
> > > > > (____ptrval____) (tk_core.seq){----}, at: multi_cpu_stop+0x8c/0x140
> [  193.976903]  #2:
> > > > > (____ptrval____) (&(&dev->power.lock)->rlock){....}, at:
> > > > > __pm_runtime_resume+0x40/0x98 [  193.986339] [  193.986339]
> > > > > stack
> > > > backtrace:
> > > > > [  193.990715] CPU: 0 PID: 11 Comm: migration/0 Not tainted
> > > > > 5.0.0-rc4-next-20190201-00007-g731346f #3 [  193.999707]
> > > > > Hardware
> > > > > name: Silicon Linux RZ/G2E evaluation kit EK874 (CAT874 +
> > > > > CAT875) (DT) [
> > > > 194.008089] Call trace:
> > > > > [  194.010553]  dump_backtrace+0x0/0x178 [  194.014227]
> > > > > show_stack+0x14/0x20 [  194.017562]  dump_stack+0xb0/0xec [
> > > > > 194.020895]  __lock_acquire+0xfb4/0x1c08 [  194.024832]
> > > > > lock_acquire+0xd0/0x268 [  194.028420]  ktime_get+0x5c/0x108 [
> > > > > 194.031747]  update_pm_runtime_accounting+0x14/0x68
> > > > > [  194.036643]  rpm_resume+0x4ec/0x698 [  194.040144]
> > > > > __pm_runtime_resume+0x50/0x98 [  194.044264]
> > > > > sh_tmu_enable.part.1+0x24/0x50 [  194.048462]
> > > > > sh_tmu_clocksource_enable+0x48/0x70
> > > > > [  194.053097]  change_clocksource+0x84/0x118 [  194.057208]
> > > > > multi_cpu_stop+0x8c/0x140 [  194.060970]
> > > > > cpu_stopper_thread+0xac/0x120 [  194.065087]
> > > > > smpboot_thread_fn+0x1ac/0x2c8 [  194.069198]
> > > > > kthread+0x128/0x130 [ 194.072439]  ret_from_fork+0x10/0x18
> > > > >
> > > > >
> > > > > Regards,
> > > > > Biju
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > Sent: 30 January 2019 21:53
> > > > > > To: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > > Cc: Linux PM <linux-pm@vger.kernel.org>; Linux Kernel Mailing
> > > > > > List
> > > > > > <linux- kernel@vger.kernel.org>; Linux ARM <linux-arm-
> > > > > > kernel@lists.infradead.org>; Linux OMAP Mailing List <linux-
> > > > > > omap@vger.kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>;
> > > > > > Ulf Hansson <ulf.hansson@linaro.org>; Biju Das
> > > > > > <biju.das@bp.renesas.com>; Geert Uytterhoeven
> > > > > > <geert@linux-m68k.org>; Linux-Renesas <linux-
> > > > > > renesas-soc@vger.kernel.org>
> > > > > > Subject: Re: [PATCH v3] PM-runtime: fix deadlock with ktime
> > > > > >
> > > > > > On Wed, Jan 30, 2019 at 6:26 PM Vincent Guittot
> > > > > > <vincent.guittot@linaro.org> wrote:
> > > > > > >
> > > > > > > A deadlock has been seen when swicthing clocksources which
> > > > > > > use PM
> > > > > > runtime.
> > > > > > > The call path is:
> > > > > > > change_clocksource
> > > > > > >     ...
> > > > > > >     write_seqcount_begin
> > > > > > >     ...
> > > > > > >     timekeeping_update
> > > > > > >         ...
> > > > > > >         sh_cmt_clocksource_enable
> > > > > > >             ...
> > > > > > >             rpm_resume
> > > > > > >                 pm_runtime_mark_last_busy
> > > > > > >                     ktime_get
> > > > > > >                         do
> > > > > > >                             read_seqcount_begin
> > > > > > >                         while read_seqcount_retry
> > > > > > >     ....
> > > > > > >     write_seqcount_end
> > > > > > >
> > > > > > > Although we should be safe because we haven't yet changed
> > > > > > > the clocksource at that time, we can't because of seqcount
> protection.
> > > > > > >
> > > > > > > Use ktime_get_mono_fast_ns() instead which is lock safe for
> > > > > > > such case
> > > > > > >
> > > > > > > With ktime_get_mono_fast_ns, the timestamp is not guaranteed
> > > > > > > to be monotonic across an update and as a result can goes
> backward.
> > > > > > > According to
> > > > > > > update_fast_timekeeper() description: "In the worst case,
> > > > > > > this can result is a slightly wrong timestamp (a few
> > > > > > > nanoseconds)". For PM runtime autosuspend, this means only
> > > > > > > that the suspend decision can be slightly sub optimal.
> > > > > > >
> > > > > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to
> > > > > > > using
> > > > > > > hrtimers")
> > > > > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > > > ---
> > > > > > >
> > > > > > > Hi Rafael,
> > > > > > >
> > > > > > > Sorry, I sent the version with the typo mistake that
> > > > > > > generated the compilation error reported by
> > > > > > > kbuild-test-robot
> > > > > > >
> > > > > > > This version doesn't have the typo.
> > > > > >
> > > > > > OK, I've applied this one, thanks!
> > > > >
> > > > >
> > > > >
> > > > > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road,
> > > > > Bourne
> > > > End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales
> > > > under Registered No. 04586709.
> > >
> > >
> > >
> > > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne
> End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under
> Registered No. 04586709.



Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-01-30 12:14 ` Ulf Hansson
@ 2019-02-05 13:24   ` Geert Uytterhoeven
  2019-02-05 13:40     ` Ulf Hansson
  0 siblings, 1 reply; 17+ messages in thread
From: Geert Uytterhoeven @ 2019-02-05 13:24 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Vincent Guittot, Linux PM, Linux Kernel Mailing List, Linux ARM,
	linux-omap, Rafael J. Wysocki, Biju Das, Linux-Renesas

Hi Ulf,

On Wed, Jan 30, 2019 at 1:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Wed, 30 Jan 2019 at 12:16, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
>
> Hmm, isn't this already a problem in genpd then?
>
> In genpd's ->runtime_resume() callback (genpd_runtime_resume()) we
> call ktime_get() to measure the latency of running the callback.
>
> Or, perhaps irq_safe_dev_in_no_sleep_domain() returns true for the
> genpd + device in question, which means the ktime thingy is skipped.
>
> Geert?

Correct. After adding

    dev_info(dev, "%s: pm_runtime_is_irq_safe = %d,
genpd_is_irq_safe(%s) = %d => %s\n", __func__,
pm_runtime_is_irq_safe(dev), genpd->name, genpd_is_irq_safe(genpd),
ret ? "true" : "false");

I see on R-Car M2-W:

    sh_cmt ffca0000.timer: irq_safe_dev_in_no_sleep_domain:
pm_runtime_is_irq_safe = 1, genpd_is_irq_safe(always-on) = 0 => true

For all other devices, the result is false.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 ] PM-runtime: fix deadlock with ktime
  2019-02-05 13:24   ` Geert Uytterhoeven
@ 2019-02-05 13:40     ` Ulf Hansson
  0 siblings, 0 replies; 17+ messages in thread
From: Ulf Hansson @ 2019-02-05 13:40 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Vincent Guittot, Linux PM, Linux Kernel Mailing List, Linux ARM,
	linux-omap, Rafael J. Wysocki, Biju Das, Linux-Renesas

On Tue, 5 Feb 2019 at 14:24, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Wed, Jan 30, 2019 at 1:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Wed, 30 Jan 2019 at 12:16, Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > > The call path is:
> > > change_clocksource
> > >     ...
> > >     write_seqcount_begin
> > >     ...
> > >     timekeeping_update
> > >         ...
> > >         sh_cmt_clocksource_enable
> > >             ...
> > >             rpm_resume
> >
> > Hmm, isn't this already a problem in genpd then?
> >
> > In genpd's ->runtime_resume() callback (genpd_runtime_resume()) we
> > call ktime_get() to measure the latency of running the callback.
> >
> > Or, perhaps irq_safe_dev_in_no_sleep_domain() returns true for the
> > genpd + device in question, which means the ktime thingy is skipped.
> >
> > Geert?
>
> Correct. After adding
>
>     dev_info(dev, "%s: pm_runtime_is_irq_safe = %d,
> genpd_is_irq_safe(%s) = %d => %s\n", __func__,
> pm_runtime_is_irq_safe(dev), genpd->name, genpd_is_irq_safe(genpd),
> ret ? "true" : "false");
>
> I see on R-Car M2-W:
>
>     sh_cmt ffca0000.timer: irq_safe_dev_in_no_sleep_domain:
> pm_runtime_is_irq_safe = 1, genpd_is_irq_safe(always-on) = 0 => true
>
> For all other devices, the result is false.

Thanks for confirming! I guess we have been lucky so far.

Anyway, if runtime PM core converts to use ktime_get_mono_fast_ns(),
we should probably consider to convert genpd to this as well. Don't
you think?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-02-05 13:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-30 11:16 [PATCH v2 ] PM-runtime: fix deadlock with ktime Vincent Guittot
2019-01-30 12:14 ` Ulf Hansson
2019-02-05 13:24   ` Geert Uytterhoeven
2019-02-05 13:40     ` Ulf Hansson
2019-01-30 13:06 ` Rafael J. Wysocki
2019-01-30 13:18   ` Vincent Guittot
2019-01-30 19:39     ` Ladislav Michl
2019-01-30 17:26   ` [PATCH v3] " Vincent Guittot
2019-01-30 21:53     ` Rafael J. Wysocki
2019-02-01 15:02       ` Biju Das
2019-02-01 15:28         ` Vincent Guittot
2019-02-01 15:44           ` Biju Das
2019-02-01 15:48             ` Vincent Guittot
2019-02-01 15:52               ` Vincent Guittot
2019-02-04  9:29                 ` Biju Das
2019-02-01 15:45           ` Vincent Guittot
2019-01-30 21:32   ` [PATCH v2 ] " Ladislav Michl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).