All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible
@ 2021-06-20 10:13 Ani Sinha
  2021-06-22 15:14 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Ani Sinha @ 2021-06-20 10:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: anirban.sinha, Ani Sinha, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Frederic Weisbecker, Thomas Gleixner

Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
dependency mask model") had also removed the kernel warning
message informing the user that it was not possible to turn
on NO_HZ_FULL. Adding back that log message here. It is
unhelpful when the kernel turns off NO_HZ_FULL silently
without informing anyone.
Also added a kernel log when sched clock is marked as unstable.

Signed-off-by: Ani Sinha <ani@anisinha.ca>
---
 kernel/sched/clock.c     |  5 ++++-
 kernel/time/tick-sched.c | 14 +++++++++++---
 2 files changed, 15 insertions(+), 4 deletions(-)

changelog:
v1: original patch
v2: updated log message to make it non specific to sched clock. Also
    added another log when sched clock is marked as unstable.
v3: a bugfix to v2.

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c2b2859ddd82..9f9fe658f8a5 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
 
 	smp_mb(); /* matches sched_clock_init_late() */
 
-	if (static_key_count(&sched_clock_running.key) == 2)
+	if (static_key_count(&sched_clock_running.key) == 2) {
+		WARN_ONCE(sched_clock_stable(),
+			  "sched clock is now marked unstable.");
 		__clear_sched_clock_stable();
+	}
 }
 
 static void __sched_clock_gtod_offset(void)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 828b091501ca..6e583aa2d160 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -937,10 +937,18 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (can_stop_full_tick(cpu, ts))
+	if (can_stop_full_tick(cpu, ts)) {
 		tick_nohz_stop_sched_tick(ts, cpu);
-	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get());
+	} else {
+		/*
+		 * Don't allow the user to think they can get
+		 * full NO_HZ with this machine.
+		 */
+		WARN_ONCE(tick_nohz_full_running,
+			  "NO_HZ_FULL will not work for the current system.");
+		if (ts->tick_stopped)
+			tick_nohz_restart_sched_tick(ts, ktime_get());
+	}
 #endif
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible
  2021-06-20 10:13 [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible Ani Sinha
@ 2021-06-22 15:14 ` Thomas Gleixner
  2021-06-26 16:18   ` Ani Sinha
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2021-06-22 15:14 UTC (permalink / raw)
  To: Ani Sinha, linux-kernel
  Cc: anirban.sinha, Ani Sinha, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Frederic Weisbecker

On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:

> Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> dependency mask model") had also removed the kernel warning
> message informing the user that it was not possible to turn
> on NO_HZ_FULL. Adding back that log message here. It is
> unhelpful when the kernel turns off NO_HZ_FULL silently
> without informing anyone.
> Also added a kernel log when sched clock is marked as unstable.

Don't do two things at once. See Documentation/process/....

Also your subject line want's a proper prefix.

> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index c2b2859ddd82..9f9fe658f8a5 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
>  
>  	smp_mb(); /* matches sched_clock_init_late() */
>  
> -	if (static_key_count(&sched_clock_running.key) == 2)
> +	if (static_key_count(&sched_clock_running.key) == 2) {
> +		WARN_ONCE(sched_clock_stable(),
> +			  "sched clock is now marked unstable.");

What's the WARN for here? That backtrace is largely uninteresting.

> -	if (can_stop_full_tick(cpu, ts))
> +	if (can_stop_full_tick(cpu, ts)) {
>  		tick_nohz_stop_sched_tick(ts, cpu);
> -	else if (ts->tick_stopped)
> -		tick_nohz_restart_sched_tick(ts, ktime_get());
> +	} else {
> +		/*
> +		 * Don't allow the user to think they can get
> +		 * full NO_HZ with this machine.
> +		 */
> +		WARN_ONCE(tick_nohz_full_running,
> +			  "NO_HZ_FULL will not work for the current system.");

can_stop_full_tick() returning false can be transient and then the user
still has no idea _why_ this is printed.

Also assume the user/admin starts perf and knows he's going to disturb
NOHZ full, then _why_ would he be interested in that warning.

And again the backtrace is useless. The call path is known.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible
  2021-06-22 15:14 ` Thomas Gleixner
@ 2021-06-26 16:18   ` Ani Sinha
  2021-07-03 17:30     ` Ani Sinha
  0 siblings, 1 reply; 4+ messages in thread
From: Ani Sinha @ 2021-06-26 16:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ani Sinha, linux-kernel, anirban.sinha, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Frederic Weisbecker



On Tue, 22 Jun 2021, Thomas Gleixner wrote:

> On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
>
> > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > dependency mask model") had also removed the kernel warning
> > message informing the user that it was not possible to turn
> > on NO_HZ_FULL. Adding back that log message here. It is
> > unhelpful when the kernel turns off NO_HZ_FULL silently
> > without informing anyone.
> > Also added a kernel log when sched clock is marked as unstable.
>
> Don't do two things at once. See Documentation/process/....

Ok I will split up this patch into two.

>
> Also your subject line want's a proper prefix.

OK will fix.

>
> > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > index c2b2859ddd82..9f9fe658f8a5 100644
> > --- a/kernel/sched/clock.c
> > +++ b/kernel/sched/clock.c
> > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> >
> >  	smp_mb(); /* matches sched_clock_init_late() */
> >
> > -	if (static_key_count(&sched_clock_running.key) == 2)
> > +	if (static_key_count(&sched_clock_running.key) == 2) {
> > +		WARN_ONCE(sched_clock_stable(),
> > +			  "sched clock is now marked unstable.");
>
> What's the WARN for here? That backtrace is largely uninteresting.

OK to be consistent with other parts of the code, I will replace this with
pr_warn()

>
> > -	if (can_stop_full_tick(cpu, ts))
> > +	if (can_stop_full_tick(cpu, ts)) {
> >  		tick_nohz_stop_sched_tick(ts, cpu);
> > -	else if (ts->tick_stopped)
> > -		tick_nohz_restart_sched_tick(ts, ktime_get());
> > +	} else {
> > +		/*
> > +		 * Don't allow the user to think they can get
> > +		 * full NO_HZ with this machine.
> > +		 */
> > +		WARN_ONCE(tick_nohz_full_running,
> > +			  "NO_HZ_FULL will not work for the current system.");
>
> can_stop_full_tick() returning false can be transient and then the user
> still has no idea _why_ this is printed.
>
> Also assume the user/admin starts perf and knows he's going to disturb
> NOHZ full, then _why_ would he be interested in that warning.
>

If NOHZ is disabled intentionally, that is not an interesting case. I am
worried about the situation where the user specifies NOHZ option in the
kernel commandline and the kernel silently disabled this because of one or
more limitations in the system. I want to address this. All the reasons
specified in the following commit is still true:

commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
Author: Steven Rostedt <rostedt@goodmis.org>
Date:   Fri May 10 17:12:28 2013 -0400

    nohz: Warn if the machine can not perform nohz_full

    If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
    kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
    due to the machine having a unstable clock, warn about it.

    We do not want users thinking that they are getting the benefit
    of nohz when their machine can not support it.

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

This log was removed from the kernel in commit

commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Wed Jul 22 17:03:52 2015 +0200

    sched-clock: Migrate to use new tick dependency mask model

    Instead of checking sched_clock_stable from the nohz subsystem to
verify
    its tick dependency, migrate it to the new mask in order to include it
    to the all-in-one check.


and it seems to me that the removal of the log defeats the purpose that
commit e12d0271774fea9fddf tried to serve.

Please enlighten me as to how to achieve this and I will cook up
something.

Ani

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible
  2021-06-26 16:18   ` Ani Sinha
@ 2021-07-03 17:30     ` Ani Sinha
  0 siblings, 0 replies; 4+ messages in thread
From: Ani Sinha @ 2021-07-03 17:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, anirban.sinha, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Frederic Weisbecker

ping ...

On Sat, Jun 26, 2021 at 9:49 PM Ani Sinha <ani@anisinha.ca> wrote:
>
>
>
> On Tue, 22 Jun 2021, Thomas Gleixner wrote:
>
> > On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
> >
> > > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > > dependency mask model") had also removed the kernel warning
> > > message informing the user that it was not possible to turn
> > > on NO_HZ_FULL. Adding back that log message here. It is
> > > unhelpful when the kernel turns off NO_HZ_FULL silently
> > > without informing anyone.
> > > Also added a kernel log when sched clock is marked as unstable.
> >
> > Don't do two things at once. See Documentation/process/....
>
> Ok I will split up this patch into two.
>
> >
> > Also your subject line want's a proper prefix.
>
> OK will fix.
>
> >
> > > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > > index c2b2859ddd82..9f9fe658f8a5 100644
> > > --- a/kernel/sched/clock.c
> > > +++ b/kernel/sched/clock.c
> > > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> > >
> > >     smp_mb(); /* matches sched_clock_init_late() */
> > >
> > > -   if (static_key_count(&sched_clock_running.key) == 2)
> > > +   if (static_key_count(&sched_clock_running.key) == 2) {
> > > +           WARN_ONCE(sched_clock_stable(),
> > > +                     "sched clock is now marked unstable.");
> >
> > What's the WARN for here? That backtrace is largely uninteresting.
>
> OK to be consistent with other parts of the code, I will replace this with
> pr_warn()
>
> >
> > > -   if (can_stop_full_tick(cpu, ts))
> > > +   if (can_stop_full_tick(cpu, ts)) {
> > >             tick_nohz_stop_sched_tick(ts, cpu);
> > > -   else if (ts->tick_stopped)
> > > -           tick_nohz_restart_sched_tick(ts, ktime_get());
> > > +   } else {
> > > +           /*
> > > +            * Don't allow the user to think they can get
> > > +            * full NO_HZ with this machine.
> > > +            */
> > > +           WARN_ONCE(tick_nohz_full_running,
> > > +                     "NO_HZ_FULL will not work for the current system.");
> >
> > can_stop_full_tick() returning false can be transient and then the user
> > still has no idea _why_ this is printed.
> >
> > Also assume the user/admin starts perf and knows he's going to disturb
> > NOHZ full, then _why_ would he be interested in that warning.
> >
>
> If NOHZ is disabled intentionally, that is not an interesting case. I am
> worried about the situation where the user specifies NOHZ option in the
> kernel commandline and the kernel silently disabled this because of one or
> more limitations in the system. I want to address this. All the reasons
> specified in the following commit is still true:
>
> commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
> Author: Steven Rostedt <rostedt@goodmis.org>
> Date:   Fri May 10 17:12:28 2013 -0400
>
>     nohz: Warn if the machine can not perform nohz_full
>
>     If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
>     kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
>     due to the machine having a unstable clock, warn about it.
>
>     We do not want users thinking that they are getting the benefit
>     of nohz when their machine can not support it.
>
>     Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Cc: Ingo Molnar <mingo@kernel.org>
>     Cc: Andrew Morton <akpm@linux-foundation.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: H. Peter Anvin <hpa@zytor.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Borislav Petkov <bp@alien8.de>
>     Cc: Li Zhong <zhong@linux.vnet.ibm.com>
>     Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>
> This log was removed from the kernel in commit
>
> commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
> Author: Frederic Weisbecker <fweisbec@gmail.com>
> Date:   Wed Jul 22 17:03:52 2015 +0200
>
>     sched-clock: Migrate to use new tick dependency mask model
>
>     Instead of checking sched_clock_stable from the nohz subsystem to
> verify
>     its tick dependency, migrate it to the new mask in order to include it
>     to the all-in-one check.
>
>
> and it seems to me that the removal of the log defeats the purpose that
> commit e12d0271774fea9fddf tried to serve.
>
> Please enlighten me as to how to achieve this and I will cook up
> something.
>
> Ani

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-07-03 17:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-20 10:13 [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible Ani Sinha
2021-06-22 15:14 ` Thomas Gleixner
2021-06-26 16:18   ` Ani Sinha
2021-07-03 17:30     ` Ani Sinha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.