linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints
@ 2020-09-24  0:08 Prasad Sodagudi
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Prasad Sodagudi @ 2020-09-24  0:08 UTC (permalink / raw)
  To: rostedt, pmladek, sergey.senozhatsky, gregkh, tglx
  Cc: linux-kernel, tkjos, Prasad Sodagudi

During the cpu hot plug stress testing, couple of messages
continuous flooding on to the console is causing timers
migration delay. Delayed time migrations from hot plugging
core is causing device instability with watchdog. So reduce
log level for couple of prints in cpu hot plug flow.

Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
---
 arch/arm64/kernel/smp.c | 2 +-
 kernel/irq/cpuhotplug.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 355ee9e..08da6e3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
 		pr_crit("CPU%u: cpu didn't die\n", cpu);
 		return;
 	}
-	pr_notice("CPU%u: shutdown\n", cpu);
+	pr_info("CPU%u: shutdown\n", cpu);
 
 	/*
 	 * Now that the dying CPU is beyond the point of no return w.r.t.
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 02236b1..82802e0 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
 		 * If this happens then there was a missed IRQ fixup at some
 		 * point. Warn about it and enforce fixup.
 		 */
-		pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
+		pr_info("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
 			cpumask_pr_args(m), d->irq, cpu);
 		return true;
 	}
@@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
 		raw_spin_unlock(&desc->lock);
 
 		if (affinity_broken) {
-			pr_warn_ratelimited("IRQ %u: no longer affine to CPU%u\n",
+			pr_info_ratelimited("IRQ %u: no longer affine to CPU%u\n",
 					    irq, smp_processor_id());
 		}
 	}
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24  0:08 [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Prasad Sodagudi
@ 2020-09-24  0:08 ` Prasad Sodagudi
  2020-09-24  6:33   ` Greg KH
                     ` (2 more replies)
  2020-09-24  6:31 ` [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Greg KH
  2020-09-24 18:08 ` Thomas Gleixner
  2 siblings, 3 replies; 14+ messages in thread
From: Prasad Sodagudi @ 2020-09-24  0:08 UTC (permalink / raw)
  To: rostedt, pmladek, sergey.senozhatsky, gregkh, tglx
  Cc: linux-kernel, tkjos, Mohammed Khajapasha, Prasad Sodagudi

From: Mohammed Khajapasha <mkhaja@codeaurora.org>

The thread which initiates the hot plug can get scheduled
out, while trying to acquire the console lock,
thus increasing the hot plug latency. This option
allows to selectively disable the console flush and
in turn reduce the hot plug latency.

Signed-off-by: Mohammed Khajapasha <mkhaja@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
---
 init/Kconfig           | 10 ++++++++++
 kernel/printk/printk.c | 10 ++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index d6a0b31..9ce39ba 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -699,6 +699,16 @@ config LOG_BUF_SHIFT
 		     13 =>  8 KB
 		     12 =>  4 KB
 
+config CONSOLE_FLUSH_ON_HOTPLUG
+	bool "Enable console flush configurable in hot plug code path"
+	depends on HOTPLUG_CPU
+	def_bool n
+	help
+	In cpu hot plug path console lock acquire and release causes the
+	console to flush. If console lock is not free hot plug latency
+	increases. So make console flush configurable in hot plug path
+	and default disabled to help in cpu hot plug latencies.
+
 config LOG_CPU_MAX_BUF_SHIFT
 	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
 	depends on SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9b75f6b..f02d3ef 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2283,6 +2283,8 @@ void resume_console(void)
 	console_unlock();
 }
 
+#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
+
 /**
  * console_cpu_notify - print deferred console messages after CPU hotplug
  * @cpu: unused
@@ -2302,6 +2304,8 @@ static int console_cpu_notify(unsigned int cpu)
 	return 0;
 }
 
+#endif
+
 /**
  * console_lock - lock the console system for exclusive use.
  *
@@ -2974,7 +2978,7 @@ void __init console_init(void)
 static int __init printk_late_init(void)
 {
 	struct console *con;
-	int ret;
+	int ret = 0;
 
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
@@ -2996,13 +3000,15 @@ static int __init printk_late_init(void)
 			unregister_console(con);
 		}
 	}
+#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
 	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
 					console_cpu_notify);
 	WARN_ON(ret < 0);
 	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
 					console_cpu_notify, NULL);
 	WARN_ON(ret < 0);
-	return 0;
+#endif
+	return ret;
 }
 late_initcall(printk_late_init);
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints
  2020-09-24  0:08 [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Prasad Sodagudi
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
@ 2020-09-24  6:31 ` Greg KH
  2020-09-24 18:08 ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2020-09-24  6:31 UTC (permalink / raw)
  To: Prasad Sodagudi
  Cc: rostedt, pmladek, sergey.senozhatsky, tglx, linux-kernel, tkjos

On Wed, Sep 23, 2020 at 05:08:31PM -0700, Prasad Sodagudi wrote:
> During the cpu hot plug stress testing, couple of messages
> continuous flooding on to the console is causing timers
> migration delay. Delayed time migrations from hot plugging
> core is causing device instability with watchdog. So reduce
> log level for couple of prints in cpu hot plug flow.
> 
> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
> ---
>  arch/arm64/kernel/smp.c | 2 +-
>  kernel/irq/cpuhotplug.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 355ee9e..08da6e3 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
>  		pr_crit("CPU%u: cpu didn't die\n", cpu);
>  		return;
>  	}
> -	pr_notice("CPU%u: shutdown\n", cpu);
> +	pr_info("CPU%u: shutdown\n", cpu);
>  
>  	/*
>  	 * Now that the dying CPU is beyond the point of no return w.r.t.
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 02236b1..82802e0 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
>  		 * If this happens then there was a missed IRQ fixup at some
>  		 * point. Warn about it and enforce fixup.
>  		 */
> -		pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
> +		pr_info("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
>  			cpumask_pr_args(m), d->irq, cpu);
>  		return true;
>  	}
> @@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
>  		raw_spin_unlock(&desc->lock);
>  
>  		if (affinity_broken) {
> -			pr_warn_ratelimited("IRQ %u: no longer affine to CPU%u\n",
> +			pr_info_ratelimited("IRQ %u: no longer affine to CPU%u\n",
>  					    irq, smp_processor_id());
>  		}
>  	}
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
@ 2020-09-24  6:33   ` Greg KH
  2020-09-24 18:21     ` Thomas Gleixner
  2020-09-24  6:38   ` Sergey Senozhatsky
  2020-09-30 13:57   ` Petr Mladek
  2 siblings, 1 reply; 14+ messages in thread
From: Greg KH @ 2020-09-24  6:33 UTC (permalink / raw)
  To: Prasad Sodagudi
  Cc: rostedt, pmladek, sergey.senozhatsky, tglx, linux-kernel, tkjos,
	Mohammed Khajapasha

On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> From: Mohammed Khajapasha <mkhaja@codeaurora.org>
> 
> The thread which initiates the hot plug can get scheduled
> out, while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows to selectively disable the console flush and
> in turn reduce the hot plug latency.
> 
> Signed-off-by: Mohammed Khajapasha <mkhaja@codeaurora.org>
> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
> ---
>  init/Kconfig           | 10 ++++++++++
>  kernel/printk/printk.c | 10 ++++++++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index d6a0b31..9ce39ba 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -699,6 +699,16 @@ config LOG_BUF_SHIFT
>  		     13 =>  8 KB
>  		     12 =>  4 KB
>  
> +config CONSOLE_FLUSH_ON_HOTPLUG
> +	bool "Enable console flush configurable in hot plug code path"
> +	depends on HOTPLUG_CPU
> +	def_bool n

n is the default, no need to list it.

> +	help
> +	In cpu hot plug path console lock acquire and release causes the
> +	console to flush. If console lock is not free hot plug latency
> +	increases. So make console flush configurable in hot plug path
> +	and default disabled to help in cpu hot plug latencies.

Why would you not want this option?

Why isn't this just a bugfix?

> +
>  config LOG_CPU_MAX_BUF_SHIFT
>  	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
>  	depends on SMP
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9b75f6b..f02d3ef 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2283,6 +2283,8 @@ void resume_console(void)
>  	console_unlock();
>  }
>  
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
> +
>  /**
>   * console_cpu_notify - print deferred console messages after CPU hotplug
>   * @cpu: unused
> @@ -2302,6 +2304,8 @@ static int console_cpu_notify(unsigned int cpu)
>  	return 0;
>  }
>  
> +#endif
> +
>  /**
>   * console_lock - lock the console system for exclusive use.
>   *
> @@ -2974,7 +2978,7 @@ void __init console_init(void)
>  static int __init printk_late_init(void)
>  {
>  	struct console *con;
> -	int ret;
> +	int ret = 0;
>  
>  	for_each_console(con) {
>  		if (!(con->flags & CON_BOOT))
> @@ -2996,13 +3000,15 @@ static int __init printk_late_init(void)
>  			unregister_console(con);
>  		}
>  	}
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG

#ifdef in .c code is a mess to maintain.

>  	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
>  					console_cpu_notify);
>  	WARN_ON(ret < 0);
>  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
>  					console_cpu_notify, NULL);
>  	WARN_ON(ret < 0);
> -	return 0;
> +#endif

What happens if we don't make these calls entirely?  Why not just remove
them as who wants extra latency for their system?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
  2020-09-24  6:33   ` Greg KH
@ 2020-09-24  6:38   ` Sergey Senozhatsky
  2020-09-30 13:57   ` Petr Mladek
  2 siblings, 0 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2020-09-24  6:38 UTC (permalink / raw)
  To: Prasad Sodagudi
  Cc: rostedt, pmladek, sergey.senozhatsky, gregkh, tglx, linux-kernel,
	tkjos, Mohammed Khajapasha

On (20/09/23 17:08), Prasad Sodagudi wrote:
> From: Mohammed Khajapasha <mkhaja@codeaurora.org>
> 
> The thread which initiates the hot plug can get scheduled
> out, while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows to selectively disable the console flush and
> in turn reduce the hot plug latency.

It can schedule out or get preempted pretty much anywhere at any
time. printk->console_lock is not special in this regard. What am
I missing?

	-ss

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints
  2020-09-24  0:08 [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Prasad Sodagudi
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
  2020-09-24  6:31 ` [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Greg KH
@ 2020-09-24 18:08 ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2020-09-24 18:08 UTC (permalink / raw)
  To: Prasad Sodagudi, rostedt, pmladek, sergey.senozhatsky, gregkh
  Cc: linux-kernel, tkjos, Prasad Sodagudi

On Wed, Sep 23 2020 at 17:08, Prasad Sodagudi wrote:
> During the cpu hot plug stress testing, couple of messages
> continuous flooding on to the console is causing timers
> migration delay. Delayed time migrations from hot plugging
> core is causing device instability with watchdog. So reduce
> log level for couple of prints in cpu hot plug flow.

This is fixing the wrong end, really.

Timer migration can be delayed by other means as well.

The real problem is that the migration happens _after_ the CPU is
completely dead and the hotplug control thread is not guaranteed to
reach the timer migration state before timers are overdue at all.

There is a bunch of related problems, e.g. the interrupt migration
mechanism kicks in late as well.

I'm not against changing the log level per se, but the justification for
doing so is just bogus.

The more obvious question is whether these printks are useful at all
other than at the pr_debug() level.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24  6:33   ` Greg KH
@ 2020-09-24 18:21     ` Thomas Gleixner
  2020-09-25  9:27       ` Greg KH
  2020-09-28  2:05       ` psodagud
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2020-09-24 18:21 UTC (permalink / raw)
  To: Greg KH, Prasad Sodagudi
  Cc: rostedt, pmladek, sergey.senozhatsky, linux-kernel, tkjos,
	Mohammed Khajapasha

On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
> On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
>> +config CONSOLE_FLUSH_ON_HOTPLUG
>> +	bool "Enable console flush configurable in hot plug code path"
>> +	depends on HOTPLUG_CPU
>> +	def_bool n
>
> n is the default, no need to list it.
>
>> +	help
>> +	In cpu hot plug path console lock acquire and release causes the
>> +	console to flush. If console lock is not free hot plug latency
>> +	increases. So make console flush configurable in hot plug path
>> +	and default disabled to help in cpu hot plug latencies.
>
> Why would you not want this option?
>
> Why isn't this just a bugfix?

Because it's the normal behaviour of console lock and there are
gazillion other ways to delay stuff in the hotplug path.

CPU hotplug is not meant to be a high speed operation and if people
think they need it to be fast then its pretty much guaranteed that they
want it for the completely wrong reasons.

This #ifdef tinkering is just digusting especially as it just tackles an
obvious way how to delay timer migration, but does not address the
underlying root cause.

>> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
>
> #ifdef in .c code is a mess to maintain.
>
>>  	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
>>  					console_cpu_notify);
>>  	WARN_ON(ret < 0);
>>  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
>>  					console_cpu_notify, NULL);
>>  	WARN_ON(ret < 0);
>> -	return 0;
>> +#endif
>
> What happens if we don't make these calls entirely?  Why not just remove
> them as who wants extra latency for their system?

That's just wrong. If you don't want output, then adjust your loglevel,
but delaying printks up to the point where by chance another printk
happens is just silly.

CPU hotplug is not about latency. It's slow by design and again, the
timer migration is simply happening at the wrong place. But fixing that
needs more thoughts than modifying log levels and sprinkling a few
#ifdefs into the code.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24 18:21     ` Thomas Gleixner
@ 2020-09-25  9:27       ` Greg KH
  2020-09-25 14:16         ` Adam Borowski
  2020-09-28  2:05       ` psodagud
  1 sibling, 1 reply; 14+ messages in thread
From: Greg KH @ 2020-09-25  9:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Prasad Sodagudi, rostedt, pmladek, sergey.senozhatsky,
	linux-kernel, tkjos, Mohammed Khajapasha

On Thu, Sep 24, 2020 at 08:21:07PM +0200, Thomas Gleixner wrote:
> On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
> > On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> >> +config CONSOLE_FLUSH_ON_HOTPLUG
> >> +	bool "Enable console flush configurable in hot plug code path"
> >> +	depends on HOTPLUG_CPU
> >> +	def_bool n
> >
> > n is the default, no need to list it.
> >
> >> +	help
> >> +	In cpu hot plug path console lock acquire and release causes the
> >> +	console to flush. If console lock is not free hot plug latency
> >> +	increases. So make console flush configurable in hot plug path
> >> +	and default disabled to help in cpu hot plug latencies.
> >
> > Why would you not want this option?
> >
> > Why isn't this just a bugfix?
> 
> Because it's the normal behaviour of console lock and there are
> gazillion other ways to delay stuff in the hotplug path.
> 
> CPU hotplug is not meant to be a high speed operation and if people
> think they need it to be fast then its pretty much guaranteed that they
> want it for the completely wrong reasons.

Odds are, it's the big/little systems that are trying to use cpu hotplug
for this type of thing :(

> This #ifdef tinkering is just digusting especially as it just tackles an
> obvious way how to delay timer migration, but does not address the
> underlying root cause.

Agreed, thanks for putting it better than I did.

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-25  9:27       ` Greg KH
@ 2020-09-25 14:16         ` Adam Borowski
  2020-09-25 22:55           ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Adam Borowski @ 2020-09-25 14:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Thomas Gleixner, Prasad Sodagudi, rostedt, pmladek,
	sergey.senozhatsky, linux-kernel, tkjos, Mohammed Khajapasha

On Fri, Sep 25, 2020 at 11:27:54AM +0200, Greg KH wrote:
> On Thu, Sep 24, 2020 at 08:21:07PM +0200, Thomas Gleixner wrote:
> > On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
> > > On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> > >> +config CONSOLE_FLUSH_ON_HOTPLUG
> > >> +	bool "Enable console flush configurable in hot plug code path"

> > CPU hotplug is not meant to be a high speed operation and if people
> > think they need it to be fast then its pretty much guaranteed that they
> > want it for the completely wrong reasons.
> 
> Odds are, it's the big/little systems that are trying to use cpu hotplug
> for this type of thing :(

Just a bit of info:
My MT6797X (10 core: 4×A53 + 4×A53 + 2×A72), flickers its cores this way:
the right-hand piece is CPUs, one character per core: bars show utilization,
"o" stands for offline; every line is 0.1 second interval.

topline -i 0.1
mmcblk(⠀) (oooo▄▆oo▅o)
mmcblk(⠀) (oooo▅▄oooo)
mmcblk(⠀) (oooo▆▆oooo)
mmcblk(⠀) (oooo▅ooo▆o)
mmcblk(⠀) (oooo▆▆oo▄o)
mmcblk(⠀) (oooo▆▇oooo)
mmcblk(⠀) (oooo▇ooo▅o)
mmcblk(⠀) (oooo▆ooo█o)
mmcblk(⠀) (oooo▆ooo▄o)
mmcblk(⠀) (oooo▅ooo▆o)
mmcblk(⠀) (oooo▆ooo▅o)
mmcblk(⠀) (oooo▄ooo▇o)
mmcblk(⠀) (oooo▇▆oo▆o)
mmcblk(⠀) (oooo▆ooo▅o)
mmcblk(⠀) (oooo▅▆oooo)
mmcblk(⠀) (oooo▆█oooo)
mmcblk(⠀) (oooo▆▇oooo)
mmcblk(⠀) (oooo▆▆oooo)
mmcblk(⠀) (oooo▅▆oooo)
mmcblk(⠀) (oooo▆▅oooo)
mmcblk(⠀) (oooo▆ooo▆o)
mmcblk(⠀) (oooo▆ooo▇o)
mmcblk(⢀) (oooo▆▇oo▆o)
mmcblk(⠀) (oooo▄ooo▆o)
mmcblk(⠀) (oooo▆ooo█o)
mmcblk(⠀) (oooo▄ooo▇o)
mmcblk(⠀) (oooo▄▆oooo)
mmcblk(⠀) (oooo▆▆oooo)

So it's on the order of a few ons/offs per second.

The offline CPUs are "present" and "offline"; not sure if this means hotplug
or not (I'd expect dropping from "present" to "possible", but I don't know
these parts).


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ in the beginning was the boot and root floppies and they were good.
⢿⡄⠘⠷⠚⠋⠀                                       -- <willmore> on #linux-sunxi
⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-25 14:16         ` Adam Borowski
@ 2020-09-25 22:55           ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2020-09-25 22:55 UTC (permalink / raw)
  To: Adam Borowski, Greg KH
  Cc: Prasad Sodagudi, rostedt, pmladek, sergey.senozhatsky,
	linux-kernel, tkjos, Mohammed Khajapasha

On Fri, Sep 25 2020 at 16:16, Adam Borowski wrote:
> On Fri, Sep 25, 2020 at 11:27:54AM +0200, Greg KH wrote:
>> On Thu, Sep 24, 2020 at 08:21:07PM +0200, Thomas Gleixner wrote:
>> > On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
>> > > On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
>> > >> +config CONSOLE_FLUSH_ON_HOTPLUG
>> > >> +	bool "Enable console flush configurable in hot plug code path"
>
>> > CPU hotplug is not meant to be a high speed operation and if people
>> > think they need it to be fast then its pretty much guaranteed that they
>> > want it for the completely wrong reasons.
>> 
>> Odds are, it's the big/little systems that are trying to use cpu hotplug
>> for this type of thing :(
>
> Just a bit of info:
> My MT6797X (10 core: 4×A53 + 4×A53 + 2×A72), flickers its cores this way:
> the right-hand piece is CPUs, one character per core: bars show utilization,
> "o" stands for offline; every line is 0.1 second interval.
>
> topline -i 0.1
> mmcblk(⠀) (oooo▄▆oo▅o)
> mmcblk(⠀) (oooo▅▄oooo)

...

> So it's on the order of a few ons/offs per second.
>
> The offline CPUs are "present" and "offline"; not sure if this means hotplug
> or not (I'd expect dropping from "present" to "possible", but I don't know
> these parts).

Yes, they are (ab)using CPU hotplug instead of utilizing the fine
grained hotplug state control and fix up the few odds and ends which
keeps the CPU from staying in deep idle forever.

Tinkering is way simpler than proper engineering.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24 18:21     ` Thomas Gleixner
  2020-09-25  9:27       ` Greg KH
@ 2020-09-28  2:05       ` psodagud
  2020-09-28 12:50         ` Greg KH
  2020-09-30 14:36         ` Petr Mladek
  1 sibling, 2 replies; 14+ messages in thread
From: psodagud @ 2020-09-28  2:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Greg KH, rostedt, pmladek, sergey.senozhatsky, linux-kernel,
	tkjos, Mohammed Khajapasha

On 2020-09-24 11:21, Thomas Gleixner wrote:
> On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
>> On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
>>> +config CONSOLE_FLUSH_ON_HOTPLUG
>>> +	bool "Enable console flush configurable in hot plug code path"
>>> +	depends on HOTPLUG_CPU
>>> +	def_bool n
>> 
>> n is the default, no need to list it.
>> 
>>> +	help
>>> +	In cpu hot plug path console lock acquire and release causes the
>>> +	console to flush. If console lock is not free hot plug latency
>>> +	increases. So make console flush configurable in hot plug path
>>> +	and default disabled to help in cpu hot plug latencies.
>> 
>> Why would you not want this option?
>> 
>> Why isn't this just a bugfix?
> 
> Because it's the normal behaviour of console lock and there are
> gazillion other ways to delay stuff in the hotplug path.
> 
> CPU hotplug is not meant to be a high speed operation and if people
> think they need it to be fast then its pretty much guaranteed that they
> want it for the completely wrong reasons.
> 
> This #ifdef tinkering is just digusting especially as it just tackles 
> an
> obvious way how to delay timer migration, but does not address the
> underlying root cause.
> 

Hi tglx,

Yes. I agree with you that there are other conditions, which could delay 
the hotplug operation. But this console
flushing is not needed in the hotplug path.  In the hotplug path, a core 
is trying printing messages
from other core(by design of printk), delays the whole hotplug operation 
and timers migration.  As timers
migration gets delayed, it would impact the systems stability in device 
stability testing.
To avoid timers delay in the timer migration in  debug builds has to 
choose this option.

I thought of changing the timers and irq migration as priority callbacks 
in the hotplug out operation
but I observed some comments like shown below. I was under impression 
that, it is hard to find all this
type of conditions, so started tinkering hotplug path by changing the 
log levels.
These changes helped on Qualcomm platforms testing.
         /*
          * On the tear-down path, timers_dead_cpu() must be invoked
          * before blk_mq_queue_reinit_notify() from notify_dead(),
          * otherwise a RCU stall occurs.
          */
         [CPUHP_TIMERS_PREPARE] = {
                 .name                   = "timers:prepare",
                 .startup.single         = timers_prepare_cpu,
                 .teardown.single        = timers_dead_cpu,
         },

Another reason for adding #ifdef is that, I was not clear why console 
flush is need cpuhp callback and thought
there might be some use cases and console flush use case might not be 
valid for all the users of cpu hotplug.
I will try to explore the changing the callback order to complete the 
timers and irq migration early in the hotplug operation.

Let me put some use cases of hotplug  and why hotplug and hotplug 
latency is important from testing point of view.
1)	Secondary cpus are hotplug out during the device suspend and hotplug 
in during the resume.  So cpu hotplug operation is important production 
devices point of view as user presses the power key many times.

2)	sysfs nodes (/sys/devices/ststem/cpu/cpu4/oneline) are present from 
linux kernel, so  test team wants to test cpu hotplug. There could be 
issues with in generic kernel, device drivers or firmware(psci calls 
handling from firmware).  There could be issues with device drivers or 
firmware and test teams can not leave the hotplug untested in builds.

3)	Linux kernel also gave provision to register call backs with cpu 
hotplug framework(CPUHP_AP_ONLINE_DYN) dynamic callbacks.
3002         ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, 
"printk:online",
3003                                         console_cpu_notify, NULL);
	So test team wants to test if any in tree or out of tree modules have 
any issues with registered call backs or not.

4)	Tracing of the cpuhp operation is important to find whether upstream 
changes or out of tree modules(or firmware changes) caused latency 
regression or not.

-Thanks, Prasad


>>> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
>> 
>> #ifdef in .c code is a mess to maintain.
>> 
>>>  	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", 
>>> NULL,
>>>  					console_cpu_notify);
>>>  	WARN_ON(ret < 0);
>>>  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, 
>>> "printk:online",
>>>  					console_cpu_notify, NULL);
>>>  	WARN_ON(ret < 0);
>>> -	return 0;
>>> +#endif
>> 
>> What happens if we don't make these calls entirely?  Why not just 
>> remove
>> them as who wants extra latency for their system?
> 
> That's just wrong. If you don't want output, then adjust your loglevel,
> but delaying printks up to the point where by chance another printk
> happens is just silly.
> 
> CPU hotplug is not about latency. It's slow by design and again, the
> timer migration is simply happening at the wrong place. But fixing that
> needs more thoughts than modifying log levels and sprinkling a few
> #ifdefs into the code.
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-28  2:05       ` psodagud
@ 2020-09-28 12:50         ` Greg KH
  2020-09-30 14:36         ` Petr Mladek
  1 sibling, 0 replies; 14+ messages in thread
From: Greg KH @ 2020-09-28 12:50 UTC (permalink / raw)
  To: psodagud
  Cc: Thomas Gleixner, rostedt, pmladek, sergey.senozhatsky,
	linux-kernel, tkjos, Mohammed Khajapasha

On Sun, Sep 27, 2020 at 07:05:34PM -0700, psodagud@codeaurora.org wrote:
> On 2020-09-24 11:21, Thomas Gleixner wrote:
> > On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
> > > On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> > > > +config CONSOLE_FLUSH_ON_HOTPLUG
> > > > +	bool "Enable console flush configurable in hot plug code path"
> > > > +	depends on HOTPLUG_CPU
> > > > +	def_bool n
> > > 
> > > n is the default, no need to list it.
> > > 
> > > > +	help
> > > > +	In cpu hot plug path console lock acquire and release causes the
> > > > +	console to flush. If console lock is not free hot plug latency
> > > > +	increases. So make console flush configurable in hot plug path
> > > > +	and default disabled to help in cpu hot plug latencies.
> > > 
> > > Why would you not want this option?
> > > 
> > > Why isn't this just a bugfix?
> > 
> > Because it's the normal behaviour of console lock and there are
> > gazillion other ways to delay stuff in the hotplug path.
> > 
> > CPU hotplug is not meant to be a high speed operation and if people
> > think they need it to be fast then its pretty much guaranteed that they
> > want it for the completely wrong reasons.
> > 
> > This #ifdef tinkering is just digusting especially as it just tackles an
> > obvious way how to delay timer migration, but does not address the
> > underlying root cause.
> > 
> 
> Hi tglx,
> 
> Yes. I agree with you that there are other conditions, which could delay the
> hotplug operation. But this console
> flushing is not needed in the hotplug path.  In the hotplug path, a core is
> trying printing messages
> from other core(by design of printk), delays the whole hotplug operation and
> timers migration.  As timers
> migration gets delayed, it would impact the systems stability in device
> stability testing.
> To avoid timers delay in the timer migration in  debug builds has to choose
> this option.
> 
> I thought of changing the timers and irq migration as priority callbacks in
> the hotplug out operation
> but I observed some comments like shown below. I was under impression that,
> it is hard to find all this
> type of conditions, so started tinkering hotplug path by changing the log
> levels.
> These changes helped on Qualcomm platforms testing.
>         /*
>          * On the tear-down path, timers_dead_cpu() must be invoked
>          * before blk_mq_queue_reinit_notify() from notify_dead(),
>          * otherwise a RCU stall occurs.
>          */
>         [CPUHP_TIMERS_PREPARE] = {
>                 .name                   = "timers:prepare",
>                 .startup.single         = timers_prepare_cpu,
>                 .teardown.single        = timers_dead_cpu,
>         },
> 
> Another reason for adding #ifdef is that, I was not clear why console flush
> is need cpuhp callback and thought
> there might be some use cases and console flush use case might not be valid
> for all the users of cpu hotplug.
> I will try to explore the changing the callback order to complete the timers
> and irq migration early in the hotplug operation.
> 
> Let me put some use cases of hotplug  and why hotplug and hotplug latency is
> important from testing point of view.
> 1)	Secondary cpus are hotplug out during the device suspend and hotplug in
> during the resume.  So cpu hotplug operation is important production devices
> point of view as user presses the power key many times.

But what does suspend/resume have to do with this?  Why not do just an
offline operation instead of unplugging the whole cpu?

> 2)	sysfs nodes (/sys/devices/ststem/cpu/cpu4/oneline) are present from linux
> kernel, so  test team wants to test cpu hotplug. There could be issues with
> in generic kernel, device drivers or firmware(psci calls handling from
> firmware).  There could be issues with device drivers or firmware and test
> teams can not leave the hotplug untested in builds.

Your change isn't for testing things, speed doesn't matter when writing
to sysfs nodes, right?

> 3)	Linux kernel also gave provision to register call backs with cpu hotplug
> framework(CPUHP_AP_ONLINE_DYN) dynamic callbacks.
> 3002         ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> "printk:online",
> 3003                                         console_cpu_notify, NULL);
> 	So test team wants to test if any in tree or out of tree modules have any
> issues with registered call backs or not.

Again, how is this a speed issue?

> 4)	Tracing of the cpuhp operation is important to find whether upstream
> changes or out of tree modules(or firmware changes) caused latency
> regression or not.

But cpu hotplug is not deterministic, so how does latency matter here?

confused as to the real problem here...

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
  2020-09-24  6:33   ` Greg KH
  2020-09-24  6:38   ` Sergey Senozhatsky
@ 2020-09-30 13:57   ` Petr Mladek
  2 siblings, 0 replies; 14+ messages in thread
From: Petr Mladek @ 2020-09-30 13:57 UTC (permalink / raw)
  To: Prasad Sodagudi
  Cc: rostedt, sergey.senozhatsky, gregkh, tglx, linux-kernel, tkjos,
	Mohammed Khajapasha

On Wed 2020-09-23 17:08:32, Prasad Sodagudi wrote:
> From: Mohammed Khajapasha <mkhaja@codeaurora.org>
> 
> The thread which initiates the hot plug can get scheduled
> out, while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows to selectively disable the console flush and
> in turn reduce the hot plug latency.
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9b75f6b..f02d3ef 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2996,13 +3000,15 @@ static int __init printk_late_init(void)
>  			unregister_console(con);
>  		}
>  	}
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
>  	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
>  					console_cpu_notify);
>  	WARN_ON(ret < 0);
>  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
>  					console_cpu_notify, NULL);
>  	WARN_ON(ret < 0);
> -	return 0;
> +#endif
> +	return ret;

Just a comment from the printk-side view. This change is wrong, definitely.

The above calls are needed with the current printk() design. They make
sure that someone would actually push the messages to the console.
Otherwise they might stay hidden seconds/minutes/hour/days until
another random printk() does the job.

They will not be needed with the ongoing printk rework[1] where
the consoles will get handled by a dedicated kthread.

[1] https://lore.kernel.org/lkml/87k1acz5rx.fsf@linutronix.de/

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
  2020-09-28  2:05       ` psodagud
  2020-09-28 12:50         ` Greg KH
@ 2020-09-30 14:36         ` Petr Mladek
  1 sibling, 0 replies; 14+ messages in thread
From: Petr Mladek @ 2020-09-30 14:36 UTC (permalink / raw)
  To: psodagud
  Cc: Thomas Gleixner, Greg KH, rostedt, sergey.senozhatsky,
	linux-kernel, tkjos, Mohammed Khajapasha

On Sun 2020-09-27 19:05:34, psodagud@codeaurora.org wrote:
> Yes. I agree with you that there are other conditions, which could delay the
> hotplug operation. But this console
> flushing is not needed in the hotplug path.  In the hotplug path, a core is
> trying printing messages
> from other core(by design of printk), delays the whole hotplug operation and
> timers migration.  As timers
> migration gets delayed, it would impact the systems stability in device
> stability testing.
> To avoid timers delay in the timer migration in  debug builds has to choose
> this option.

Again, just from the printk() point of view. The current printk()
design is that it tries to flush the messages to the console
immediately. It increases the chance to see them. The price is
that printk() might delay the code a lot.

Console handled in a dedicated kthread might make printk() fast. But
there might be another problem. Messages might get lost when they
generated faster than they might be shown on slow consoles.

In both cases, you need to think twice whether you really want to
print the messages at all or if you really want to show them on
slow consoles, see console_loglevel.

Also keep in mind that testing is not real life. I believe that
kernel messages are not shown on slow consoles on production devices.

That said, it seems that printk() is not the real problem here. As
others suggest, CPU hotplug is not the right solution for the usecase.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-09-30 14:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-24  0:08 [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Prasad Sodagudi
2020-09-24  0:08 ` [PATCH 2/2] printk: Make the console flush configurable in hotplug path Prasad Sodagudi
2020-09-24  6:33   ` Greg KH
2020-09-24 18:21     ` Thomas Gleixner
2020-09-25  9:27       ` Greg KH
2020-09-25 14:16         ` Adam Borowski
2020-09-25 22:55           ` Thomas Gleixner
2020-09-28  2:05       ` psodagud
2020-09-28 12:50         ` Greg KH
2020-09-30 14:36         ` Petr Mladek
2020-09-24  6:38   ` Sergey Senozhatsky
2020-09-30 13:57   ` Petr Mladek
2020-09-24  6:31 ` [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints Greg KH
2020-09-24 18:08 ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).