* [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
@ 2022-02-10 13:56 ` Sai Prakash Ranjan
  0 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-10 13:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, Elliot Berman, linux-arm-kernel, linux-arm-msm,
	linux-kernel, Shanker Donthineni, Adam Wallis, Timur Tabi,
	Elliot Berman, Sai Prakash Ranjan

From: Shanker Donthineni <shankerd@codeaurora.org>

Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
reads/writes from/to DCC on secondary cores. Each core has its
own DCC device registers, so when a core reads or writes from/to DCC,
it only accesses its own DCC device. Since kernel code can run on
any core, every time the kernel wants to write to the console, it
might write to a different DCC.

In SMP mode, Trace32 creates multiple windows, and each window shows
the DCC output only from that core's DCC. The result is that console
output is either lost or scattered across windows.

Enabling the serialize_smp module parameter turns on code that serializes
all console input and output to core 0. The DCC driver will create input
and output FIFOs that all cores will use. Reads and writes from/to DCC
are performed on core 0, either directly or by a workqueue bound to core 0.
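A userspace sketch of the serialization idea described above, assuming POSIX threads stand in for cores and a byte array stands in for the single DCC that the debugger watches. The names (shared_fifo, fifo_in, fifo_drain, run_demo) are hypothetical and are not the kernel kfifo or workqueue APIs: any thread may queue bytes, but only one drain thread ever writes to the "device", so all output lands in one place with each producer's bytes kept in order.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define FIFO_SIZE  1024u  /* power of two, like DCC_OUTBUF_SIZE in the patch */
#define NPRODUCERS 4      /* hypothetical number of "cores" producing output */
#define PER_PROD   50     /* bytes each producer emits */

struct shared_fifo {
	pthread_mutex_t lock;            /* protects head, tail and data[] */
	unsigned char data[FIFO_SIZE];
	size_t head;                     /* free-running write index */
	size_t tail;                     /* free-running read index */
};

static struct shared_fifo outfifo = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Any thread may call this: it only queues bytes, never touches the "device". */
static size_t fifo_in(struct shared_fifo *f, const unsigned char *buf, size_t n)
{
	size_t i;

	pthread_mutex_lock(&f->lock);
	for (i = 0; i < n && f->head - f->tail < FIFO_SIZE; i++)
		f->data[f->head++ & (FIFO_SIZE - 1)] = buf[i];
	pthread_mutex_unlock(&f->lock);
	return i;
}

/* Only the single drain thread calls this: it is the sole writer to the "device". */
static size_t fifo_drain(struct shared_fifo *f, unsigned char *dev, size_t cap)
{
	size_t n = 0;

	pthread_mutex_lock(&f->lock);
	while (n < cap && f->tail != f->head)
		dev[n++] = f->data[f->tail++ & (FIFO_SIZE - 1)];
	pthread_mutex_unlock(&f->lock);
	return n;
}

static void *producer(void *arg)
{
	unsigned char id = (unsigned char)(size_t)arg;
	unsigned char ch;
	int i;

	for (i = 0; i < PER_PROD; i++) {
		ch = (unsigned char)(id * PER_PROD + i);
		while (fifo_in(&outfifo, &ch, 1) == 0)
			sched_yield();   /* FIFO full: let the drain thread catch up */
	}
	return NULL;
}

struct drain_ctx {
	unsigned char *dev;   /* the single "DCC" that all output must reach */
	size_t cap;
	size_t total;
};

static void *drainer(void *arg)
{
	struct drain_ctx *c = arg;
	size_t want = NPRODUCERS * PER_PROD;

	while (c->total < want && c->total < c->cap) {
		c->total += fifo_drain(&outfifo, c->dev + c->total, c->cap - c->total);
		sched_yield();
	}
	return NULL;
}

/* Runs producers and one drainer; returns how many bytes reached the "device". */
static size_t run_demo(unsigned char *dev, size_t cap)
{
	pthread_t prod[NPRODUCERS], drain;
	struct drain_ctx ctx = { dev, cap, 0 };
	size_t p;

	assert(cap >= NPRODUCERS * PER_PROD);
	pthread_create(&drain, NULL, drainer, &ctx);
	for (p = 0; p < NPRODUCERS; p++)
		pthread_create(&prod[p], NULL, producer, (void *)p);
	for (p = 0; p < NPRODUCERS; p++)
		pthread_join(prod[p], NULL);
	pthread_join(drain, NULL);
	return ctx.total;
}
```

With a 256-byte buffer, run_demo() should return 200 (4 producers of 50 bytes each), and the mutex here plays the role the patch gives dcc_lock.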

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Elliot Berman <eberman@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
---

Changes in v4:
 * Use module parameter for runtime choice of enabling this feature.
 * Use hotplug locks to avoid race between cpu online check and work schedule.
 * Remove ifdefs and move to common ops.
 * Remove unnecessary check for this configuration.
 * Use macros for buf size instead of magic numbers.
 * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/

Changes in v3:
 * Handle case where core0 is not online.

Changes in v2:
 * Checkpatch warning fixes.
 * Use of IS_ENABLED macros instead of ifdefs.

---
 drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
index 8e0edb7d93fd..535b09441e55 100644
--- a/drivers/tty/hvc/hvc_dcc.c
+++ b/drivers/tty/hvc/hvc_dcc.c
@@ -2,19 +2,35 @@
 /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
 
 #include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <linux/init.h>
+#include <linux/kfifo.h>
+#include <linux/moduleparam.h>
 #include <linux/serial.h>
 #include <linux/serial_core.h>
+#include <linux/spinlock.h>
 
 #include <asm/dcc.h>
 #include <asm/processor.h>
 
 #include "hvc_console.h"
 
+static bool serialize_smp;
+module_param(serialize_smp, bool, 0444);
+MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
+
 /* DCC Status Bits */
 #define DCC_STATUS_RX		(1 << 30)
 #define DCC_STATUS_TX		(1 << 29)
 
+#define DCC_INBUF_SIZE		128
+#define DCC_OUTBUF_SIZE		1024
+
+static DEFINE_SPINLOCK(dcc_lock);
+static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
+static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
+
 static void dcc_uart_console_putchar(struct uart_port *port, int ch)
 {
 	while (__dcc_getstatus() & DCC_STATUS_TX)
@@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
 	return i;
 }
 
+/*
+ * Check if the DCC is enabled. If serialize_smp module param is enabled,
+ * then we assume that this function will be called first on core0. That way,
+ * dcc_core0_available will be true only if it's available on core0.
+ */
 static bool hvc_dcc_check(void)
 {
 	unsigned long time = jiffies + (HZ / 10);
+	static bool dcc_core0_available;
+
+	/*
+	 * If we're not on core 0, but we previously confirmed that DCC is
+	 * active, then just return true.
+	 */
+	if (serialize_smp && smp_processor_id() && dcc_core0_available)
+		return true;
 
 	/* Write a test character to check if it is handled */
 	__dcc_putchar('\n');
 
 	while (time_is_after_jiffies(time)) {
-		if (!(__dcc_getstatus() & DCC_STATUS_TX))
+		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
+			dcc_core0_available = true;
 			return true;
+		}
 	}
 
 	return false;
 }
 
+/*
+ * Workqueue function that writes the output FIFO to the DCC on core 0.
+ */
+static void dcc_put_work(struct work_struct *work)
+{
+	unsigned char ch;
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	/* While there's data in the output FIFO, write it to the DCC */
+	while (kfifo_get(&outbuf, &ch))
+		hvc_dcc_put_chars(0, &ch, 1);
+
+	/* While we're at it, check for any input characters */
+	while (!kfifo_is_full(&inbuf)) {
+		if (!hvc_dcc_get_chars(0, &ch, 1))
+			break;
+		kfifo_put(&inbuf, ch);
+	}
+
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+}
+
+static DECLARE_WORK(dcc_pwork, dcc_put_work);
+
+/*
+ * Workqueue function that reads characters from DCC and puts them into the
+ * input FIFO.
+ */
+static void dcc_get_work(struct work_struct *work)
+{
+	unsigned char ch;
+	unsigned long irqflags;
+
+	/*
+	 * Read characters from DCC and put them into the input FIFO, as
+	 * long as there is room and we have characters to read.
+	 */
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	while (!kfifo_is_full(&inbuf)) {
+		if (!hvc_dcc_get_chars(0, &ch, 1))
+			break;
+		kfifo_put(&inbuf, ch);
+	}
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+}
+
+static DECLARE_WORK(dcc_gwork, dcc_get_work);
+
+/*
+ * Write characters directly to the DCC if we're on core 0 and the FIFO
+ * is empty, or write them to the FIFO if we're not.
+ */
+static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
+{
+	int len;
+	unsigned long irqflags;
+
+	if (!serialize_smp)
+		return hvc_dcc_put_chars(vt, buf, count);
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
+		len = kfifo_in(&outbuf, buf, count);
+		spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+		/*
+		 * We just pushed data to the output FIFO, so schedule the
+		 * workqueue that will actually write that data to DCC.
+		 * Also take a CPU hotplug lock to avoid CPU going down
+		 * between the check and scheduling work on CPU0.
+		 */
+		cpus_read_lock();
+
+		if (cpu_online(0))
+			schedule_work_on(0, &dcc_pwork);
+
+		cpus_read_unlock();
+
+		return len;
+	}
+
+	/*
+	 * If we're already on core 0, and the FIFO is empty, then just
+	 * write the data to DCC.
+	 */
+	len = hvc_dcc_put_chars(vt, buf, count);
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+	return len;
+}
+
+/*
+ * Read characters directly from the DCC if we're on core 0 and the FIFO
+ * is empty, or read them from the FIFO if we're not.
+ */
+static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
+{
+	int len;
+	unsigned long irqflags;
+
+	if (!serialize_smp)
+		return hvc_dcc_get_chars(vt, buf, count);
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
+		len = kfifo_out(&inbuf, buf, count);
+		spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+		/*
+		 * If the FIFO was empty, there may be characters in the DCC
+		 * that we haven't read yet.  Schedule a workqueue to fill
+		 * the input FIFO, so that the next time this function is
+		 * called, we'll have data. Take a CPU hotplug lock as well
+		 * to avoid CPU going down between the cpu online check and
+		 * scheduling work on CPU0.
+		 */
+		cpus_read_lock();
+
+		if (!len && cpu_online(0))
+			schedule_work_on(0, &dcc_gwork);
+
+		cpus_read_unlock();
+
+		return len;
+	}
+
+	/*
+	 * If we're already on core 0, and the FIFO is empty, then just
+	 * read the data from DCC.
+	 */
+	len = hvc_dcc_get_chars(vt, buf, count);
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+	return len;
+}
+
 static const struct hv_ops hvc_dcc_get_put_ops = {
-	.get_chars = hvc_dcc_get_chars,
-	.put_chars = hvc_dcc_put_chars,
+	.get_chars = hvc_dcc0_get_chars,
+	.put_chars = hvc_dcc0_put_chars,
 };
 
 static int __init hvc_dcc_console_init(void)

base-commit: 395a61741f7ea29e1f4a0d6e160197fe8e377572
-- 
2.33.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-10 13:56 ` Sai Prakash Ranjan
@ 2022-02-10 14:24   ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 16+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-10 14:24 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Jiri Slaby, Elliot Berman, linux-arm-kernel, linux-arm-msm,
	linux-kernel, Shanker Donthineni, Adam Wallis, Timur Tabi,
	Elliot Berman

On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
> 
> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> reads/writes from/to DCC on secondary cores. Each core has its
> own DCC device registers, so when a core reads or writes from/to DCC,
> it only accesses its own DCC device. Since kernel code can run on
> any core, every time the kernel wants to write to the console, it
> might write to a different DCC.
> 
> In SMP mode, Trace32 creates multiple windows, and each window shows
> the DCC output only from that core's DCC. The result is that console
> output is either lost or scattered across windows.
> 
> Selecting this option will enable code that serializes all console
> input and output to core 0. The DCC driver will create input and
> output FIFOs that all cores will use. Reads and writes from/to DCC
> are handled by a workqueue that runs only core 0.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Acked-by: Adam Wallis <awallis@codeaurora.org>
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
> ---
> 
> Changes in v4:
>  * Use module parameter for runtime choice of enabling this feature.

No, this is not the 1990's, module parameters do not work and are not
sustainable.  They operate on a code-level while you are modifying a
device-specific attribute here.  Please make this per-device if you
really want to be able to somehow turn this on or off.

>  * Use hotplug locks to avoid race between cpu online check and work schedule.
>  * Remove ifdefs and move to common ops.
>  * Remove unnecessary check for this configuration.
>  * Use macros for buf size instead of magic numbers.
>  * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
> 
> Changes in v3:
>  * Handle case where core0 is not online.
> 
> Changes in v2:
>  * Checkpatch warning fixes.
>  * Use of IS_ENABLED macros instead of ifdefs.
> 
> ---
>  drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 174 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
> index 8e0edb7d93fd..535b09441e55 100644
> --- a/drivers/tty/hvc/hvc_dcc.c
> +++ b/drivers/tty/hvc/hvc_dcc.c
> @@ -2,19 +2,35 @@
>  /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>  
>  #include <linux/console.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <linux/init.h>
> +#include <linux/kfifo.h>
> +#include <linux/moduleparam.h>
>  #include <linux/serial.h>
>  #include <linux/serial_core.h>
> +#include <linux/spinlock.h>
>  
>  #include <asm/dcc.h>
>  #include <asm/processor.h>
>  
>  #include "hvc_console.h"
>  
> +static bool serialize_smp;
> +module_param(serialize_smp, bool, 0444);
> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
> +
>  /* DCC Status Bits */
>  #define DCC_STATUS_RX		(1 << 30)
>  #define DCC_STATUS_TX		(1 << 29)
>  
> +#define DCC_INBUF_SIZE		128
> +#define DCC_OUTBUF_SIZE		1024

Why these random sizes?  Why is one bigger than the other?  Why are they
these specific numbers?

> +
> +static DEFINE_SPINLOCK(dcc_lock);

What is this locking?  Please document it (didn't checkpatch complain?)

> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
> +
>  static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>  {
>  	while (__dcc_getstatus() & DCC_STATUS_TX)
> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>  	return i;
>  }
>  
> +/*
> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
> + * then we assume then this function will be called first on core0. That way,
> + * dcc_core0_available will be true only if it's available on core0.
> + */
>  static bool hvc_dcc_check(void)
>  {
>  	unsigned long time = jiffies + (HZ / 10);
> +	static bool dcc_core0_available;
> +
> +	/*
> +	 * If we're not on core 0, but we previously confirmed that DCC is
> +	 * active, then just return true.
> +	 */
> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)

Why are you checking smp_processor_id()?  Are you sure it is safe to do
that here?



> +		return true;
>  
>  	/* Write a test character to check if it is handled */
>  	__dcc_putchar('\n');
>  
>  	while (time_is_after_jiffies(time)) {
> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
> +			dcc_core0_available = true;
>  			return true;
> +		}

That's a hard busy loop, are you sure it will always exit?

thanks,

greg k-h

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-10 14:24   ` Greg Kroah-Hartman
@ 2022-02-10 16:49     ` Sai Prakash Ranjan
  -1 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-10 16:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, Elliot Berman, linux-arm-kernel, linux-arm-msm,
	linux-kernel, Shanker Donthineni, Adam Wallis, Timur Tabi,
	Elliot Berman

Hi,

On 2/10/2022 7:54 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>
>> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>> reads/writes from/to DCC on secondary cores. Each core has its
>> own DCC device registers, so when a core reads or writes from/to DCC,
>> it only accesses its own DCC device. Since kernel code can run on
>> any core, every time the kernel wants to write to the console, it
>> might write to a different DCC.
>>
>> In SMP mode, Trace32 creates multiple windows, and each window shows
>> the DCC output only from that core's DCC. The result is that console
>> output is either lost or scattered across windows.
>>
>> Selecting this option will enable code that serializes all console
>> input and output to core 0. The DCC driver will create input and
>> output FIFOs that all cores will use. Reads and writes from/to DCC
>> are handled by a workqueue that runs only core 0.
>>
>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>> Acked-by: Adam Wallis <awallis@codeaurora.org>
>> Signed-off-by: Timur Tabi <timur@codeaurora.org>
>> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
>> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
>> ---
>>
>> Changes in v4:
>>   * Use module parameter for runtime choice of enabling this feature.
> No, this is not the 1990's, module parameters do not work and are not
> sustainable.  They operate on a code-level while you are modifying a
> device-specific attribute here.  Please make this per-device if you
> really want to be able to somehow turn this on or off.

Can you please explain how this is a device-specific thing? I guess you
mean something like a device tree property, but that is for the hardware
description of a device, and this is not a hardware description but a
software feature. Arch information, such as DCC existing only on ARM64,
is already implied in Kconfig from when this driver was merged. Anyone
with an ARM64 device with DCC can use this feature, be it MediaTek,
Qualcomm or anyone else, so there is nothing device-specific here. We
need something at the code level, which is why the earlier version had
a build-time Kconfig option, but you mentioned something about a runtime
choice, so I changed it to use a module parameter. I will move back to
a build-time configuration for the next version, unless you have
something else in mind by a runtime choice?

>>   * Use hotplug locks to avoid race between cpu online check and work schedule.
>>   * Remove ifdefs and move to common ops.
>>   * Remove unnecessary check for this configuration.
>>   * Use macros for buf size instead of magic numbers.
>>   * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
>>
>> Changes in v3:
>>   * Handle case where core0 is not online.
>>
>> Changes in v2:
>>   * Checkpatch warning fixes.
>>   * Use of IS_ENABLED macros instead of ifdefs.
>>
>> ---
>>   drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 174 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
>> index 8e0edb7d93fd..535b09441e55 100644
>> --- a/drivers/tty/hvc/hvc_dcc.c
>> +++ b/drivers/tty/hvc/hvc_dcc.c
>> @@ -2,19 +2,35 @@
>>   /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>>   
>>   #include <linux/console.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>>   #include <linux/init.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/moduleparam.h>
>>   #include <linux/serial.h>
>>   #include <linux/serial_core.h>
>> +#include <linux/spinlock.h>
>>   
>>   #include <asm/dcc.h>
>>   #include <asm/processor.h>
>>   
>>   #include "hvc_console.h"
>>   
>> +static bool serialize_smp;
>> +module_param(serialize_smp, bool, 0444);
>> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
>> +
>>   /* DCC Status Bits */
>>   #define DCC_STATUS_RX		(1 << 30)
>>   #define DCC_STATUS_TX		(1 << 29)
>>   
>> +#define DCC_INBUF_SIZE		128
>> +#define DCC_OUTBUF_SIZE		1024
> Why these random sizes?  Why is one bigger than the other?  Why are they
> these specific numbers?

These are the input and output kfifo sizes. They are a software
construct, and there is no specification for them. Per the kfifo
documentation, the size must be a power of 2. The input buffer is
smaller on the assumption that the amount of input data (RX) on the DCC
console is usually smaller than the output data (TX); for example,
during boot the kernel log output on the DCC console would be much more
than the input. Since the DCC console is very slow, we don't want to
make the buffers too large, hence 1024. This configuration has been well
tested for years, and I would like to keep these numbers unless someone
reports an issue with them.
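To illustrate why kfifo wants a power-of-two size, here is a simplified user-space sketch of a byte FIFO in the same spirit. It is not the kernel's kfifo implementation, and the struct and function names are hypothetical. With free-running `in`/`out` counters, the slot index is `counter & (size - 1)`, which only maps onto the buffer correctly when the size is a power of two; the same property keeps `in - out` equal to the fill level across unsigned counter wrap-around.

```c
#include <assert.h>
#include <stdbool.h>

/* Sizes mirror DCC_INBUF_SIZE / DCC_OUTBUF_SIZE from the patch. */
#define INBUF_SIZE  128u
#define OUTBUF_SIZE 1024u

#define IS_POW2(x) ((x) != 0 && ((x) & ((x) - 1)) == 0)
static_assert(IS_POW2(INBUF_SIZE), "INBUF_SIZE must be a power of 2");
static_assert(IS_POW2(OUTBUF_SIZE), "OUTBUF_SIZE must be a power of 2");

/* Hypothetical kfifo-like byte FIFO with free-running counters. */
struct byte_fifo {
	unsigned int in;	/* total bytes ever written */
	unsigned int out;	/* total bytes ever read */
	unsigned int mask;	/* size - 1, valid only for power-of-2 sizes */
	unsigned char *data;
};

static void fifo_init(struct byte_fifo *f, unsigned char *storage,
		      unsigned int size)
{
	f->in = 0;
	f->out = 0;
	f->mask = size - 1;
	f->data = storage;
}

/* Fill level; unsigned subtraction stays correct across counter wrap. */
static unsigned int fifo_len(const struct byte_fifo *f)
{
	return f->in - f->out;
}

static bool fifo_put(struct byte_fifo *f, unsigned char c)
{
	if (fifo_len(f) > f->mask)	/* full: len == size */
		return false;
	f->data[f->in++ & f->mask] = c;
	return true;
}

static bool fifo_get(struct byte_fifo *f, unsigned char *c)
{
	if (fifo_len(f) == 0)
		return false;
	*c = f->data[f->out++ & f->mask];
	return true;
}
```

With a non-power-of-two size, the mask would skip slots and the wrap of the 32-bit counters would no longer line up with the buffer boundary, which is why the sizes here are 128 and 1024.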

>> +
>> +static DEFINE_SPINLOCK(dcc_lock);
> What is this locking?  Please document it (didn't checkpatch complain?)

Sure, I will document this. And no, checkpatch doesn't complain, even
with the --strict option.


>> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
>> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
>> +
>>   static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>>   {
>>   	while (__dcc_getstatus() & DCC_STATUS_TX)
>> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>>   	return i;
>>   }
>>   
>> +/*
>> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
>> + * then we assume then this function will be called first on core0. That way,
>> + * dcc_core0_available will be true only if it's available on core0.
>> + */
>>   static bool hvc_dcc_check(void)
>>   {
>>   	unsigned long time = jiffies + (HZ / 10);
>> +	static bool dcc_core0_available;
>> +
>> +	/*
>> +	 * If we're not on core 0, but we previously confirmed that DCC is
>> +	 * active, then just return true.
>> +	 */
>> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)
> Why are you checking smp_processor_id()?  Are you sure it is safe to do
> that here?

It checks for a non-zero CPU, as mentioned in the comment above the
check. On safety, thanks for pointing that out. I assume you meant the
bug of calling smp_processor_id() in preemptible context: I tested with
CONFIG_DEBUG_PREEMPT=y, and this is indeed a preemptible section, which
makes my system unbootable. I'll use get_cpu() and put_cpu() around
this check.
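A user-space model of the fix described above. The `*_model` helpers and `checked_cpu_id()` are hypothetical stand-ins, not kernel APIs: `get_cpu_model()` bumps a fake preemption counter before reading the CPU number, and the checked accessor asserts that preemption is disabled, roughly what CONFIG_DEBUG_PREEMPT enforces for smp_processor_id(). Once the matching put runs, the task may migrate, so the CPU number is only a snapshot; here it is used solely for the early-return decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state and primitives. */
static int preempt_depth;	/* models preempt_count() */
static int current_cpu;		/* CPU the "task" is running on */

/*
 * Like smp_processor_id() under CONFIG_DEBUG_PREEMPT: must not be
 * called while preemptible.
 */
static int checked_cpu_id(void)
{
	assert(preempt_depth > 0);
	return current_cpu;
}

static int get_cpu_model(void)
{
	preempt_depth++;		/* preempt_disable() */
	return checked_cpu_id();
}

static void put_cpu_model(void)
{
	preempt_depth--;		/* preempt_enable() */
}

static bool serialize_smp = true;
static bool dcc_core0_available;

/* The early-return test from hvc_dcc_check(), bracketed by get/put. */
static bool secondary_can_skip_probe(void)
{
	int cpu = get_cpu_model();
	bool skip = serialize_smp && cpu != 0 && dcc_core0_available;

	put_cpu_model();
	return skip;
}
```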

>
>
>> +		return true;
>>   
>>   	/* Write a test character to check if it is handled */
>>   	__dcc_putchar('\n');
>>   
>>   	while (time_is_after_jiffies(time)) {
>> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
>> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
>> +			dcc_core0_available = true;
>>   			return true;
>> +		}
> That's a hard busy loop, are you sure it will always exit?

How so? Nothing has changed there except setting a variable, and the
loop never checks that variable.
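For reference, the loop in question is bounded by the `time = jiffies + (HZ / 10)` deadline rather than by the status bit alone. The sketch below models that pattern in user space, with `clock_gettime(CLOCK_MONOTONIC)` in place of jiffies; `poll_tx_clear()` and the fake status readers are hypothetical. As with jiffies, the loop only terminates if the clock actually advances while it spins, which is the assumption worth confirming for the context in which hvc_dcc_check() runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define STATUS_TX (1u << 29)	/* mirrors DCC_STATUS_TX */

/* Hypothetical status source; the driver reads __dcc_getstatus(). */
typedef unsigned int (*status_fn)(void);

static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Spin until TX clears or the deadline passes, like the jiffies loop. */
static bool poll_tx_clear(status_fn get_status, double timeout_s)
{
	double deadline;

	assert(timeout_s >= 0.0);
	deadline = now_seconds() + timeout_s;
	while (now_seconds() < deadline) {
		if (!(get_status() & STATUS_TX))
			return true;
	}
	return false;
}

/* Fake status readers: a debugger that drains vs. one that never does. */
static unsigned int always_busy(void) { return STATUS_TX; }
static unsigned int always_idle(void) { return 0; }
```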

Thanks,
Sai

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-10 13:56 ` Sai Prakash Ranjan
@ 2022-02-11 12:48   ` Sai Prakash Ranjan
  -1 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-11 12:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, Elliot Berman, linux-arm-kernel, linux-arm-msm,
	linux-kernel, Shanker Donthineni, Adam Wallis, Timur Tabi,
	Elliot Berman

On 2/10/2022 7:26 PM, Sai Prakash Ranjan wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
>
> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> reads/writes from/to DCC on secondary cores. Each core has its
> own DCC device registers, so when a core reads or writes from/to DCC,
> it only accesses its own DCC device. Since kernel code can run on
> any core, every time the kernel wants to write to the console, it
> might write to a different DCC.
>
> In SMP mode, Trace32 creates multiple windows, and each window shows
> the DCC output only from that core's DCC. The result is that console
> output is either lost or scattered across windows.
>
> Selecting this option will enable code that serializes all console
> input and output to core 0. The DCC driver will create input and
> output FIFOs that all cores will use. Reads and writes from/to DCC
> are handled by a workqueue that runs only core 0.
>
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Acked-by: Adam Wallis <awallis@codeaurora.org>
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
> ---
>
> Changes in v4:
>   * Use module parameter for runtime choice of enabling this feature.
>   * Use hotplug locks to avoid race between cpu online check and work schedule.
>   * Remove ifdefs and move to common ops.
>   * Remove unnecessary check for this configuration.
>   * Use macros for buf size instead of magic numbers.
>   * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
>
> Changes in v3:
>   * Handle case where core0 is not online.
>
> Changes in v2:
>   * Checkpatch warning fixes.
>   * Use of IS_ENABLED macros instead of ifdefs.
>
> ---
>   drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>   1 file changed, 174 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
> index 8e0edb7d93fd..535b09441e55 100644
> --- a/drivers/tty/hvc/hvc_dcc.c
> +++ b/drivers/tty/hvc/hvc_dcc.c
> @@ -2,19 +2,35 @@
>   /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>   
>   #include <linux/console.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>   #include <linux/init.h>
> +#include <linux/kfifo.h>
> +#include <linux/moduleparam.h>
>   #include <linux/serial.h>
>   #include <linux/serial_core.h>
> +#include <linux/spinlock.h>
>   
>   #include <asm/dcc.h>
>   #include <asm/processor.h>
>   
>   #include "hvc_console.h"
>   
> +static bool serialize_smp;
> +module_param(serialize_smp, bool, 0444);
> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
> +
>   /* DCC Status Bits */
>   #define DCC_STATUS_RX		(1 << 30)
>   #define DCC_STATUS_TX		(1 << 29)
>   
> +#define DCC_INBUF_SIZE		128
> +#define DCC_OUTBUF_SIZE		1024
> +
> +static DEFINE_SPINLOCK(dcc_lock);
> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
> +
>   static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>   {
>   	while (__dcc_getstatus() & DCC_STATUS_TX)
> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>   	return i;
>   }
>   
> +/*
> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
> + * then we assume then this function will be called first on core0. That way,
> + * dcc_core0_available will be true only if it's available on core0.
> + */
>   static bool hvc_dcc_check(void)
>   {
>   	unsigned long time = jiffies + (HZ / 10);
> +	static bool dcc_core0_available;
> +
> +	/*
> +	 * If we're not on core 0, but we previously confirmed that DCC is
> +	 * active, then just return true.
> +	 */
> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)
> +		return true;
>   
>   	/* Write a test character to check if it is handled */
>   	__dcc_putchar('\n');
>   
>   	while (time_is_after_jiffies(time)) {
> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
> +			dcc_core0_available = true;
>   			return true;
> +		}
>   	}
>   
>   	return false;
>   }
>   
> +/*
> + * Workqueue function that writes the output FIFO to the DCC on core 0.
> + */
> +static void dcc_put_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	/* While there's data in the output FIFO, write it to the DCC */
> +	while (kfifo_get(&outbuf, &ch))
> +		hvc_dcc_put_chars(0, &ch, 1);
> +
> +	/* While we're at it, check for any input characters */
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_pwork, dcc_put_work);
> +
> +/*
> + * Workqueue function that reads characters from DCC and puts them into the
> + * input FIFO.
> + */
> +static void dcc_get_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	/*
> +	 * Read characters from DCC and put them into the input FIFO, as
> +	 * long as there is room and we have characters to read.
> +	 */
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_gwork, dcc_get_work);
> +
> +/*
> + * Write characters directly to the DCC if we're on core 0 and the FIFO
> + * is empty, or write them to the FIFO if we're not.
> + */
> +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!serialize_smp)
> +		return hvc_dcc_put_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
> +		len = kfifo_in(&outbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * We just push data to the output FIFO, so schedule the
> +		 * workqueue that will actually write that data to DCC.
> +		 * Also take a CPU hotplug lock to avoid CPU going down
> +		 * between the check and scheduling work on CPU0.
> +		 */
> +		cpus_read_lock();
> +
> +		if (cpu_online(0))
> +			schedule_work_on(0, &dcc_pwork);
> +
> +		cpus_read_unlock();
> +

This is a bug. I ran with the lock debugging configs enabled, and
apparently this runs in atomic context, where cpus_read_lock() must not
be used because it can sleep. I will remove the cpus_read_lock()/
cpus_read_unlock() pair in the next version.
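A simplified, single-threaded user-space model of the put_chars path with the sleeping hotplug lock removed, as proposed above. Everything here is hypothetical: an `atomic_flag` spinlock stands in for dcc_lock, `work_pending` stands in for `schedule_work_on(0, &dcc_pwork)`, and `dcc_wire` records what the debugger would see on core 0. It shows the ordering rule from the patch (queue whenever we are not on core 0 or the FIFO already holds data) and documents what the lock protects; it deliberately does not model the CPU 0 hotplug race that the v4 locks were meant to close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define OUTBUF_SIZE 1024

/*
 * fifo_lock (hypothetical stand-in for dcc_lock) protects outbuf/out_len
 * and every write to the "DCC" (dcc_wire/wire_len). It never sleeps and
 * is dropped before the drain work is requested.
 */
static atomic_flag fifo_lock = ATOMIC_FLAG_INIT;
static char outbuf[OUTBUF_SIZE];
static int out_len;
static char dcc_wire[4096];	/* bytes the debugger sees on core 0 */
static int wire_len;
static bool work_pending;	/* models schedule_work_on(0, &dcc_pwork) */
static bool cpu0_online = true;

static void fifo_acquire(void) { while (atomic_flag_test_and_set(&fifo_lock)) ; }
static void fifo_release(void) { atomic_flag_clear(&fifo_lock); }

/* Direct DCC write; caller holds fifo_lock. */
static int dcc_write(const char *buf, int count)
{
	int room = (int)sizeof(dcc_wire) - wire_len;
	int n = count < room ? count : room;

	memcpy(dcc_wire + wire_len, buf, (size_t)n);
	wire_len += n;
	return n;
}

static int put_chars_model(int cpu, const char *buf, int count)
{
	int len;

	fifo_acquire();
	if (cpu != 0 || out_len != 0) {
		/* Queue behind pending output so ordering is preserved. */
		int room = OUTBUF_SIZE - out_len;

		len = count < room ? count : room;
		memcpy(outbuf + out_len, buf, (size_t)len);
		out_len += len;
		fifo_release();
		if (cpu0_online)	/* no sleeping hotplug lock here */
			work_pending = true;
		return len;
	}
	len = dcc_write(buf, count);	/* core 0, FIFO empty: write directly */
	fifo_release();
	return len;
}

/* Models dcc_put_work() running on core 0. */
static void drain_work_model(void)
{
	fifo_acquire();
	dcc_write(outbuf, out_len);
	out_len = 0;
	work_pending = false;
	fifo_release();
}
```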

> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * write the data to DCC.
> +	 */
> +	len = hvc_dcc_put_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
> +/*
> + * Read characters directly from the DCC if we're on core 0 and the FIFO
> + * is empty, or read them from the FIFO if we're not.
> + */
> +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!serialize_smp)
> +		return hvc_dcc_get_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
> +		len = kfifo_out(&inbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * If the FIFO was empty, there may be characters in the DCC
> +		 * that we haven't read yet.  Schedule a workqueue to fill
> +		 * the input FIFO, so that the next time this function is
> +		 * called, we'll have data. Take a CPU hotplug lock as well
> +		 * to avoid CPU going down between the cpu online check and
> +		 * scheduling work on CPU0.
> +		 */
> +		cpus_read_lock();
> +
> +		if (!len && cpu_online(0))
> +			schedule_work_on(0, &dcc_gwork);
> +
> +		cpus_read_unlock();
> +

Same as above.

Thanks,
Sai

> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * read the data from DCC.
> +	 */
> +	len = hvc_dcc_get_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
>   static const struct hv_ops hvc_dcc_get_put_ops = {
> -	.get_chars = hvc_dcc_get_chars,
> -	.put_chars = hvc_dcc_put_chars,
> +	.get_chars = hvc_dcc0_get_chars,
> +	.put_chars = hvc_dcc0_put_chars,
>   };
>   
>   static int __init hvc_dcc_console_init(void)
>
> base-commit: 395a61741f7ea29e1f4a0d6e160197fe8e377572



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-10 13:56 ` Sai Prakash Ranjan
@ 2022-02-14 15:16   ` Mark Rutland
  -1 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2022-02-14 15:16 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Greg Kroah-Hartman, Jiri Slaby, Elliot Berman, linux-arm-kernel,
	linux-arm-msm, linux-kernel, Shanker Donthineni, Adam Wallis,
	Timur Tabi, Elliot Berman

Hi,

On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
> 
> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> reads/writes from/to DCC on secondary cores. Each core has its
> own DCC device registers, so when a core reads or writes from/to DCC,
> it only accesses its own DCC device. Since kernel code can run on
> any core, every time the kernel wants to write to the console, it
> might write to a different DCC.
> 
> In SMP mode, Trace32 creates multiple windows, and each window shows
> the DCC output only from that core's DCC. The result is that console
> output is either lost or scattered across windows.

This has been the Linux behaviour since the dawn of time, so why is this not
considered to be a bug in the tools? Why can't Lauterbach add an option to
treat the cores as one?

Importantly, with hotplug we *cannot* guarantee that all messages will go to
the same CPU anyway, since that CPU could be offlined (even if it is CPU 0), so
in general we can't provide a guarantee here.

> Selecting this option will enable code that serializes all console
> input and output to core 0. The DCC driver will create input and
> output FIFOs that all cores will use. Reads and writes from/to DCC
> are handled by a workqueue that runs only core 0.

What is 'core 0'?

Do you actually need a *specific* PE to be used, or just some singular PE?

What happens with hotplug, as above?

Do you need to inhibit that?
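To make the "specific PE vs. some singular PE" distinction concrete, here is a hypothetical user-space sketch (not something proposed in this thread): instead of hard-coding CPU 0, keep using the current target CPU while it stays online, and otherwise fall back to the first online CPU, roughly what `cpumask_first(cpu_online_mask)` would give in the kernel. A 64-bit mask stands in for the online cpumask. When the target changes, output moves to a different CPU's DCC, which is exactly the hotplug caveat raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set in online_mask means CPU n is online. */
static int first_online_cpu(uint64_t online_mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (online_mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;	/* no CPU online */
}

/* Keep the current target while it is online; otherwise pick a new one. */
static int pick_console_cpu(int current, uint64_t online_mask)
{
	if (current >= 0 && current < 64 &&
	    (online_mask & (UINT64_C(1) << current)))
		return current;
	return first_online_cpu(online_mask);
}
```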

Thanks,
Mark.

> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Acked-by: Adam Wallis <awallis@codeaurora.org>
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
> ---
> 
> Changes in v4:
>  * Use module parameter for runtime choice of enabling this feature.
>  * Use hotplug locks to avoid race between cpu online check and work schedule.
>  * Remove ifdefs and move to common ops.
>  * Remove unnecessary check for this configuration.
>  * Use macros for buf size instead of magic numbers.
>  * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
> 
> Changes in v3:
>  * Handle case where core0 is not online.
> 
> Changes in v2:
>  * Checkpatch warning fixes.
>  * Use of IS_ENABLED macros instead of ifdefs.
> 
> ---
>  drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 174 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
> index 8e0edb7d93fd..535b09441e55 100644
> --- a/drivers/tty/hvc/hvc_dcc.c
> +++ b/drivers/tty/hvc/hvc_dcc.c
> @@ -2,19 +2,35 @@
>  /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>  
>  #include <linux/console.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <linux/init.h>
> +#include <linux/kfifo.h>
> +#include <linux/moduleparam.h>
>  #include <linux/serial.h>
>  #include <linux/serial_core.h>
> +#include <linux/spinlock.h>
>  
>  #include <asm/dcc.h>
>  #include <asm/processor.h>
>  
>  #include "hvc_console.h"
>  
> +static bool serialize_smp;
> +module_param(serialize_smp, bool, 0444);
> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
> +
>  /* DCC Status Bits */
>  #define DCC_STATUS_RX		(1 << 30)
>  #define DCC_STATUS_TX		(1 << 29)
>  
> +#define DCC_INBUF_SIZE		128
> +#define DCC_OUTBUF_SIZE		1024
> +
> +static DEFINE_SPINLOCK(dcc_lock);
> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
> +
>  static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>  {
>  	while (__dcc_getstatus() & DCC_STATUS_TX)
> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>  	return i;
>  }
>  
> +/*
> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
> + * then we assume that this function will be called first on core0. That way,
> + * dcc_core0_available will be true only if it's available on core0.
> + */
>  static bool hvc_dcc_check(void)
>  {
>  	unsigned long time = jiffies + (HZ / 10);
> +	static bool dcc_core0_available;
> +
> +	/*
> +	 * If we're not on core 0, but we previously confirmed that DCC is
> +	 * active, then just return true.
> +	 */
> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)
> +		return true;
>  
>  	/* Write a test character to check if it is handled */
>  	__dcc_putchar('\n');
>  
>  	while (time_is_after_jiffies(time)) {
> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
> +			dcc_core0_available = true;
>  			return true;
> +		}
>  	}
>  
>  	return false;
>  }
>  
> +/*
> + * Workqueue function that writes the output FIFO to the DCC on core 0.
> + */
> +static void dcc_put_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	/* While there's data in the output FIFO, write it to the DCC */
> +	while (kfifo_get(&outbuf, &ch))
> +		hvc_dcc_put_chars(0, &ch, 1);
> +
> +	/* While we're at it, check for any input characters */
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_pwork, dcc_put_work);
> +
> +/*
> + * Workqueue function that reads characters from DCC and puts them into the
> + * input FIFO.
> + */
> +static void dcc_get_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	/*
> +	 * Read characters from DCC and put them into the input FIFO, as
> +	 * long as there is room and we have characters to read.
> +	 */
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_gwork, dcc_get_work);
> +
> +/*
> + * Write characters directly to the DCC if we're on core 0 and the FIFO
> + * is empty, or write them to the FIFO if we're not.
> + */
> +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!serialize_smp)
> +		return hvc_dcc_put_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
> +		len = kfifo_in(&outbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * We just push data to the output FIFO, so schedule the
> +		 * workqueue that will actually write that data to DCC.
> +		 * Also take a CPU hotplug lock to avoid CPU going down
> +		 * between the check and scheduling work on CPU0.
> +		 */
> +		cpus_read_lock();
> +
> +		if (cpu_online(0))
> +			schedule_work_on(0, &dcc_pwork);
> +
> +		cpus_read_unlock();
> +
> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * write the data to DCC.
> +	 */
> +	len = hvc_dcc_put_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
> +/*
> + * Read characters directly from the DCC if we're on core 0 and the FIFO
> + * is empty, or read them from the FIFO if we're not.
> + */
> +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!serialize_smp)
> +		return hvc_dcc_get_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
> +		len = kfifo_out(&inbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * If the FIFO was empty, there may be characters in the DCC
> +		 * that we haven't read yet.  Schedule a workqueue to fill
> +		 * the input FIFO, so that the next time this function is
> +		 * called, we'll have data. Take a CPU hotplug lock as well
> +		 * to avoid CPU going down between the cpu online check and
> +		 * scheduling work on CPU0.
> +		 */
> +		cpus_read_lock();
> +
> +		if (!len && cpu_online(0))
> +			schedule_work_on(0, &dcc_gwork);
> +
> +		cpus_read_unlock();
> +
> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * read the data from DCC.
> +	 */
> +	len = hvc_dcc_get_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
>  static const struct hv_ops hvc_dcc_get_put_ops = {
> -	.get_chars = hvc_dcc_get_chars,
> -	.put_chars = hvc_dcc_put_chars,
> +	.get_chars = hvc_dcc0_get_chars,
> +	.put_chars = hvc_dcc0_put_chars,
>  };
>  
>  static int __init hvc_dcc_console_init(void)
> 
> base-commit: 395a61741f7ea29e1f4a0d6e160197fe8e377572
> -- 
> 2.33.1
> 
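The routing rule in hvc_dcc0_put_chars() (write directly only when running on CPU 0 and the output FIFO is empty, otherwise append to the FIFO) is what keeps console output in order. A hypothetical, single-threaded userspace model of that rule is sketched below; fifo_put(), drain_on_cpu0() and put_chars() are illustrative stand-ins for kfifo_in(), dcc_put_work() and hvc_dcc0_put_chars(), and the dcc_lock spinlock is omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not kernel code: a small power-of-two FIFO
 * (like kfifo) plus a "device" buffer that only CPU 0 may write to. */
#define FIFO_SIZE 16

static char fifo[FIFO_SIZE];
static unsigned int fifo_in, fifo_out;   /* free-running indices */
static char device[64];                  /* what the debugger would see */
static size_t device_len;

static unsigned int fifo_len(void) { return fifo_in - fifo_out; }

/* Copy up to count bytes into the FIFO; returns how many were accepted,
 * which may be fewer than count when the FIFO fills (like kfifo_in()). */
static int fifo_put(const char *buf, int count)
{
	int n = 0;

	while (n < count && fifo_len() < FIFO_SIZE)
		fifo[fifo_in++ & (FIFO_SIZE - 1)] = buf[n++];
	return n;
}

static void device_write(const char *buf, int count)
{
	assert(device_len + (size_t)count <= sizeof(device));
	memcpy(device + device_len, buf, (size_t)count);
	device_len += (size_t)count;
}

/* Stand-in for dcc_put_work(): runs on CPU 0, drains the FIFO in order. */
static void drain_on_cpu0(void)
{
	while (fifo_len()) {
		char ch = fifo[fifo_out++ & (FIFO_SIZE - 1)];

		device_write(&ch, 1);
	}
}

/* Stand-in for hvc_dcc0_put_chars(): the direct path requires CPU 0
 * AND an empty FIFO, so queued bytes are never overtaken. */
static int put_chars(int cpu, const char *buf, int count)
{
	if (cpu != 0 || fifo_len() != 0)
		return fifo_put(buf, count);
	device_write(buf, count);
	return count;
}
```

Checking only "am I on CPU 0?" would let a direct write from CPU 0 overtake bytes another CPU queued earlier; the extra FIFO-empty check prevents that. A short return from fifo_put() when the FIFO is full mirrors kfifo_in() returning fewer bytes than requested.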

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-14 15:16   ` Mark Rutland
@ 2022-02-15  4:03     ` Sai Prakash Ranjan
  -1 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-15  4:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Greg Kroah-Hartman, Jiri Slaby, Elliot Berman, linux-arm-kernel,
	linux-arm-msm, linux-kernel, Shanker Donthineni, Adam Wallis,
	Timur Tabi, Elliot Berman

Hi Mark,

On 2/14/2022 8:46 PM, Mark Rutland wrote:
> Hi,
>
> On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>
>> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>> reads/writes from/to DCC on secondary cores. Each core has its
>> own DCC device registers, so when a core reads or writes from/to DCC,
>> it only accesses its own DCC device. Since kernel code can run on
>> any core, every time the kernel wants to write to the console, it
>> might write to a different DCC.
>>
>> In SMP mode, Trace32 creates multiple windows, and each window shows
>> the DCC output only from that core's DCC. The result is that console
>> output is either lost or scattered across windows.
> This has been the Linux behaviour since the dawn of time, so why is this not
> considered to be a bug in the tools? Why can't Lauterbach add an option to
> treat the cores as one?

More like a feature request than a bug? And why would the tools add such a
feature when it is the kernel that runs in SMP mode? Shouldn't the kernel be
the one providing it? There are a number of tools with the same issue, and we
can't send a feature request to every one of those vendors. Adding this to the
kernel instead solves the problem centrally, in one place.

> Importantly, with hotplug we *cannot* guarantee that all messages will go to
> the same CPU anyway, since that could be offlined (even if it is CPU 0), so in
> general we can't provide a guarantee here.

Right, that is true: with CPU hotplug this would be pretty much broken if
CPU0 is offlined. We use this during the initial bring-up stage of SoCs, when
we don't yet have a debug UART console up and running, and at that point we
don't much care about testing CPU hotplug, let alone offlining CPU0, which we
are using, and shooting ourselves in the foot :)

Given that this is mostly a debug feature, we don't mind if it isn't
guaranteed to work in hotplug scenarios. I did try to make this depend on
!HOTPLUG_CPU, but that config is so tangled up with CPU_PM and others that it
can't be disabled independently without disabling a whole lot of other
configs.

>> Selecting this option will enable code that serializes all console
>> input and output to core 0. The DCC driver will create input and
>> output FIFOs that all cores will use. Reads and writes from/to DCC
>> are handled by a workqueue that runs only on core 0.
> What is 'core 0'?

I mean CPU 0 here.

> Do you actually need a *specific* PE to be used, or just some singular PE?

We just need some singular PE.
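If any single PE would do, one possible alternative to hard-coding CPU 0 is to pick a CPU from the online mask (in the kernel, cpumask_first(cpu_online_mask) serves a similar purpose). The sketch below is a hypothetical userspace helper over a plain 64-bit mask, not code from this patch. Whichever CPU is chosen would still need to be used consistently, since the point of the feature is that the debugger sees a single core's DCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the lowest-numbered CPU whose bit is set
 * in online_mask, or -1 if no CPU is online. */
static int pick_dcc_cpu(uint64_t online_mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (online_mask & ((uint64_t)1 << cpu))
			return cpu;
	return -1;
}
```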

> What happens with hotplug, as above?
>
> Do you need to inhibit that?

Please see my reply above. We don't currently prevent it; we just check that
CPU0 is online so that we don't schedule work on it while it is offline.

Thanks,
Sai

> Thanks,
> Mark.
>
>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>> Acked-by: Adam Wallis <awallis@codeaurora.org>
>> Signed-off-by: Timur Tabi <timur@codeaurora.org>
>> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
>> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
>> ---
>>
>> Changes in v4:
>>   * Use module parameter for runtime choice of enabling this feature.
>>   * Use hotplug locks to avoid race between cpu online check and work schedule.
>>   * Remove ifdefs and move to common ops.
>>   * Remove unnecessary check for this configuration.
>>   * Use macros for buf size instead of magic numbers.
>>   * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
>>
>> Changes in v3:
>>   * Handle case where core0 is not online.
>>
>> Changes in v2:
>>   * Checkpatch warning fixes.
>>   * Use of IS_ENABLED macros instead of ifdefs.
>>
>> ---
>>   drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 174 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
>> index 8e0edb7d93fd..535b09441e55 100644
>> --- a/drivers/tty/hvc/hvc_dcc.c
>> +++ b/drivers/tty/hvc/hvc_dcc.c
>> @@ -2,19 +2,35 @@
>>   /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>>   
>>   #include <linux/console.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>>   #include <linux/init.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/moduleparam.h>
>>   #include <linux/serial.h>
>>   #include <linux/serial_core.h>
>> +#include <linux/spinlock.h>
>>   
>>   #include <asm/dcc.h>
>>   #include <asm/processor.h>
>>   
>>   #include "hvc_console.h"
>>   
>> +static bool serialize_smp;
>> +module_param(serialize_smp, bool, 0444);
>> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
>> +
>>   /* DCC Status Bits */
>>   #define DCC_STATUS_RX		(1 << 30)
>>   #define DCC_STATUS_TX		(1 << 29)
>>   
>> +#define DCC_INBUF_SIZE		128
>> +#define DCC_OUTBUF_SIZE		1024
>> +
>> +static DEFINE_SPINLOCK(dcc_lock);
>> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
>> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
>> +
>>   static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>>   {
>>   	while (__dcc_getstatus() & DCC_STATUS_TX)
>> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>>   	return i;
>>   }
>>   
>> +/*
>> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
>> + * then we assume that this function will be called first on core0. That way,
>> + * dcc_core0_available will be true only if it's available on core0.
>> + */
>>   static bool hvc_dcc_check(void)
>>   {
>>   	unsigned long time = jiffies + (HZ / 10);
>> +	static bool dcc_core0_available;
>> +
>> +	/*
>> +	 * If we're not on core 0, but we previously confirmed that DCC is
>> +	 * active, then just return true.
>> +	 */
>> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)
>> +		return true;
>>   
>>   	/* Write a test character to check if it is handled */
>>   	__dcc_putchar('\n');
>>   
>>   	while (time_is_after_jiffies(time)) {
>> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
>> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
>> +			dcc_core0_available = true;
>>   			return true;
>> +		}
>>   	}
>>   
>>   	return false;
>>   }
>>   
>> +/*
>> + * Workqueue function that writes the output FIFO to the DCC on core 0.
>> + */
>> +static void dcc_put_work(struct work_struct *work)
>> +{
>> +	unsigned char ch;
>> +	unsigned long irqflags;
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	/* While there's data in the output FIFO, write it to the DCC */
>> +	while (kfifo_get(&outbuf, &ch))
>> +		hvc_dcc_put_chars(0, &ch, 1);
>> +
>> +	/* While we're at it, check for any input characters */
>> +	while (!kfifo_is_full(&inbuf)) {
>> +		if (!hvc_dcc_get_chars(0, &ch, 1))
>> +			break;
>> +		kfifo_put(&inbuf, ch);
>> +	}
>> +
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +}
>> +
>> +static DECLARE_WORK(dcc_pwork, dcc_put_work);
>> +
>> +/*
>> + * Workqueue function that reads characters from DCC and puts them into the
>> + * input FIFO.
>> + */
>> +static void dcc_get_work(struct work_struct *work)
>> +{
>> +	unsigned char ch;
>> +	unsigned long irqflags;
>> +
>> +	/*
>> +	 * Read characters from DCC and put them into the input FIFO, as
>> +	 * long as there is room and we have characters to read.
>> +	 */
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	while (!kfifo_is_full(&inbuf)) {
>> +		if (!hvc_dcc_get_chars(0, &ch, 1))
>> +			break;
>> +		kfifo_put(&inbuf, ch);
>> +	}
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +}
>> +
>> +static DECLARE_WORK(dcc_gwork, dcc_get_work);
>> +
>> +/*
>> + * Write characters directly to the DCC if we're on core 0 and the FIFO
>> + * is empty, or write them to the FIFO if we're not.
>> + */
>> +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
>> +{
>> +	int len;
>> +	unsigned long irqflags;
>> +
>> +	if (!serialize_smp)
>> +		return hvc_dcc_put_chars(vt, buf, count);
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
>> +		len = kfifo_in(&outbuf, buf, count);
>> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +		/*
>> +		 * We just push data to the output FIFO, so schedule the
>> +		 * workqueue that will actually write that data to DCC.
>> +		 * Also take a CPU hotplug lock to avoid CPU going down
>> +		 * between the check and scheduling work on CPU0.
>> +		 */
>> +		cpus_read_lock();
>> +
>> +		if (cpu_online(0))
>> +			schedule_work_on(0, &dcc_pwork);
>> +
>> +		cpus_read_unlock();
>> +
>> +		return len;
>> +	}
>> +
>> +	/*
>> +	 * If we're already on core 0, and the FIFO is empty, then just
>> +	 * write the data to DCC.
>> +	 */
>> +	len = hvc_dcc_put_chars(vt, buf, count);
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +	return len;
>> +}
>> +
>> +/*
>> + * Read characters directly from the DCC if we're on core 0 and the FIFO
>> + * is empty, or read them from the FIFO if we're not.
>> + */
>> +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
>> +{
>> +	int len;
>> +	unsigned long irqflags;
>> +
>> +	if (!serialize_smp)
>> +		return hvc_dcc_get_chars(vt, buf, count);
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
>> +		len = kfifo_out(&inbuf, buf, count);
>> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +		/*
>> +		 * If the FIFO was empty, there may be characters in the DCC
>> +		 * that we haven't read yet.  Schedule a workqueue to fill
>> +		 * the input FIFO, so that the next time this function is
>> +		 * called, we'll have data. Take a CPU hotplug lock as well
>> +		 * to avoid CPU going down between the cpu online check and
>> +		 * scheduling work on CPU0.
>> +		 */
>> +		cpus_read_lock();
>> +
>> +		if (!len && cpu_online(0))
>> +			schedule_work_on(0, &dcc_gwork);
>> +
>> +		cpus_read_unlock();
>> +
>> +		return len;
>> +	}
>> +
>> +	/*
>> +	 * If we're already on core 0, and the FIFO is empty, then just
>> +	 * read the data from DCC.
>> +	 */
>> +	len = hvc_dcc_get_chars(vt, buf, count);
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +	return len;
>> +}
>> +
>>   static const struct hv_ops hvc_dcc_get_put_ops = {
>> -	.get_chars = hvc_dcc_get_chars,
>> -	.put_chars = hvc_dcc_put_chars,
>> +	.get_chars = hvc_dcc0_get_chars,
>> +	.put_chars = hvc_dcc0_put_chars,
>>   };
>>   
>>   static int __init hvc_dcc_console_init(void)
>>
>> base-commit: 395a61741f7ea29e1f4a0d6e160197fe8e377572
>> -- 
>> 2.33.1
>>
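hvc_dcc_check() in the quoted patch probes for a listening debugger by writing a newline and polling until the TX bit in the DCC status clears, giving up after HZ/10 jiffies. A userspace model of that probe follows; it is a hypothetical sketch in which sim_putchar() and sim_getstatus() simulate the DCC registers and an iteration budget replaces the jiffies deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DCC_STATUS_TX (1u << 29)   /* same bit as in hvc_dcc.c */

/* Simulated DCC: after a write, the TX bit stays set for 'drain_after'
 * status reads and then clears (the debugger consumed the byte).
 * A negative drain_after models "no debugger attached". */
static int drain_after;
static int reads_since_write;

static void sim_putchar(char ch)
{
	(void)ch;
	reads_since_write = 0;
}

static uint32_t sim_getstatus(void)
{
	if (drain_after >= 0 && reads_since_write++ >= drain_after)
		return 0;
	return DCC_STATUS_TX;
}

/* Same shape as hvc_dcc_check(): write a probe character, then poll
 * until TX clears or the budget (in place of HZ/10 jiffies) runs out. */
static bool dcc_probe(int budget)
{
	sim_putchar('\n');
	while (budget-- > 0) {
		if (!(sim_getstatus() & DCC_STATUS_TX))
			return true;
	}
	return false;
}
```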


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
@ 2022-02-15  4:03     ` Sai Prakash Ranjan
  0 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-15  4:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Greg Kroah-Hartman, Jiri Slaby, Elliot Berman, linux-arm-kernel,
	linux-arm-msm, linux-kernel, Shanker Donthineni, Adam Wallis,
	Timur Tabi, Elliot Berman

Hi Mark,

On 2/14/2022 8:46 PM, Mark Rutland wrote:
> Hi,
>
> On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>
>> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>> reads/writes from/to DCC on secondary cores. Each core has its
>> own DCC device registers, so when a core reads or writes from/to DCC,
>> it only accesses its own DCC device. Since kernel code can run on
>> any core, every time the kernel wants to write to the console, it
>> might write to a different DCC.
>>
>> In SMP mode, Trace32 creates multiple windows, and each window shows
>> the DCC output only from that core's DCC. The result is that console
>> output is either lost or scattered across windows.
> This has been the Linux behaviour since the dawn of time, so why is this not
> considered to be a bug in the tools? Why can't Lauterbach add an option to
> treat the cores as one?

More like a feature request than a bug? And why would the tools add such a
feature when it is the kernel that runs in SMP mode? Shouldn't the kernel be
the one providing it? There are a number of tools with the same issue, and we
can't send a feature request to every one of those vendors. Adding this to the
kernel instead solves the problem centrally, in one place.

> Importantly, with hotplug we *cannot* guarantee that all messages will go to
> the same CPU anyway, since that could be offlined (even if it is CPU 0), so in
> general we can't provide a guarantee here.

Right, that is true: with CPU hotplug this would be pretty much broken if
CPU0 is offlined. We use this during the initial bring-up stage of SoCs, when
we don't yet have a debug UART console up and running, and at that point we
don't much care about testing CPU hotplug, let alone offlining CPU0, which we
are using, and shooting ourselves in the foot :)

Given that this is mostly a debug feature, we don't mind if it isn't
guaranteed to work in hotplug scenarios. I did try to make this depend on
!HOTPLUG_CPU, but that config is so tangled up with CPU_PM and others that it
can't be disabled independently without disabling a whole lot of other
configs.

>> Selecting this option will enable code that serializes all console
>> input and output to core 0. The DCC driver will create input and
>> output FIFOs that all cores will use. Reads and writes from/to DCC
>> are handled by a workqueue that runs only on core 0.
> What is 'core 0'?

I mean CPU 0 here.

> Do you actually need a *specific* PE to be used, or just some singular PE?

We just need some singular PE.

> What happens with hotplug, as above?
>
> Do you need to inhibit that?

Please see my reply above. We don't currently prevent it; we just check that
CPU0 is online so that we don't schedule work on it while it is offline.

Thanks,
Sai

> Thanks,
> Mark.
>
>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>> Acked-by: Adam Wallis <awallis@codeaurora.org>
>> Signed-off-by: Timur Tabi <timur@codeaurora.org>
>> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
>> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
>> ---
>>
>> Changes in v4:
>>   * Use module parameter for runtime choice of enabling this feature.
>>   * Use hotplug locks to avoid race between cpu online check and work schedule.
>>   * Remove ifdefs and move to common ops.
>>   * Remove unnecessary check for this configuration.
>>   * Use macros for buf size instead of magic numbers.
>>   * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
>>
>> Changes in v3:
>>   * Handle case where core0 is not online.
>>
>> Changes in v2:
>>   * Checkpatch warning fixes.
>>   * Use of IS_ENABLED macros instead of ifdefs.
>>
>> ---
>>   drivers/tty/hvc/hvc_dcc.c | 177 +++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 174 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
>> index 8e0edb7d93fd..535b09441e55 100644
>> --- a/drivers/tty/hvc/hvc_dcc.c
>> +++ b/drivers/tty/hvc/hvc_dcc.c
>> @@ -2,19 +2,35 @@
>>   /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
>>   
>>   #include <linux/console.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>>   #include <linux/init.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/moduleparam.h>
>>   #include <linux/serial.h>
>>   #include <linux/serial_core.h>
>> +#include <linux/spinlock.h>
>>   
>>   #include <asm/dcc.h>
>>   #include <asm/processor.h>
>>   
>>   #include "hvc_console.h"
>>   
>> +static bool serialize_smp;
>> +module_param(serialize_smp, bool, 0444);
>> +MODULE_PARM_DESC(serialize_smp, "Serialize all DCC console input and output to CPU core 0");
>> +
>>   /* DCC Status Bits */
>>   #define DCC_STATUS_RX		(1 << 30)
>>   #define DCC_STATUS_TX		(1 << 29)
>>   
>> +#define DCC_INBUF_SIZE		128
>> +#define DCC_OUTBUF_SIZE		1024
>> +
>> +static DEFINE_SPINLOCK(dcc_lock);
>> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
>> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
>> +
>>   static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>>   {
>>   	while (__dcc_getstatus() & DCC_STATUS_TX)
>> @@ -67,24 +83,179 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>>   	return i;
>>   }
>>   
>> +/*
>> + * Check if the DCC is enabled. If serialize_smp module param is enabled,
>> + * then we assume that this function will be called first on core0. That way,
>> + * dcc_core0_available will be true only if it's available on core0.
>> + */
>>   static bool hvc_dcc_check(void)
>>   {
>>   	unsigned long time = jiffies + (HZ / 10);
>> +	static bool dcc_core0_available;
>> +
>> +	/*
>> +	 * If we're not on core 0, but we previously confirmed that DCC is
>> +	 * active, then just return true.
>> +	 */
>> +	if (serialize_smp && smp_processor_id() && dcc_core0_available)
>> +		return true;
>>   
>>   	/* Write a test character to check if it is handled */
>>   	__dcc_putchar('\n');
>>   
>>   	while (time_is_after_jiffies(time)) {
>> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
>> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
>> +			dcc_core0_available = true;
>>   			return true;
>> +		}
>>   	}
>>   
>>   	return false;
>>   }
>>   
>> +/*
>> + * Workqueue function that writes the output FIFO to the DCC on core 0.
>> + */
>> +static void dcc_put_work(struct work_struct *work)
>> +{
>> +	unsigned char ch;
>> +	unsigned long irqflags;
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	/* While there's data in the output FIFO, write it to the DCC */
>> +	while (kfifo_get(&outbuf, &ch))
>> +		hvc_dcc_put_chars(0, &ch, 1);
>> +
>> +	/* While we're at it, check for any input characters */
>> +	while (!kfifo_is_full(&inbuf)) {
>> +		if (!hvc_dcc_get_chars(0, &ch, 1))
>> +			break;
>> +		kfifo_put(&inbuf, ch);
>> +	}
>> +
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +}
>> +
>> +static DECLARE_WORK(dcc_pwork, dcc_put_work);
>> +
>> +/*
>> + * Workqueue function that reads characters from DCC and puts them into the
>> + * input FIFO.
>> + */
>> +static void dcc_get_work(struct work_struct *work)
>> +{
>> +	unsigned char ch;
>> +	unsigned long irqflags;
>> +
>> +	/*
>> +	 * Read characters from DCC and put them into the input FIFO, as
>> +	 * long as there is room and we have characters to read.
>> +	 */
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	while (!kfifo_is_full(&inbuf)) {
>> +		if (!hvc_dcc_get_chars(0, &ch, 1))
>> +			break;
>> +		kfifo_put(&inbuf, ch);
>> +	}
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +}
>> +
>> +static DECLARE_WORK(dcc_gwork, dcc_get_work);
>> +
>> +/*
>> + * Write characters directly to the DCC if we're on core 0 and the FIFO
>> + * is empty, or write them to the FIFO if we're not.
>> + */
>> +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
>> +{
>> +	int len;
>> +	unsigned long irqflags;
>> +
>> +	if (!serialize_smp)
>> +		return hvc_dcc_put_chars(vt, buf, count);
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
>> +		len = kfifo_in(&outbuf, buf, count);
>> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +		/*
>> +		 * We just push data to the output FIFO, so schedule the
>> +		 * workqueue that will actually write that data to DCC.
>> +		 * Also take a CPU hotplug lock to avoid CPU going down
>> +		 * between the check and scheduling work on CPU0.
>> +		 */
>> +		cpus_read_lock();
>> +
>> +		if (cpu_online(0))
>> +			schedule_work_on(0, &dcc_pwork);
>> +
>> +		cpus_read_unlock();
>> +
>> +		return len;
>> +	}
>> +
>> +	/*
>> +	 * If we're already on core 0, and the FIFO is empty, then just
>> +	 * write the data to DCC.
>> +	 */
>> +	len = hvc_dcc_put_chars(vt, buf, count);
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +	return len;
>> +}
>> +
>> +/*
>> + * Read characters directly from the DCC if we're on core 0 and the FIFO
>> + * is empty, or read them from the FIFO if we're not.
>> + */
>> +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
>> +{
>> +	int len;
>> +	unsigned long irqflags;
>> +
>> +	if (!serialize_smp)
>> +		return hvc_dcc_get_chars(vt, buf, count);
>> +
>> +	spin_lock_irqsave(&dcc_lock, irqflags);
>> +
>> +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
>> +		len = kfifo_out(&inbuf, buf, count);
>> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +		/*
>> +		 * If the FIFO was empty, there may be characters in the DCC
>> +		 * that we haven't read yet.  Schedule a workqueue to fill
>> +		 * the input FIFO, so that the next time this function is
>> +		 * called, we'll have data. Take a CPU hotplug lock as well
>> +		 * to avoid CPU going down between the cpu online check and
>> +		 * scheduling work on CPU0.
>> +		 */
>> +		cpus_read_lock();
>> +
>> +		if (!len && cpu_online(0))
>> +			schedule_work_on(0, &dcc_gwork);
>> +
>> +		cpus_read_unlock();
>> +
>> +		return len;
>> +	}
>> +
>> +	/*
>> +	 * If we're already on core 0, and the FIFO is empty, then just
>> +	 * read the data from DCC.
>> +	 */
>> +	len = hvc_dcc_get_chars(vt, buf, count);
>> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
>> +
>> +	return len;
>> +}
>> +
>>   static const struct hv_ops hvc_dcc_get_put_ops = {
>> -	.get_chars = hvc_dcc_get_chars,
>> -	.put_chars = hvc_dcc_put_chars,
>> +	.get_chars = hvc_dcc0_get_chars,
>> +	.put_chars = hvc_dcc0_put_chars,
>>   };
>>   
>>   static int __init hvc_dcc_console_init(void)
>>
>> base-commit: 395a61741f7ea29e1f4a0d6e160197fe8e377572
>> -- 
>> 2.33.1
>>
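The output path of the quoted patch can be approximated in userspace to show why a single shared FIFO keeps console output together. This is a minimal sketch, not the driver: a pthread mutex stands in for dcc_lock, a small ring buffer with free-running indices stands in for kfifo, and fifo_in(), serialized_put_chars() and drain_on_cpu0() are hypothetical names. Any thread may queue bytes; only the designated owner ("CPU 0") drains them, so they reach one channel in the order they were queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Mirrors DCC_OUTBUF_SIZE; must be a power of two so the free-running
 * index arithmetic stays correct across unsigned wrap-around. */
#define OUTBUF_SIZE 1024u

struct fifo {
	unsigned char buf[OUTBUF_SIZE];
	unsigned int in, out;	/* free-running indices, kfifo-style */
};

static struct fifo outbuf;
static pthread_mutex_t fifo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Like kfifo_in(): copy as many bytes as fit, return how many were copied. */
static size_t fifo_in(struct fifo *f, const char *src, size_t n)
{
	size_t i;

	for (i = 0; i < n && f->in - f->out < OUTBUF_SIZE; i++)
		f->buf[f->in++ % OUTBUF_SIZE] = (unsigned char)src[i];
	return i;
}

/* Any CPU: queue output instead of writing to its own per-CPU channel. */
static size_t serialized_put_chars(const char *buf, size_t count)
{
	pthread_mutex_lock(&fifo_lock);
	size_t len = fifo_in(&outbuf, buf, count);
	pthread_mutex_unlock(&fifo_lock);
	return len;	/* the driver would now schedule work on CPU 0 */
}

/* Owner only: drain the FIFO into the single output channel (a buffer here). */
static size_t drain_on_cpu0(char *channel, size_t cap)
{
	size_t n = 0;

	pthread_mutex_lock(&fifo_lock);
	while (outbuf.out != outbuf.in && n < cap)
		channel[n++] = (char)outbuf.buf[outbuf.out++ % OUTBUF_SIZE];
	pthread_mutex_unlock(&fifo_lock);
	return n;
}
```

As with kfifo_in(), a write that does not fit is truncated, and the return value reports how many bytes were accepted.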


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-15  4:03     ` Sai Prakash Ranjan
@ 2022-02-21 18:29       ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 16+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-21 18:29 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Mark Rutland, Jiri Slaby, Elliot Berman, linux-arm-kernel,
	linux-arm-msm, linux-kernel, Shanker Donthineni, Adam Wallis,
	Timur Tabi, Elliot Berman

On Tue, Feb 15, 2022 at 09:33:23AM +0530, Sai Prakash Ranjan wrote:
> Hi Mark,
> 
> On 2/14/2022 8:46 PM, Mark Rutland wrote:
> > Hi,
> > 
> > On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
> > > From: Shanker Donthineni <shankerd@codeaurora.org>
> > > 
> > > Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> > > reads/writes from/to DCC on secondary cores. Each core has its
> > > own DCC device registers, so when a core reads or writes from/to DCC,
> > > it only accesses its own DCC device. Since kernel code can run on
> > > any core, every time the kernel wants to write to the console, it
> > > might write to a different DCC.
> > > 
> > > In SMP mode, Trace32 creates multiple windows, and each window shows
> > > the DCC output only from that core's DCC. The result is that console
> > > output is either lost or scattered across windows.
> > This has been the Linux behaviour since the dawn of time, so why is this not
> > considered to be a bug in the tools? Why can't Lauterbach add an option to
> > treat the cores as one?
> 
> More like a feature request than a bug? And why would tools add such a
> feature when it is the kernel that runs in SMP mode? Shouldn't the kernel
> be the one providing such a feature? There would be a number of tools with
> the same issue, and we can't send a feature request to every tool vendor.
> Adding this in the kernel handles it centrally, in one place.

Please fix this in userspace.

> > Importantly, with hotplug we *cannot* guarantee that all messages will go to
> > the same CPU anyway, since that could be offlined (even if it is CPU 0), so in
> > general we can't provide a guarantee here.
> 
> Right, that is true: with CPU hotplug this would be pretty much broken
> if CPU0 is offlined. We use this during the initial bringup stage of SoCs,
> when we don't have a debug UART console up and running yet, and at that
> time we don't much care about testing CPU hotplug, let alone offlining
> CPU0, which we use, and shooting ourselves in the foot :)
>
> Given that this is mostly a debug feature, we don't mind if it is not
> guaranteed to work in hotplug scenarios.

We do not get to choose this type of thing.  Either it will work
properly, or not.  Offlining CPU 0 happens in power management
situations, right?  Especially with big/little systems, if CPU0 was a
big one, you would remove it while only the little ones were running.

I still feel this should all be handled in userspace.

Especially given the problems that this patch is having with being
tested properly :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes
  2022-02-21 18:29       ` Greg Kroah-Hartman
@ 2022-02-25  6:03         ` Sai Prakash Ranjan
  -1 siblings, 0 replies; 16+ messages in thread
From: Sai Prakash Ranjan @ 2022-02-25  6:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Mark Rutland, Jiri Slaby, Elliot Berman, linux-arm-kernel,
	linux-arm-msm, linux-kernel

Hi,

On 2/21/2022 11:59 PM, Greg Kroah-Hartman wrote:
> On Tue, Feb 15, 2022 at 09:33:23AM +0530, Sai Prakash Ranjan wrote:
>> Hi Mark,
>>
>> On 2/14/2022 8:46 PM, Mark Rutland wrote:
>>> Hi,
>>>
>>> On Thu, Feb 10, 2022 at 07:26:32PM +0530, Sai Prakash Ranjan wrote:
>>>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>>>
>>>> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>>>> reads/writes from/to DCC on secondary cores. Each core has its
>>>> own DCC device registers, so when a core reads or writes from/to DCC,
>>>> it only accesses its own DCC device. Since kernel code can run on
>>>> any core, every time the kernel wants to write to the console, it
>>>> might write to a different DCC.
>>>>
>>>> In SMP mode, Trace32 creates multiple windows, and each window shows
>>>> the DCC output only from that core's DCC. The result is that console
>>>> output is either lost or scattered across windows.
>>> This has been the Linux behaviour since the dawn of time, so why is this not
>>> considered to be a bug in the tools? Why can't Lauterbach add an option to
>>> treat the cores as one?
>> More like a feature request than a bug? And why would tools add such a
>> feature when it is the kernel that runs in SMP mode? Shouldn't the kernel
>> be the one providing such a feature? There would be a number of tools with
>> the same issue, and we can't send a feature request to every tool vendor.
>> Adding this in the kernel handles it centrally, in one place.
> Please fix this in userspace.

Please see my questions below and let me know how you would handle them.

>
>>> Importantly, with hotplug we *cannot* guarantee that all messages will go to
>>> the same CPU anyway, since that could be offlined (even if it is CPU 0), so in
>>> general we can't provide a guarantee here.
>> Right, that is true: with CPU hotplug this would be pretty much broken
>> if CPU0 is offlined. We use this during the initial bringup stage of SoCs,
>> when we don't have a debug UART console up and running yet, and at that
>> time we don't much care about testing CPU hotplug, let alone offlining
>> CPU0, which we use, and shooting ourselves in the foot :)
>>
>> Given that this is mostly a debug feature, we don't mind if it is not
>> guaranteed to work in hotplug scenarios.
> We do not get to choose this type of thing.  Either it will work
> properly, or not.  Offlining CPU 0 happens in power management
> situations, right?  Especially with big/little systems, if CPU0 was a
> big one, you would remove it while only the little ones were running.

AFAIK on arm64, offlining CPU0 is possible via the CPU device sysfs node, but we aren't discussing
manual offlining, right? In that case a lot of code would need to be protected against undesired
effects; for example, what protects us if someone manually triggers a sysrq panic
(echo c > /proc/sysrq-trigger)? Hopefully we are not talking about manual triggers.

Now about PM situations on arm64: correct me if I am wrong, but I don't see CPU0 (the boot CPU) being
offlined during suspend-to-idle (which just puts CPUs into a deep idle state, with no offlining),
suspend-to-RAM, or suspend-to-disk.

In suspend-to-RAM, I see only the non-boot CPUs being offlined on suspend and brought back on resume.
See the snapshot below of suspend-to-RAM (mem) on my 8-CPU arm64 board.

[312598.137531] Disabling non-boot CPUs ...
[312598.148144] psci: CPU1 killed (polled 1 ms)
[312598.159869] psci: CPU2 killed (polled 1 ms)
[312598.173832] psci: CPU3 killed (polled 1 ms)
[312598.187163] psci: CPU4 killed (polled 1 ms)
[312598.198006] psci: CPU5 killed (polled 1 ms)
[312598.208013] psci: CPU6 killed (polled 1 ms)
[312598.221245] psci: CPU7 killed (polled 1 ms)
[312598.237688] Enabling non-boot CPUs ...
[312598.245217] CPU1 is up
[312598.251017] CPU2 is up
[312598.257171] CPU3 is up
[312598.263882] CPU4 is up
[312598.271030] CPU5 is up
[312598.285948] CPU6 is up
[312598.291120] CPU7 is up

In the case of suspend-to-disk, I believe it's the same. I don't have any board that supports
suspend-to-disk (hibernation), but looking at the code (kernel/power/hibernate.c ->
pm_sleep_disable_secondary_cpus()), it just disables the non-boot CPUs.
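The suspend behaviour described here (only non-boot CPUs are taken down, and the same set is brought back on resume) can be expressed as a small bitmask model. This is an illustrative sketch only: bit n stands for CPUn, and disable_nonboot_cpus()/enable_nonboot_cpus() are hypothetical helpers, not the kernel's implementation. It shows why, on this path, CPU 0 stays a valid target for schedule_work_on(0, ...) as long as CPU 0 is the boot CPU.

```c
#include <assert.h>

/* Hypothetical model: bit n of *online means CPUn is online.
 * Takes every online CPU except boot_cpu offline and returns the set
 * that was taken down, so it can be restored on resume. */
static unsigned int disable_nonboot_cpus(unsigned int *online, int boot_cpu)
{
	unsigned int boot_bit = 1u << boot_cpu;
	unsigned int frozen = *online & ~boot_bit;

	*online &= boot_bit;	/* only the boot CPU remains online */
	return frozen;
}

/* Brings back exactly the CPUs recorded by disable_nonboot_cpus(). */
static void enable_nonboot_cpus(unsigned int *online, unsigned int frozen)
{
	*online |= frozen;
}
```

For the 8-CPU log above, an initial mask of 0xFF becomes 0x01 during suspend and returns to 0xFF on resume.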

I hope this covers your concerns about CPU PM and hotplug scenarios.

Coming to the other reason I mentioned before: this feature is used for early SoC bringup. It is so
early in the bringup stage that there are no bootloaders/bootchain involved to handle CPU PM states,
meaning PSCI handlers for suspend and resume are not even present yet; we just load the kernel directly
from RAM. At that stage we don't even have a working debug UART, which is why we use DCC, and PM
support is usually still far away.

> I still feel this should all be handled in userspace.

How would userspace take care of this problem with CPU hotplug? We can't open hundreds of per-CPU T32
(JTAG tool) windows, attach each of them and open DCC terminals up front so that a userspace tool could
somehow automagically migrate the messages to a different CPU when the current CPU goes offline. How
would that work?

> Especially given the problems that this patch is having with being
> tested properly :(
>
> thanks,
>
> greg k-h

Hmm, I did test this version, and I reported the bug myself and posted a new version. As you know,
these lock debugging configs are not enabled in the default defconfig, so it went unnoticed, but I
have now enabled them for all future revisions of the patch.

Thanks,
Sai

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-02-25  6:04 UTC | newest]

Thread overview: 16+ messages
2022-02-10 13:56 [PATCHv4] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes Sai Prakash Ranjan
2022-02-10 14:24 ` Greg Kroah-Hartman
2022-02-10 16:49   ` Sai Prakash Ranjan
2022-02-11 12:48 ` Sai Prakash Ranjan
2022-02-14 15:16 ` Mark Rutland
2022-02-15  4:03   ` Sai Prakash Ranjan
2022-02-21 18:29     ` Greg Kroah-Hartman
2022-02-25  6:03       ` Sai Prakash Ranjan
