linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready
@ 2024-05-05 13:07 Weifeng Liu
  2024-05-05 13:07 ` [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev " Weifeng Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Weifeng Liu @ 2024-05-05 13:07 UTC (permalink / raw)
  To: platform-driver-x86, linux-serial
  Cc: Weifeng Liu, Andy Shevchenko, Maximilian Luz, Hans de Goede

v3 changes:
* better formatting, nothing special

v2 changes:
* resolves Andy's comments

Original letter:

Greetings,

This series is intended to remedy a race condition where surface
aggregator module (SAM) which is a serdev driver could fail to probe if
the underlying UART port is not ready to open.  In such circumstance,
invoking serdev_device_open() gets errno -ENXIO, leading to failure in
probing of SAM.  However, if the probe is retried in a short delay,
serdev_device_open() would work as expected and everything just goes
fine.  As a workaround, adding the serial driver (8250_dw) into
initramfs or building it into the kernel image significantly mitigates
the likelihood of encountering this race condition, as in this way the
serial device would be initialized much earlier than probing of SAM.

However, IMO we should reliably avoid this sort of race condition.  A
good way is returning -EPROBE_DEFER when serdev_device_open returns
-ENXIO so that the kernel will be able to retry the probing later.  This
is what the first patch tries to do.

Though this solution might be a good enough solution for this specific
issue, I am wondering why this kind of race condition could ever happen,
i.e., why a serdes device could be not ready yet at the moment the
serdev driver gets called and tries to bind it.  And even if this is an
expected behavior how serdev driver works, I do feel it a little bit
weird that we need to identify serdev_device_open() returning -ENXIO as
non-fatal error and thus return -EPROBE_DEFER manually in such case, as
I don't see this sort of behavior in other serdev driver.  Maximilian
and Hans suggested discussing the root cause of the race condition here.
I will be grateful if you could provide some reasoning and insights on
this.

Following is the code path when the issue occurs:

	ssam_serial_hub_probe()
	serdev_device_open()
	ctrl->ops->open()	/* this callback being ttyport_open() */
	tty->ops->open()	/* this callback being uart_open() */
	tty_port_open()
	port->ops->activate()	/* this callback being uart_port_activate() */
	Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.

I notice that flag UPF_DEAD would be set in serial_core_register_port()
during calling serial_core_add_one_port() but don't have much idea
what's going on under the hood.

Additionally, add logs to the probe procedure of SAM in the second
patch, hoping it could help provide context to user when something
miserable happens.

Context of this series is available in [1].

Best regards,
Weifeng

Weifeng Liu (2):
  platform/surface: aggregator: Defer probing when serdev is not ready
  platform/surface: aggregator: Log critical errors during SAM probing

 drivers/platform/surface/aggregator/core.c | 53 ++++++++++++++++------
 1 file changed, 39 insertions(+), 14 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev is not ready
  2024-05-05 13:07 [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Weifeng Liu
@ 2024-05-05 13:07 ` Weifeng Liu
  2024-05-06  9:06   ` Andy Shevchenko
  2024-05-05 13:07 ` [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing Weifeng Liu
  2024-05-06 14:41 ` [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Hans de Goede
  2 siblings, 1 reply; 8+ messages in thread
From: Weifeng Liu @ 2024-05-05 13:07 UTC (permalink / raw)
  To: platform-driver-x86, linux-serial
  Cc: Weifeng Liu, Andy Shevchenko, Maximilian Luz, Hans de Goede,
	Andy Shevchenko

This is an attempt to alleviate race conditions in the SAM driver where
essential resources like serial device and GPIO pins are not ready at
the time ssam_serial_hub_probe() is called.  Instead of giving up
probing, a better way would be to defer the probing by returning
-EPROBE_DEFER, allowing the kernel try again later.

However, there is no way of identifying all such cases from other real
errors in a few days.  So let's take a gradual approach identify and
address these cases as they arise.  This commit marks the initial step
in this process.

Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>
---
 drivers/platform/surface/aggregator/core.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/surface/aggregator/core.c b/drivers/platform/surface/aggregator/core.c
index ba550eaa06fc..7b1871eb7a6f 100644
--- a/drivers/platform/surface/aggregator/core.c
+++ b/drivers/platform/surface/aggregator/core.c
@@ -645,9 +645,22 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 	/* Set up serdev device. */
 	serdev_device_set_drvdata(serdev, ctrl);
 	serdev_device_set_client_ops(serdev, &ssam_serdev_ops);
+
+	/*
+	 * The following step can fail when it's called too early before the
+	 * underlying UART device is ready (in this case -ENXIO is returned).
+	 * Instead of simply giving up and losing everything, we can defer
+	 * the probing by returning -EPROBE_DEFER so that the kernel would be
+	 * able to retry later.
+	 */
 	status = serdev_device_open(serdev);
-	if (status)
+	if (status == -ENXIO)
+		status = -EPROBE_DEFER;
+	if (status) {
+		dev_err_probe(&serdev->dev, status,
+			      "failed to open serdev device\n");
 		goto err_devopen;
+	}
 
 	astatus = ssam_serdev_setup_via_acpi(ssh->handle, serdev);
 	if (ACPI_FAILURE(astatus)) {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing
  2024-05-05 13:07 [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Weifeng Liu
  2024-05-05 13:07 ` [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev " Weifeng Liu
@ 2024-05-05 13:07 ` Weifeng Liu
  2024-05-13 11:55   ` Hans de Goede
  2024-05-06 14:41 ` [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Hans de Goede
  2 siblings, 1 reply; 8+ messages in thread
From: Weifeng Liu @ 2024-05-05 13:07 UTC (permalink / raw)
  To: platform-driver-x86, linux-serial
  Cc: Weifeng Liu, Andy Shevchenko, Maximilian Luz, Hans de Goede,
	Andy Shevchenko

Emits messages upon errors during probing of SAM.  Hopefully this could
provide useful context to user for the purpose of diagnosis when
something miserable happen.

Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>
---
 drivers/platform/surface/aggregator/core.c | 42 ++++++++++++++--------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/platform/surface/aggregator/core.c b/drivers/platform/surface/aggregator/core.c
index 7b1871eb7a6f..046fa63446bf 100644
--- a/drivers/platform/surface/aggregator/core.c
+++ b/drivers/platform/surface/aggregator/core.c
@@ -618,15 +618,17 @@ static const struct acpi_gpio_mapping ssam_acpi_gpios[] = {
 
 static int ssam_serial_hub_probe(struct serdev_device *serdev)
 {
-	struct acpi_device *ssh = ACPI_COMPANION(&serdev->dev);
+	struct device *dev = &serdev->dev;
+	struct acpi_device *ssh = ACPI_COMPANION(dev);
 	struct ssam_controller *ctrl;
 	acpi_status astatus;
 	int status;
 
-	if (gpiod_count(&serdev->dev, NULL) < 0)
-		return -ENODEV;
+	status = gpiod_count(dev, NULL);
+	if (status < 0)
+		return dev_err_probe(dev, status, "no GPIO found\n");
 
-	status = devm_acpi_dev_add_driver_gpios(&serdev->dev, ssam_acpi_gpios);
+	status = devm_acpi_dev_add_driver_gpios(dev, ssam_acpi_gpios);
 	if (status)
 		return status;
 
@@ -637,8 +639,11 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 
 	/* Initialize controller. */
 	status = ssam_controller_init(ctrl, serdev);
-	if (status)
+	if (status) {
+		dev_err_probe(dev, status,
+			      "failed to initialize ssam controller\n");
 		goto err_ctrl_init;
+	}
 
 	ssam_controller_lock(ctrl);
 
@@ -657,14 +662,13 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 	if (status == -ENXIO)
 		status = -EPROBE_DEFER;
 	if (status) {
-		dev_err_probe(&serdev->dev, status,
-			      "failed to open serdev device\n");
+		dev_err_probe(dev, status, "failed to open serdev device\n");
 		goto err_devopen;
 	}
 
 	astatus = ssam_serdev_setup_via_acpi(ssh->handle, serdev);
 	if (ACPI_FAILURE(astatus)) {
-		status = -ENXIO;
+		status = dev_err_probe(dev, -ENXIO, "failed to setup serdev\n");
 		goto err_devinit;
 	}
 
@@ -680,25 +684,33 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 	 * states.
 	 */
 	status = ssam_log_firmware_version(ctrl);
-	if (status)
+	if (status) {
+		dev_err_probe(dev, status, "failed to get firmware version\n");
 		goto err_initrq;
+	}
 
 	status = ssam_ctrl_notif_d0_entry(ctrl);
-	if (status)
+	if (status) {
+		dev_err_probe(dev, status, "D0-entry notification failed\n");
 		goto err_initrq;
+	}
 
 	status = ssam_ctrl_notif_display_on(ctrl);
-	if (status)
+	if (status) {
+		dev_err_probe(dev, status, "display-on notification failed\n");
 		goto err_initrq;
+	}
 
-	status = sysfs_create_group(&serdev->dev.kobj, &ssam_sam_group);
+	status = sysfs_create_group(&dev->kobj, &ssam_sam_group);
 	if (status)
 		goto err_initrq;
 
 	/* Set up IRQ. */
 	status = ssam_irq_setup(ctrl);
-	if (status)
+	if (status) {
+		dev_err_probe(dev, status, "failed to setup IRQ\n");
 		goto err_irq;
+	}
 
 	/* Finally, set main controller reference. */
 	status = ssam_try_set_controller(ctrl);
@@ -715,7 +727,7 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 	 *       resumed. In short, this causes some spurious unwanted wake-ups.
 	 *       For now let's thus default power/wakeup to false.
 	 */
-	device_set_wakeup_capable(&serdev->dev, true);
+	device_set_wakeup_capable(dev, true);
 	acpi_dev_clear_dependencies(ssh);
 
 	return 0;
@@ -723,7 +735,7 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
 err_mainref:
 	ssam_irq_free(ctrl);
 err_irq:
-	sysfs_remove_group(&serdev->dev.kobj, &ssam_sam_group);
+	sysfs_remove_group(&dev->kobj, &ssam_sam_group);
 err_initrq:
 	ssam_controller_lock(ctrl);
 	ssam_controller_shutdown(ctrl);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev is not ready
  2024-05-05 13:07 ` [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev " Weifeng Liu
@ 2024-05-06  9:06   ` Andy Shevchenko
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2024-05-06  9:06 UTC (permalink / raw)
  To: Weifeng Liu
  Cc: platform-driver-x86, linux-serial, Maximilian Luz, Hans de Goede

On Sun, May 05, 2024 at 09:07:49PM +0800, Weifeng Liu wrote:
> This is an attempt to alleviate race conditions in the SAM driver where
> essential resources like serial device and GPIO pins are not ready at
> the time ssam_serial_hub_probe() is called.  Instead of giving up
> probing, a better way would be to defer the probing by returning
> -EPROBE_DEFER, allowing the kernel try again later.
> 
> However, there is no way of identifying all such cases from other real
> errors in a few days.  So let's take a gradual approach identify and
> address these cases as they arise.  This commit marks the initial step
> in this process.

It's a bit pointless to send a new version while we haven't settled yet down on
the previous one.

Moreover, there is no added details as I asked in the previous round of review.

The decision of moving this part to serdev is up to Hans, but I think we also
can at least put TODO line here with explanations you gave in the reply to v2
that this is currently the only driver needs this and there is still a chance
that more might need it.

While writing the above paragraph I realised that this might be due to
non-standard appearance of the device in DSDT, that it gets enumerated
before the controller.

Do you have a DSDT excerpt for the controller and device parts in the order
of appearance?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready
  2024-05-05 13:07 [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Weifeng Liu
  2024-05-05 13:07 ` [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev " Weifeng Liu
  2024-05-05 13:07 ` [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing Weifeng Liu
@ 2024-05-06 14:41 ` Hans de Goede
  2024-05-06 14:52   ` Andy Shevchenko
  2024-05-07 13:44   ` Weifeng Liu
  2 siblings, 2 replies; 8+ messages in thread
From: Hans de Goede @ 2024-05-06 14:41 UTC (permalink / raw)
  To: Weifeng Liu, platform-driver-x86, linux-serial
  Cc: Andy Shevchenko, Maximilian Luz

[-- Attachment #1: Type: text/plain, Size: 3712 bytes --]

Hi,

On 5/5/24 3:07 PM, Weifeng Liu wrote:
> v3 changes:
> * better formatting, nothing special
> 
> v2 changes:
> * resolves Andy's comments
> 
> Original letter:
> 
> Greetings,
> 
> This series is intended to remedy a race condition where surface
> aggregator module (SAM) which is a serdev driver could fail to probe if
> the underlying UART port is not ready to open.  In such circumstance,
> invoking serdev_device_open() gets errno -ENXIO, leading to failure in
> probing of SAM.  However, if the probe is retried in a short delay,
> serdev_device_open() would work as expected and everything just goes
> fine.  As a workaround, adding the serial driver (8250_dw) into
> initramfs or building it into the kernel image significantly mitigates
> the likelihood of encountering this race condition, as in this way the
> serial device would be initialized much earlier than probing of SAM.
> 
> However, IMO we should reliably avoid this sort of race condition.  A
> good way is returning -EPROBE_DEFER when serdev_device_open returns
> -ENXIO so that the kernel will be able to retry the probing later.  This
> is what the first patch tries to do.
> 
> Though this solution might be a good enough solution for this specific
> issue, I am wondering why this kind of race condition could ever happen,
> i.e., why a serdes device could be not ready yet at the moment the
> serdev driver gets called and tries to bind it.  And even if this is an
> expected behavior how serdev driver works, I do feel it a little bit
> weird that we need to identify serdev_device_open() returning -ENXIO as
> non-fatal error and thus return -EPROBE_DEFER manually in such case, as
> I don't see this sort of behavior in other serdev driver.  Maximilian
> and Hans suggested discussing the root cause of the race condition here.
> I will be grateful if you could provide some reasoning and insights on
> this.
> 
> Following is the code path when the issue occurs:
> 
> 	ssam_serial_hub_probe()
> 	serdev_device_open()
> 	ctrl->ops->open()	/* this callback being ttyport_open() */
> 	tty->ops->open()	/* this callback being uart_open() */
> 	tty_port_open()
> 	port->ops->activate()	/* this callback being uart_port_activate() */
> 	Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.
> 
> I notice that flag UPF_DEAD would be set in serial_core_register_port()
> during calling serial_core_add_one_port() but don't have much idea
> what's going on under the hood.

Thank you this explanation + Andy's questions about this were quite
useful. I think I have found the root cause of this problem and I have
attached a patch which should fix this.

After dropping your own fix from your local kernel you should be able
to reproduce this 100% of the time by making the surface_aggregator
module builtin (CONFIG_SURFACE_AGGREGATOR=y) while making 8250_dw
a module (CONFIG_SERIAL_8250_DW=m). Although I'm not sure if the uart
will then not simply be initialzed as something generic. You could also
try building both into the kernel and see if the issue reproduces then.

Once you can reproduce this reliably, give the attached patch a try
to fix this please.

Regards,

Hans



> 
> Additionally, add logs to the probe procedure of SAM in the second
> patch, hoping it could help provide context to user when something
> miserable happens.
> 
> Context of this series is available in [1].
> 
> Best regards,
> Weifeng
> 
> Weifeng Liu (2):
>   platform/surface: aggregator: Defer probing when serdev is not ready
>   platform/surface: aggregator: Log critical errors during SAM probing
> 
>  drivers/platform/surface/aggregator/core.c | 53 ++++++++++++++++------
>  1 file changed, 39 insertions(+), 14 deletions(-)
> 

[-- Attachment #2: 0001-serial-Clear-UPF_DEAD-before-calling-tty_port_regist.patch --]
[-- Type: text/x-patch, Size: 2918 bytes --]

From 256e1b83d4a2015b16b036f2aaa0c3c6f6c63e4a Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Mon, 6 May 2024 16:20:45 +0200
Subject: [PATCH] serial: Clear UPF_DEAD before calling
 tty_port_register_device_attr_serdev()

If a serdev_device_driver is already loaded for a serdev_tty_port when it
gets registered by tty_port_register_device_attr_serdev() then that
driver's probe() method will be called immediately.

The serdev_device_driver's probe() method should then be able to call
serdev_device_open() successfully, but because UPF_DEAD is still dead
serdev_device_open() will fail with -ENXIO in this scenario:

  serdev_device_open()
  ctrl->ops->open()	/* this callback being ttyport_open() */
  tty->ops->open()	/* this callback being uart_open() */
  tty_port_open()
  port->ops->activate()	/* this callback being uart_port_activate() */
  Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.

Fix this be clearing UPF_DEAD before tty_port_register_device_attr_serdev()
note this only moves up the UPD_DEAD clearing a small bit, before:

  tty_port_register_device_attr_serdev();
  mutex_unlock(&tty_port.mutex);
  uart_port.flags &= ~UPF_DEAD;
  mutex_unlock(&port_mutex);

after:

  uart_port.flags &= ~UPF_DEAD;
  tty_port_register_device_attr_serdev();
  mutex_unlock(&tty_port.mutex);
  mutex_unlock(&port_mutex);

Reported-by: Weifeng Liu <weifeng.liu.z@gmail.com>
Closes: https://lore.kernel.org/platform-driver-x86/20240505130800.2546640-1-weifeng.liu.z@gmail.com/
Cc: Maximilian Luz <luzmaximilian@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/tty/serial/serial_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index ff85ebd3a007..d9424fe6513b 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3196,6 +3196,9 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
 	if (uport->attr_group)
 		uport->tty_groups[1] = uport->attr_group;
 
+	/* Ensure serdev drivers can call serdev_device_open() right away */
+	uport->flags &= ~UPF_DEAD;
+
 	/*
 	 * Register the port whether it's detected or not.  This allows
 	 * setserial to be used to alter this port's parameters.
@@ -3206,6 +3209,7 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
 	if (!IS_ERR(tty_dev)) {
 		device_set_wakeup_capable(tty_dev, 1);
 	} else {
+		uport->flags |= UPF_DEAD;
 		dev_err(uport->dev, "Cannot register tty device on line %d\n",
 		       uport->line);
 	}
@@ -3411,8 +3415,6 @@ int serial_core_register_port(struct uart_driver *drv, struct uart_port *port)
 	if (ret)
 		goto err_unregister_port_dev;
 
-	port->flags &= ~UPF_DEAD;
-
 	mutex_unlock(&port_mutex);
 
 	return 0;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready
  2024-05-06 14:41 ` [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Hans de Goede
@ 2024-05-06 14:52   ` Andy Shevchenko
  2024-05-07 13:44   ` Weifeng Liu
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2024-05-06 14:52 UTC (permalink / raw)
  To: Hans de Goede, Tony Lindgren
  Cc: Weifeng Liu, platform-driver-x86, linux-serial, Maximilian Luz

On Mon, May 06, 2024 at 04:41:19PM +0200, Hans de Goede wrote:
> On 5/5/24 3:07 PM, Weifeng Liu wrote:

...

> If a serdev_device_driver is already loaded for a serdev_tty_port when it
> gets registered by tty_port_register_device_attr_serdev() then that
> driver's probe() method will be called immediately.
> 
> The serdev_device_driver's probe() method should then be able to call
> serdev_device_open() successfully, but because UPF_DEAD is still dead
> serdev_device_open() will fail with -ENXIO in this scenario:
> 
>   serdev_device_open()
>   ctrl->ops->open()	/* this callback being ttyport_open() */
>   tty->ops->open()	/* this callback being uart_open() */
>   tty_port_open()
>   port->ops->activate()	/* this callback being uart_port_activate() */
>   Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.
> 
> Fix this be clearing UPF_DEAD before tty_port_register_device_attr_serdev()
> note this only moves up the UPD_DEAD clearing a small bit, before:
> 
>   tty_port_register_device_attr_serdev();
>   mutex_unlock(&tty_port.mutex);
>   uart_port.flags &= ~UPF_DEAD;
>   mutex_unlock(&port_mutex);
> 
> after:
> 
>   uart_port.flags &= ~UPF_DEAD;
>   tty_port_register_device_attr_serdev();
>   mutex_unlock(&tty_port.mutex);
>   mutex_unlock(&port_mutex);

> Reported-by: Weifeng Liu <weifeng.liu.z@gmail.com>
> Closes: https://lore.kernel.org/platform-driver-x86/20240505130800.2546640-1-weifeng.liu.z@gmail.com/

> Cc: Maximilian Luz <luzmaximilian@gmail.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Can you move Cc after the cutter '---' line?

The patch makes sense to me, FWIW,
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
but Cc'ed Tony just in case.

> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  drivers/tty/serial/serial_core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index ff85ebd3a007..d9424fe6513b 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -3196,6 +3196,9 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
>  	if (uport->attr_group)
>  		uport->tty_groups[1] = uport->attr_group;
>  
> +	/* Ensure serdev drivers can call serdev_device_open() right away */
> +	uport->flags &= ~UPF_DEAD;
> +
>  	/*
>  	 * Register the port whether it's detected or not.  This allows
>  	 * setserial to be used to alter this port's parameters.
> @@ -3206,6 +3209,7 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
>  	if (!IS_ERR(tty_dev)) {
>  		device_set_wakeup_capable(tty_dev, 1);
>  	} else {
> +		uport->flags |= UPF_DEAD;
>  		dev_err(uport->dev, "Cannot register tty device on line %d\n",
>  		       uport->line);
>  	}
> @@ -3411,8 +3415,6 @@ int serial_core_register_port(struct uart_driver *drv, struct uart_port *port)
>  	if (ret)
>  		goto err_unregister_port_dev;
>  
> -	port->flags &= ~UPF_DEAD;
> -
>  	mutex_unlock(&port_mutex);
>  
>  	return 0;

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready
  2024-05-06 14:41 ` [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Hans de Goede
  2024-05-06 14:52   ` Andy Shevchenko
@ 2024-05-07 13:44   ` Weifeng Liu
  1 sibling, 0 replies; 8+ messages in thread
From: Weifeng Liu @ 2024-05-07 13:44 UTC (permalink / raw)
  To: Hans de Goede, platform-driver-x86, linux-serial
  Cc: Andy Shevchenko, Maximilian Luz, Tony Lindgren

Hi,

On Mon, 2024-05-06 at 16:41 +0200, Hans de Goede wrote:
> Hi,
> 
> On 5/5/24 3:07 PM, Weifeng Liu wrote:
> > v3 changes:
> > * better formatting, nothing special
> > 
> > v2 changes:
> > * resolves Andy's comments
> > 
> > Original letter:
> > 
> > Greetings,
> > 
> > This series is intended to remedy a race condition where surface
> > aggregator module (SAM) which is a serdev driver could fail to probe if
> > the underlying UART port is not ready to open.  In such circumstance,
> > invoking serdev_device_open() gets errno -ENXIO, leading to failure in
> > probing of SAM.  However, if the probe is retried in a short delay,
> > serdev_device_open() would work as expected and everything just goes
> > fine.  As a workaround, adding the serial driver (8250_dw) into
> > initramfs or building it into the kernel image significantly mitigates
> > the likelihood of encountering this race condition, as in this way the
> > serial device would be initialized much earlier than probing of SAM.
> > 
> > However, IMO we should reliably avoid this sort of race condition.  A
> > good way is returning -EPROBE_DEFER when serdev_device_open returns
> > -ENXIO so that the kernel will be able to retry the probing later.  This
> > is what the first patch tries to do.
> > 
> > Though this solution might be a good enough solution for this specific
> > issue, I am wondering why this kind of race condition could ever happen,
> > i.e., why a serdes device could be not ready yet at the moment the
> > serdev driver gets called and tries to bind it.  And even if this is an
> > expected behavior how serdev driver works, I do feel it a little bit
> > weird that we need to identify serdev_device_open() returning -ENXIO as
> > non-fatal error and thus return -EPROBE_DEFER manually in such case, as
> > I don't see this sort of behavior in other serdev driver.  Maximilian
> > and Hans suggested discussing the root cause of the race condition here.
> > I will be grateful if you could provide some reasoning and insights on
> > this.
> > 
> > Following is the code path when the issue occurs:
> > 
> > 	ssam_serial_hub_probe()
> > 	serdev_device_open()
> > 	ctrl->ops->open()	/* this callback being ttyport_open() */
> > 	tty->ops->open()	/* this callback being uart_open() */
> > 	tty_port_open()
> > 	port->ops->activate()	/* this callback being uart_port_activate() */
> > 	Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.
> > 
> > I notice that flag UPF_DEAD would be set in serial_core_register_port()
> > during calling serial_core_add_one_port() but don't have much idea
> > what's going on under the hood.
> 
> Thank you this explanation + Andy's questions about this were quite
> useful. I think I have found the root cause of this problem and I have
> attached a patch which should fix this.
> 
> After dropping your own fix from your local kernel you should be able
> to reproduce this 100% of the time by making the surface_aggregator
> module builtin (CONFIG_SURFACE_AGGREGATOR=y) while making 8250_dw
> a module (CONFIG_SERIAL_8250_DW=m). Although I'm not sure if the uart
> will then not simply be initialzed as something generic. You could also
> try building both into the kernel and see if the issue reproduces then.
> 

As per your instructions, I

* removed my fixes,
* built surface_aggregator into the kernal,
* built 8250_dw as a module and
* removed 8250_dw from initramfs as well,

and found the occurrence rate of the issue was around 30%.  With your
patch applied, the issue disappeared completely.

Tested-by: Weifeng Liu <weifeng.liu.z@gmail.com>

Really glad to see the final solution in serial core for this issue. 
This might also help other serdev drivers.  Thank you all.

Best regards,
Weifeng

> Once you can reproduce this reliably, give the attached patch a try
> to fix this please.
> 
> Regards,
> 
> Hans
> 
> 
> 
> > 
> > Additionally, add logs to the probe procedure of SAM in the second
> > patch, hoping it could help provide context to user when something
> > miserable happens.
> > 
> > Context of this series is available in [1].
> > 
> > Best regards,
> > Weifeng
> > 
> > Weifeng Liu (2):
> >   platform/surface: aggregator: Defer probing when serdev is not ready
> >   platform/surface: aggregator: Log critical errors during SAM probing
> > 
> >  drivers/platform/surface/aggregator/core.c | 53 ++++++++++++++++------
> >  1 file changed, 39 insertions(+), 14 deletions(-)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing
  2024-05-05 13:07 ` [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing Weifeng Liu
@ 2024-05-13 11:55   ` Hans de Goede
  0 siblings, 0 replies; 8+ messages in thread
From: Hans de Goede @ 2024-05-13 11:55 UTC (permalink / raw)
  To: Weifeng Liu, platform-driver-x86, linux-serial
  Cc: Andy Shevchenko, Maximilian Luz, Andy Shevchenko

Hi,

On 5/5/24 3:07 PM, Weifeng Liu wrote:
> Emits messages upon errors during probing of SAM.  Hopefully this could
> provide useful context to user for the purpose of diagnosis when
> something miserable happen.
> 
> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Acked-by: Hans de Goede <hdegoede@redhat.com>
> Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>

Thank you for your patch, I've applied this patch to my review-hans 
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Note it will show up in my review-hans branch once I've pushed my
local branch there, which might take a while.

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans



> ---
>  drivers/platform/surface/aggregator/core.c | 42 ++++++++++++++--------
>  1 file changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/platform/surface/aggregator/core.c b/drivers/platform/surface/aggregator/core.c
> index 7b1871eb7a6f..046fa63446bf 100644
> --- a/drivers/platform/surface/aggregator/core.c
> +++ b/drivers/platform/surface/aggregator/core.c
> @@ -618,15 +618,17 @@ static const struct acpi_gpio_mapping ssam_acpi_gpios[] = {
>  
>  static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  {
> -	struct acpi_device *ssh = ACPI_COMPANION(&serdev->dev);
> +	struct device *dev = &serdev->dev;
> +	struct acpi_device *ssh = ACPI_COMPANION(dev);
>  	struct ssam_controller *ctrl;
>  	acpi_status astatus;
>  	int status;
>  
> -	if (gpiod_count(&serdev->dev, NULL) < 0)
> -		return -ENODEV;
> +	status = gpiod_count(dev, NULL);
> +	if (status < 0)
> +		return dev_err_probe(dev, status, "no GPIO found\n");
>  
> -	status = devm_acpi_dev_add_driver_gpios(&serdev->dev, ssam_acpi_gpios);
> +	status = devm_acpi_dev_add_driver_gpios(dev, ssam_acpi_gpios);
>  	if (status)
>  		return status;
>  
> @@ -637,8 +639,11 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  
>  	/* Initialize controller. */
>  	status = ssam_controller_init(ctrl, serdev);
> -	if (status)
> +	if (status) {
> +		dev_err_probe(dev, status,
> +			      "failed to initialize ssam controller\n");
>  		goto err_ctrl_init;
> +	}
>  
>  	ssam_controller_lock(ctrl);
>  
> @@ -657,14 +662,13 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  	if (status == -ENXIO)
>  		status = -EPROBE_DEFER;
>  	if (status) {
> -		dev_err_probe(&serdev->dev, status,
> -			      "failed to open serdev device\n");
> +		dev_err_probe(dev, status, "failed to open serdev device\n");
>  		goto err_devopen;
>  	}
>  
>  	astatus = ssam_serdev_setup_via_acpi(ssh->handle, serdev);
>  	if (ACPI_FAILURE(astatus)) {
> -		status = -ENXIO;
> +		status = dev_err_probe(dev, -ENXIO, "failed to setup serdev\n");
>  		goto err_devinit;
>  	}
>  
> @@ -680,25 +684,33 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  	 * states.
>  	 */
>  	status = ssam_log_firmware_version(ctrl);
> -	if (status)
> +	if (status) {
> +		dev_err_probe(dev, status, "failed to get firmware version\n");
>  		goto err_initrq;
> +	}
>  
>  	status = ssam_ctrl_notif_d0_entry(ctrl);
> -	if (status)
> +	if (status) {
> +		dev_err_probe(dev, status, "D0-entry notification failed\n");
>  		goto err_initrq;
> +	}
>  
>  	status = ssam_ctrl_notif_display_on(ctrl);
> -	if (status)
> +	if (status) {
> +		dev_err_probe(dev, status, "display-on notification failed\n");
>  		goto err_initrq;
> +	}
>  
> -	status = sysfs_create_group(&serdev->dev.kobj, &ssam_sam_group);
> +	status = sysfs_create_group(&dev->kobj, &ssam_sam_group);
>  	if (status)
>  		goto err_initrq;
>  
>  	/* Set up IRQ. */
>  	status = ssam_irq_setup(ctrl);
> -	if (status)
> +	if (status) {
> +		dev_err_probe(dev, status, "failed to setup IRQ\n");
>  		goto err_irq;
> +	}
>  
>  	/* Finally, set main controller reference. */
>  	status = ssam_try_set_controller(ctrl);
> @@ -715,7 +727,7 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  	 *       resumed. In short, this causes some spurious unwanted wake-ups.
>  	 *       For now let's thus default power/wakeup to false.
>  	 */
> -	device_set_wakeup_capable(&serdev->dev, true);
> +	device_set_wakeup_capable(dev, true);
>  	acpi_dev_clear_dependencies(ssh);
>  
>  	return 0;
> @@ -723,7 +735,7 @@ static int ssam_serial_hub_probe(struct serdev_device *serdev)
>  err_mainref:
>  	ssam_irq_free(ctrl);
>  err_irq:
> -	sysfs_remove_group(&serdev->dev.kobj, &ssam_sam_group);
> +	sysfs_remove_group(&dev->kobj, &ssam_sam_group);
>  err_initrq:
>  	ssam_controller_lock(ctrl);
>  	ssam_controller_shutdown(ctrl);


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-13 11:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-05 13:07 [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Weifeng Liu
2024-05-05 13:07 ` [PATCH v3 1/2] platform/surface: aggregator: Defer probing when serdev " Weifeng Liu
2024-05-06  9:06   ` Andy Shevchenko
2024-05-05 13:07 ` [PATCH v3 2/2] platform/surface: aggregator: Log critical errors during SAM probing Weifeng Liu
2024-05-13 11:55   ` Hans de Goede
2024-05-06 14:41 ` [PATCH v3 0/2] Defer probing of SAM if serdev device is not ready Hans de Goede
2024-05-06 14:52   ` Andy Shevchenko
2024-05-07 13:44   ` Weifeng Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).