linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] serial: sci: fix OOPS because of wrongly running hrtimer
@ 2024-04-16 12:35 Wolfram Sang
  2024-04-16 12:35 ` [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA Wolfram Sang
  2024-04-16 12:35 ` [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated Wolfram Sang
  0 siblings, 2 replies; 7+ messages in thread
From: Wolfram Sang @ 2024-04-16 12:35 UTC (permalink / raw)
  To: linux-renesas-soc; +Cc: Wolfram Sang, linux-kernel, linux-serial

Dirk sent a very interesting bug report[1]. This series is what I found
out by reviewing the driver. It is not tested yet because I couldn't
trigger the code path yet. The console still works normally with these
patches. Still, I am already curious in hearing your opinions, so here
is what I have...

[1] https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com

Wolfram Sang (2):
  serial: sh-sci: start hrtimer after setting up DMA
  serial: sh-sci: always cancel hrtimer when DMA RX is invalidated

 drivers/tty/serial/sh-sci.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA
  2024-04-16 12:35 [RFC PATCH 0/2] serial: sci: fix OOPS because of wrongly running hrtimer Wolfram Sang
@ 2024-04-16 12:35 ` Wolfram Sang
  2024-04-24  9:41   ` Geert Uytterhoeven
  2024-05-02 18:01   ` Wolfram Sang
  2024-04-16 12:35 ` [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated Wolfram Sang
  1 sibling, 2 replies; 7+ messages in thread
From: Wolfram Sang @ 2024-04-16 12:35 UTC (permalink / raw)
  To: linux-renesas-soc
  Cc: Wolfram Sang, Dirk Behme, Greg Kroah-Hartman, Jiri Slaby,
	Geert Uytterhoeven, linux-kernel, linux-serial

In the RX DMA completion handler, the hrtimer was restarted before DMA
was set up. If DMA failed, for some reason, it would clean up and the
hrtimer would run into a NULL-pointer. Restart the timer after DMA was
successfully set up.

Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com
Fixes: 67f462b069e9 ("serial: sh-sci: Get rid of the workqueue to handle receive DMA requests")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/tty/serial/sh-sci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index e512eaa57ed5..1e3c26c11c49 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1325,8 +1325,6 @@ static void sci_dma_rx_complete(void *arg)
 	if (active >= 0)
 		count = sci_dma_rx_push(s, s->rx_buf[active], s->buf_len_rx);
 
-	start_hrtimer_us(&s->rx_timer, s->rx_timeout);
-
 	if (count)
 		tty_flip_buffer_push(&port->state->port);
 
@@ -1346,6 +1344,8 @@ static void sci_dma_rx_complete(void *arg)
 
 	dma_async_issue_pending(chan);
 
+	start_hrtimer_us(&s->rx_timer, s->rx_timeout);
+
 	uart_port_unlock_irqrestore(port, flags);
 	dev_dbg(port->dev, "%s: cookie %d #%d, new active cookie %d\n",
 		__func__, s->cookie_rx[active], active, s->active_rx);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated
  2024-04-16 12:35 [RFC PATCH 0/2] serial: sci: fix OOPS because of wrongly running hrtimer Wolfram Sang
  2024-04-16 12:35 ` [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA Wolfram Sang
@ 2024-04-16 12:35 ` Wolfram Sang
  2024-04-24  9:48   ` Geert Uytterhoeven
  1 sibling, 1 reply; 7+ messages in thread
From: Wolfram Sang @ 2024-04-16 12:35 UTC (permalink / raw)
  To: linux-renesas-soc
  Cc: Wolfram Sang, Dirk Behme, Greg Kroah-Hartman, Jiri Slaby,
	Geert Uytterhoeven, Aleksandar Mitev, linux-kernel, linux-serial

Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS.
Currently, the driver only runs the timer when 'chan_rx' is set before.
However, it is good defensive programming to make sure the hrtimer is
always stopped before clearing the 'chan_rx' pointer.

Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com
Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---

Locking needs to be double-checked here. This patch is mainly calling
for opinions.

 drivers/tty/serial/sh-sci.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 1e3c26c11c49..5ad73933c1c5 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s)
 {
 	unsigned int i;
 
+	hrtimer_cancel(&s->rx_timer);
 	s->chan_rx = NULL;
 	for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++)
 		s->cookie_rx[i] = -EINVAL;
@@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port)
 		       scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot));
 	uart_port_unlock_irqrestore(port, flags);
 
-#ifdef CONFIG_SERIAL_SH_SCI_DMA
-	if (s->chan_rx_saved) {
-		dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
-			port->line);
-		hrtimer_cancel(&s->rx_timer);
-	}
-#endif
-
 	if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0)
 		del_timer_sync(&s->rx_fifo_timer);
 	sci_free_irq(s);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA
  2024-04-16 12:35 ` [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA Wolfram Sang
@ 2024-04-24  9:41   ` Geert Uytterhoeven
  2024-05-02 18:01   ` Wolfram Sang
  1 sibling, 0 replies; 7+ messages in thread
From: Geert Uytterhoeven @ 2024-04-24  9:41 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: linux-renesas-soc, Dirk Behme, Greg Kroah-Hartman, Jiri Slaby,
	linux-kernel, linux-serial

Hi Wolfram,

Thanks for your patch!

On Tue, Apr 16, 2024 at 2:35 PM Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
> In the RX DMA completion handler, the hrtimer was restarted before DMA
> was set up. If DMA failed, for some reason, it would clean up and the
> hrtimer would run into a NULL-pointer. Restart the timer after DMA was

... into a NULL-pointer dereference of s->chan_rx.

> successfully set up.
>
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com
> Fixes: 67f462b069e9 ("serial: sh-sci: Get rid of the workqueue to handle receive DMA requests")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

This is definitely a step in the right direction, so
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated
  2024-04-16 12:35 ` [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated Wolfram Sang
@ 2024-04-24  9:48   ` Geert Uytterhoeven
  2024-05-02 18:17     ` Wolfram Sang
  0 siblings, 1 reply; 7+ messages in thread
From: Geert Uytterhoeven @ 2024-04-24  9:48 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: linux-renesas-soc, Dirk Behme, Greg Kroah-Hartman, Jiri Slaby,
	Geert Uytterhoeven, Aleksandar Mitev, linux-kernel, linux-serial

Hi Wolfram,

On Tue, Apr 16, 2024 at 2:35 PM Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
> Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS.
> Currently, the driver only runs the timer when 'chan_rx' is set before.
> However, it is good defensive programming to make sure the hrtimer is
> always stopped before clearing the 'chan_rx' pointer.
>
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com
> Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Thanks for your patch!

> Locking needs to be double-checked here. This patch is mainly calling
> for opinions.

I do think you need to cancel the timer: even when not restarting
the timer in sci_dma_rx_complete() due to a DMA failure, the previous
timer may still be running, and will cause a NULL pointer dereference
on s->chan_rx on timer expiry.

> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s)
>  {
>         unsigned int i;
>
> +       hrtimer_cancel(&s->rx_timer);

Is it safe to do this unconditionally on shutdown (cfr. the old check
for s->chan_rx_saved)?

>         s->chan_rx = NULL;
>         for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++)
>                 s->cookie_rx[i] = -EINVAL;
> @@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port)
>                        scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot));
>         uart_port_unlock_irqrestore(port, flags);
>
> -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> -       if (s->chan_rx_saved) {
> -               dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> -                       port->line);
> -               hrtimer_cancel(&s->rx_timer);
> -       }
> -#endif
> -
>         if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0)
>                 del_timer_sync(&s->rx_fifo_timer);
>         sci_free_irq(s);

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA
  2024-04-16 12:35 ` [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA Wolfram Sang
  2024-04-24  9:41   ` Geert Uytterhoeven
@ 2024-05-02 18:01   ` Wolfram Sang
  1 sibling, 0 replies; 7+ messages in thread
From: Wolfram Sang @ 2024-05-02 18:01 UTC (permalink / raw)
  To: linux-renesas-soc
  Cc: Dirk Behme, Greg Kroah-Hartman, Jiri Slaby, Geert Uytterhoeven,
	linux-kernel, linux-serial

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

On Tue, Apr 16, 2024 at 02:35:47PM +0200, Wolfram Sang wrote:
> In the RX DMA completion handler, the hrtimer was restarted before DMA
> was set up. If DMA failed, for some reason, it would clean up and the
> hrtimer would run into a NULL-pointer. Restart the timer after DMA was
> successfully set up.

Further investigations, please review:

Originally, I thought this was the race condition Dirk encountered. But
I didn't take locking into account. sci_dma_rx_timer_fn() is protected
by the uart_port_lock. sci_dma_rx_complete() is also protected by the
uart_port_lock. So, the position of restarting the hrtimer should not
matter.

However, I still think it is good coding practice (and easier to
understand) if we cancel the hrtimer at the begin of
sci_dma_rx_complete() and reenable it only if setting DMA was
successful. Because that basically means the timer only runs when DMA
was scheduled and has not finished yet.

There is some unnecessary unlock/lock in the error handling of
sci_dma_rx_complete(). I'll simplify this by moving the dev_err
downwards.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated
  2024-04-24  9:48   ` Geert Uytterhoeven
@ 2024-05-02 18:17     ` Wolfram Sang
  0 siblings, 0 replies; 7+ messages in thread
From: Wolfram Sang @ 2024-05-02 18:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-renesas-soc, Dirk Behme, Greg Kroah-Hartman, Jiri Slaby,
	Geert Uytterhoeven, Aleksandar Mitev, linux-kernel, linux-serial

[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]

Hi Geert,

good news, I was able to trigger the DMA rx code path. I dunno what I
did wrong last time. I started from scratch again and it worked easily
by dd-ing random data to the second non-console debug port.

> I do think you need to cancel the timer: even when not restarting
> the timer in sci_dma_rx_complete() due to a DMA failure, the previous
> timer may still be running, and will cause a NULL pointer dereference
> on s->chan_rx on timer expiry.

Taking locking into account, I think this patch is bogus. If we run into
this NULL-pointer, we have a locking problem and cancelling the timer in
sci_dma_rx_chan_invalidate() is not going to fix the underlying locking
problem. sci_dma_rx_chan_invalidate() does not only clear the pointer
but also the cookie_rx-array. sci_dma_rx_timer_fn() bails out via
sci_dma_rx_find_active() if that array is cleared. It does so before
accessing the chan_rx-pointer. So, it looks to me that should work once
all calls to sci_dma_rx_chan_invalidate() are protected. And there is
one path where this is not true, namely via sci_dma_rx_release() during
shutdown. This is why I asked Dirk if the system was about to shutdown.
Currently, I don't see any other problematic code path.

> > -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> > -       if (s->chan_rx_saved) {
> > -               dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> > -                       port->line);
> > -               hrtimer_cancel(&s->rx_timer);
> > -       }
> > -#endif

Also, this chunk needs to stay. I suggested in patch 1 to cancel the
timer on successful dma_rx_complete, so the timer only runs when a DMA
is in progress. Then, of course, we need to cancel it in shutdown.

I hope I am not seeing "no wood for the trees" by now. I am not
convinced that I actually found Dirk's race condition yet...

All the best,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-05-02 18:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-16 12:35 [RFC PATCH 0/2] serial: sci: fix OOPS because of wrongly running hrtimer Wolfram Sang
2024-04-16 12:35 ` [RFC PATCH 1/2] serial: sh-sci: start hrtimer after setting up DMA Wolfram Sang
2024-04-24  9:41   ` Geert Uytterhoeven
2024-05-02 18:01   ` Wolfram Sang
2024-04-16 12:35 ` [RFC PATCH 2/2] serial: sh-sci: always cancel hrtimer when DMA RX is invalidated Wolfram Sang
2024-04-24  9:48   ` Geert Uytterhoeven
2024-05-02 18:17     ` Wolfram Sang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).