linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt
@ 2022-08-03 15:32 Sebastian Würl
  2022-08-03 16:48 ` Andy Shevchenko
  2022-08-03 18:59 ` Marc Kleine-Budde
  0 siblings, 2 replies; 13+ messages in thread
From: Sebastian Würl @ 2022-08-03 15:32 UTC (permalink / raw)
  To: sebastian.wuerl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Stefan Mätje, Andy Shevchenko, Oliver Hartkopp,
	Sebastian Andrzej Siewior, Uwe Kleine-König, linux-can,
	netdev, linux-kernel

The mcp251x driver uses both receiving mailboxes of the can controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled, an will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
 - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
 - Setup CAN to 1 MHz
 - Spam bursts of 5 CAN-messages with increasing CAN-ids
 - Continue sending the bursts while sleeping a second between the bursts
 - Check on the RPi whether the received messages have increasing CAN-ids
 - Without this patch, every burst of messages will contain a flipped pair

Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
---
 drivers/net/can/spi/mcp251x.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 666a4505a55a..687aafef4717 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1063,17 +1063,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf, eflag;
+		u8 intf, intf0, intf1, eflag, eflag0, eflag1;
 		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
-		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
-
-		/* mask out flags we don't care about */
-		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+		mcp251x_read_2regs(spi, CANINTF, &intf0, &eflag0);
 
 		/* receive buffer 0 */
-		if (intf & CANINTF_RX0IF) {
+		if (intf0 & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
 			/* Free one buffer ASAP
 			 * (The MCP2515/25625 does this automatically.)
@@ -1083,14 +1080,24 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 						   CANINTF_RX0IF, 0x00);
 		}
 
+		/* intf needs to be read again to avoid a race condition */
+		mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+
 		/* receive buffer 1 */
-		if (intf & CANINTF_RX1IF) {
+		if (intf1 & CANINTF_RX1IF) {
 			mcp251x_hw_rx(spi, 1);
 			/* The MCP2515/25625 does this automatically. */
 			if (mcp251x_is_2510(spi))
 				clear_intf |= CANINTF_RX1IF;
 		}
 
+		/* combine flags from both operations for error handling */
+		intf = intf0 | intf1;
+		eflag = eflag0 | eflag1;
+
+		/* mask out flags we don't care about */
+		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
 		/* any error or tx interrupt we need to clear? */
 		if (intf & (CANINTF_ERR | CANINTF_TX))
 			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt
  2022-08-03 15:32 [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt Sebastian Würl
@ 2022-08-03 16:48 ` Andy Shevchenko
  2022-08-03 16:50   ` Andy Shevchenko
  2022-08-03 19:08   ` Marc Kleine-Budde
  2022-08-03 18:59 ` Marc Kleine-Budde
  1 sibling, 2 replies; 13+ messages in thread
From: Andy Shevchenko @ 2022-08-03 16:48 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Stefan Mätje, Andy Shevchenko, Oliver Hartkopp,
	Sebastian Andrzej Siewior, Uwe Kleine-König, linux-can,
	netdev, Linux Kernel Mailing List

On Wed, Aug 3, 2022 at 5:36 PM Sebastian Würl
<sebastian.wuerl@ororatech.com> wrote:
>
> The mcp251x driver uses both receiving mailboxes of the can controller

CAN

> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled, an will retrieve the
> messages accordingly.
>
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
>
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
>
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair

Fixes tag?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt
  2022-08-03 16:48 ` Andy Shevchenko
@ 2022-08-03 16:50   ` Andy Shevchenko
  2022-08-03 19:08   ` Marc Kleine-Budde
  1 sibling, 0 replies; 13+ messages in thread
From: Andy Shevchenko @ 2022-08-03 16:50 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Stefan Mätje, Andy Shevchenko, Oliver Hartkopp,
	Sebastian Andrzej Siewior, Uwe Kleine-König, linux-can,
	netdev, Linux Kernel Mailing List

On Wed, Aug 3, 2022 at 6:48 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Wed, Aug 3, 2022 at 5:36 PM Sebastian Würl
> <sebastian.wuerl@ororatech.com> wrote:
> >
> > The mcp251x driver uses both receiving mailboxes of the can controller
>
> CAN
>
> > chips. For retrieving the CAN frames from the controller via SPI, it checks
> > once per interrupt which mailboxes have been filled, an will retrieve the
> > messages accordingly.
> >
> > This introduces a race condition, as another CAN frame can enter mailbox 1
> > while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> > the interrupt handler is called next, mailbox 0 is emptied before
> > mailbox 1, leading to out-of-order CAN frames in the network device.
> >
> > This is fixed by checking the interrupt flags once again after freeing
> > mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
> >
> > For reproducing the bug I created the following setup:
> >  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
> >  - Setup CAN to 1 MHz
> >  - Spam bursts of 5 CAN-messages with increasing CAN-ids
> >  - Continue sending the bursts while sleeping a second between the bursts
> >  - Check on the RPi whether the received messages have increasing CAN-ids
> >  - Without this patch, every burst of messages will contain a flipped pair
>
> Fixes tag?

Also fix the Subject prefix (you may see the most used ones by running
`git log --no-merges --oneline -- drivers/net/can/spi/mcp251x.c`).

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt
  2022-08-03 15:32 [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt Sebastian Würl
  2022-08-03 16:48 ` Andy Shevchenko
@ 2022-08-03 18:59 ` Marc Kleine-Budde
  2022-08-04  6:48   ` [PATCH] can: mcp251x: " Sebastian Würl
  1 sibling, 1 reply; 13+ messages in thread
From: Marc Kleine-Budde @ 2022-08-03 18:59 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Vincent Mailhol, Stefan Mätje,
	Andy Shevchenko, Oliver Hartkopp, Sebastian Andrzej Siewior,
	Uwe Kleine-König, linux-can, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]

On 03.08.2022 17:32:59, Sebastian Würl wrote:
> The mcp251x driver uses both receiving mailboxes of the can controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled, an will retrieve the
> messages accordingly.
> 
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
> 
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

Nitpick: On mcp2515 the buffer is freed automatically.
With this change the interrupt flags are _always_ checked a 2nd time,
even if buffer 0 is not serviced. See below for a change proposal.

> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair
> 
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
> ---
>  drivers/net/can/spi/mcp251x.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
> index 666a4505a55a..687aafef4717 100644
> --- a/drivers/net/can/spi/mcp251x.c
> +++ b/drivers/net/can/spi/mcp251x.c
> @@ -1063,17 +1063,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>  	mutex_lock(&priv->mcp_lock);
>  	while (!priv->force_quit) {
>  		enum can_state new_state;
> -		u8 intf, eflag;
> +		u8 intf, intf0, intf1, eflag, eflag0, eflag1;
>  		u8 clear_intf = 0;
>  		int can_id = 0, data1 = 0;
>  
> -		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
> -
> -		/* mask out flags we don't care about */
> -		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
> +		mcp251x_read_2regs(spi, CANINTF, &intf0, &eflag0);
>  
>  		/* receive buffer 0 */
> -		if (intf & CANINTF_RX0IF) {
> +		if (intf0 & CANINTF_RX0IF) {
>  			mcp251x_hw_rx(spi, 0);
>  			/* Free one buffer ASAP
>  			 * (The MCP2515/25625 does this automatically.)
> @@ -1083,14 +1080,24 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>  						   CANINTF_RX0IF, 0x00);
>  		}
>  
> +		/* intf needs to be read again to avoid a race condition */
> +		mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);

I think we only have to re-read the interrupt flag if we actually read
data from buffer 0. So what about moving this into the if () { } above?

> +
>  		/* receive buffer 1 */
> -		if (intf & CANINTF_RX1IF) {
> +		if (intf1 & CANINTF_RX1IF) {
>  			mcp251x_hw_rx(spi, 1);
>  			/* The MCP2515/25625 does this automatically. */
>  			if (mcp251x_is_2510(spi))
>  				clear_intf |= CANINTF_RX1IF;
>  		}
>  
> +		/* combine flags from both operations for error handling */
> +		intf = intf0 | intf1;
> +		eflag = eflag0 | eflag1;
> +
> +		/* mask out flags we don't care about */
> +		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
> +
>  		/* any error or tx interrupt we need to clear? */
>  		if (intf & (CANINTF_ERR | CANINTF_TX))
>  			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt
  2022-08-03 16:48 ` Andy Shevchenko
  2022-08-03 16:50   ` Andy Shevchenko
@ 2022-08-03 19:08   ` Marc Kleine-Budde
  1 sibling, 0 replies; 13+ messages in thread
From: Marc Kleine-Budde @ 2022-08-03 19:08 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Sebastian Würl, Wolfgang Grandegger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Stefan Mätje, Andy Shevchenko, Oliver Hartkopp,
	Sebastian Andrzej Siewior, Uwe Kleine-König, linux-can,
	netdev, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

On 03.08.2022 18:48:57, Andy Shevchenko wrote:
> On Wed, Aug 3, 2022 at 5:36 PM Sebastian Würl
> <sebastian.wuerl@ororatech.com> wrote:
> >
> > The mcp251x driver uses both receiving mailboxes of the can controller
> 
> CAN
> 
> > chips. For retrieving the CAN frames from the controller via SPI, it checks
> > once per interrupt which mailboxes have been filled, an will retrieve the
> > messages accordingly.
> >
> > This introduces a race condition, as another CAN frame can enter mailbox 1
> > while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> > the interrupt handler is called next, mailbox 0 is emptied before
> > mailbox 1, leading to out-of-order CAN frames in the network device.
> >
> > This is fixed by checking the interrupt flags once again after freeing
> > mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
> >
> > For reproducing the bug I created the following setup:
> >  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
> >  - Setup CAN to 1 MHz
> >  - Spam bursts of 5 CAN-messages with increasing CAN-ids
> >  - Continue sending the bursts while sleeping a second between the bursts
> >  - Check on the RPi whether the received messages have increasing CAN-ids
> >  - Without this patch, every burst of messages will contain a flipped pair
> 
> Fixes tag?

Should be:
Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] can: mcp251x: Fix race condition on receive interrupt
  2022-08-03 18:59 ` Marc Kleine-Budde
@ 2022-08-04  6:48   ` Sebastian Würl
  2022-08-04  7:06     ` Marc Kleine-Budde
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Würl @ 2022-08-04  6:48 UTC (permalink / raw)
  To: sebastian.wuerl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Stefan Mätje, Andy Shevchenko, Oliver Hartkopp,
	Sebastian Andrzej Siewior, Uwe Kleine-König,
	Christian Pellegrin, linux-can, netdev, linux-kernel

The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
 - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
 - Setup CAN to 1 MHz
 - Spam bursts of 5 CAN-messages with increasing CAN-ids
 - Continue sending the bursts while sleeping a second between the bursts
 - Check on the RPi whether the received messages have increasing CAN-ids
 - Without this patch, every burst of messages will contain a flipped pair

Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
---
 drivers/net/can/spi/mcp251x.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 89897a2d41fa..ca462868141c 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1068,17 +1068,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf, eflag;
+		u8 intf, intf0, intf1, eflag, eflag0, eflag1;
 		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
-		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
-
-		/* mask out flags we don't care about */
-		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+		mcp251x_read_2regs(spi, CANINTF, &intf0, &eflag0);
 
 		/* receive buffer 0 */
-		if (intf & CANINTF_RX0IF) {
+		if (intf0 & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
 			/* Free one buffer ASAP
 			 * (The MCP2515/25625 does this automatically.)
@@ -1086,16 +1083,31 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 			if (mcp251x_is_2510(spi))
 				mcp251x_write_bits(spi, CANINTF,
 						   CANINTF_RX0IF, 0x00);
+
+			if (intf0 & CANINTF_RX1IF) {
+				/* buffer 1 is already known to be full, no need to re-read */
+				intf1 = intf0;
+			} else {
+				/* intf needs to be read again to avoid a race condition */
+				mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+			}
 		}
 
 		/* receive buffer 1 */
-		if (intf & CANINTF_RX1IF) {
+		if (intf1 & CANINTF_RX1IF) {
 			mcp251x_hw_rx(spi, 1);
 			/* The MCP2515/25625 does this automatically. */
 			if (mcp251x_is_2510(spi))
 				clear_intf |= CANINTF_RX1IF;
 		}
 
+		/* combine flags from both operations for error handling */
+		intf = intf0 | intf1;
+		eflag = eflag0 | eflag1;
+
+		/* mask out flags we don't care about */
+		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
 		/* any error or tx interrupt we need to clear? */
 		if (intf & (CANINTF_ERR | CANINTF_TX))
 			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  6:48   ` [PATCH] can: mcp251x: " Sebastian Würl
@ 2022-08-04  7:06     ` Marc Kleine-Budde
  2022-08-04  7:45       ` Sebastian Würl
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Kleine-Budde @ 2022-08-04  7:06 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Vincent Mailhol, Stefan Mätje,
	Andy Shevchenko, Oliver Hartkopp, Sebastian Andrzej Siewior,
	Uwe Kleine-König, Christian Pellegrin, linux-can, netdev,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4540 bytes --]

On 04.08.2022 08:48:03, Sebastian Würl wrote:
> The mcp251x driver uses both receiving mailboxes of the CAN controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled and will retrieve the
> messages accordingly.
> 
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
> 
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
> 
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair
> 
> Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>

Thanks for your patch! I think we're almost there. If you send a new
version of the patch, please increase the reroll count, i.e. add a -v3
to the patch subject, this can be done with the parameter "-v3" to git
send-email or git format-patch.

> ---
>  drivers/net/can/spi/mcp251x.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
> index 89897a2d41fa..ca462868141c 100644
> --- a/drivers/net/can/spi/mcp251x.c
> +++ b/drivers/net/can/spi/mcp251x.c
> @@ -1068,17 +1068,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>  	mutex_lock(&priv->mcp_lock);
>  	while (!priv->force_quit) {
>  		enum can_state new_state;
> -		u8 intf, eflag;
> +		u8 intf, intf0, intf1, eflag, eflag0, eflag1;
>  		u8 clear_intf = 0;
>  		int can_id = 0, data1 = 0;
>  
> -		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);

Keep the read into "&intf, &eflag" here....

> -
> -		/* mask out flags we don't care about */
> -		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
> +		mcp251x_read_2regs(spi, CANINTF, &intf0, &eflag0);
>  
>  		/* receive buffer 0 */
> -		if (intf & CANINTF_RX0IF) {
> +		if (intf0 & CANINTF_RX0IF) {
>  			mcp251x_hw_rx(spi, 0);
>  			/* Free one buffer ASAP
>  			 * (The MCP2515/25625 does this automatically.)
> @@ -1086,16 +1083,31 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>  			if (mcp251x_is_2510(spi))
>  				mcp251x_write_bits(spi, CANINTF,
>  						   CANINTF_RX0IF, 0x00);
> +
> +			if (intf0 & CANINTF_RX1IF) {
> +				/* buffer 1 is already known to be full, no need to re-read */

Nice! I haven't thought about this optimization.

> +				intf1 = intf0;

...no need to assign intf1.

> +			} else {

Move intf1 into this scope...

> +				/* intf needs to be read again to avoid a race condition */
> +				mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);

...and "or" it to intf here:

intf |= intf1;

Another optimization idea: Do we need to re-read the eflag1? "eflag" is
for error handling only and you're optimizing the good path.

> +			}
>  		}
>  
>  		/* receive buffer 1 */
> -		if (intf & CANINTF_RX1IF) {
> +		if (intf1 & CANINTF_RX1IF) {
>  			mcp251x_hw_rx(spi, 1);
>  			/* The MCP2515/25625 does this automatically. */
>  			if (mcp251x_is_2510(spi))
>  				clear_intf |= CANINTF_RX1IF;
>  		}
>  
> +		/* combine flags from both operations for error handling */
> +		intf = intf0 | intf1;
> +		eflag = eflag0 | eflag1;
> +
> +		/* mask out flags we don't care about */
> +		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
> +
>  		/* any error or tx interrupt we need to clear? */
>  		if (intf & (CANINTF_ERR | CANINTF_TX))
>  			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
> -- 
> 2.30.2
> 
> 

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  7:06     ` Marc Kleine-Budde
@ 2022-08-04  7:45       ` Sebastian Würl
  2022-08-04  7:51         ` Marc Kleine-Budde
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Würl @ 2022-08-04  7:45 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Vincent Mailhol, Stefan Mätje,
	Andy Shevchenko, Oliver Hartkopp, Sebastian Andrzej Siewior,
	Uwe Kleine-König, Christian Pellegrin, linux-can, netdev,
	linux-kernel

On Thu, Aug 4, 2022 at 9:06 AM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> Another optimization idea: Do we need to re-read the eflag1? "eflag" is
> for error handling only and you're optimizing the good path.

I'd argue if a new message entered mailbox 1, this also potentially
changed the error state, so we need to read it.

Thanks a lot for your feedback! Will post v3 soon.

Also I'm sorry for spam in anyones inbox, I didn't get my mailing
program to produce plain-text for the last mail.

best,
Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  7:45       ` Sebastian Würl
@ 2022-08-04  7:51         ` Marc Kleine-Budde
  2022-08-04  7:59           ` [PATCH v3] " Sebastian Würl
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Kleine-Budde @ 2022-08-04  7:51 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Vincent Mailhol, Stefan Mätje,
	Andy Shevchenko, Oliver Hartkopp, Sebastian Andrzej Siewior,
	Uwe Kleine-König, Christian Pellegrin, linux-can, netdev,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

On 04.08.2022 09:45:07, Sebastian Würl wrote:
> On Thu, Aug 4, 2022 at 9:06 AM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> >
> > Another optimization idea: Do we need to re-read the eflag1? "eflag" is
> > for error handling only and you're optimizing the good path.
> 
> I'd argue if a new message entered mailbox 1, this also potentially
> changed the error state, so we need to read it.

Makes sense!

> Thanks a lot for your feedback! Will post v3 soon.
> 
> Also I'm sorry for spam in anyones inbox, I didn't get my mailing
> program to produce plain-text for the last mail.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  7:51         ` Marc Kleine-Budde
@ 2022-08-04  7:59           ` Sebastian Würl
  2022-08-04  8:14             ` [PATCH v4] " Sebastian Würl
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Würl @ 2022-08-04  7:59 UTC (permalink / raw)
  To: sebastian.wuerl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Andy Shevchenko, Oliver Hartkopp, Stefan Mätje,
	Uwe Kleine-König, Sebastian Andrzej Siewior,
	Christian Pellegrin, linux-can, netdev, linux-kernel

The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
 - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
 - Setup CAN to 1 MHz
 - Spam bursts of 5 CAN-messages with increasing CAN-ids
 - Continue sending the bursts while sleeping a second between the bursts
 - Check on the RPi whether the received messages have increasing CAN-ids
 - Without this patch, every burst of messages will contain a flipped pair

Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
---
 drivers/net/can/spi/mcp251x.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 89897a2d41fa..df748b2e7421 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1068,15 +1068,12 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf, eflag;
+		u8 intf, intf1, eflag, eflag1;
 		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
 		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
 
-		/* mask out flags we don't care about */
-		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
-
 		/* receive buffer 0 */
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
@@ -1086,6 +1083,16 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 			if (mcp251x_is_2510(spi))
 				mcp251x_write_bits(spi, CANINTF,
 						   CANINTF_RX0IF, 0x00);
+
+			/* check ifbuffer 1 is already known to be full, no need to re-read */
+			if (!(intf & CANINTF_RX1IF)) {
+				/* intf needs to be read again to avoid a race condition */
+				mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+
+				/* combine flags from both operations for error handling */
+				intf |= intf1;
+				eflag |= eflag1;
+			}
 		}
 
 		/* receive buffer 1 */
@@ -1096,6 +1103,9 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 				clear_intf |= CANINTF_RX1IF;
 		}
 
+		/* mask out flags we don't care about */
+		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
 		/* any error or tx interrupt we need to clear? */
 		if (intf & (CANINTF_ERR | CANINTF_TX))
 			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  7:59           ` [PATCH v3] " Sebastian Würl
@ 2022-08-04  8:14             ` Sebastian Würl
  2022-08-04  8:28               ` Marc Kleine-Budde
  2022-08-04 18:42               ` Andy Shevchenko
  0 siblings, 2 replies; 13+ messages in thread
From: Sebastian Würl @ 2022-08-04  8:14 UTC (permalink / raw)
  To: sebastian.wuerl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Andy Shevchenko, Oliver Hartkopp, Stefan Mätje,
	Uwe Kleine-König, Sebastian Andrzej Siewior,
	Christian Pellegrin, linux-can, netdev, linux-kernel

The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
 - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
 - Setup CAN to 1 MHz
 - Spam bursts of 5 CAN-messages with increasing CAN-ids
 - Continue sending the bursts while sleeping a second between the bursts
 - Check on the RPi whether the received messages have increasing CAN-ids
 - Without this patch, every burst of messages will contain a flipped pair

Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
---
 drivers/net/can/spi/mcp251x.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 89897a2d41fa..df748b2e7421 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1068,15 +1068,12 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf, eflag;
+		u8 intf, intf1, eflag, eflag1;
 		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
 		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
 
-		/* mask out flags we don't care about */
-		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
-
 		/* receive buffer 0 */
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
@@ -1086,6 +1083,16 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 			if (mcp251x_is_2510(spi))
 				mcp251x_write_bits(spi, CANINTF,
 						   CANINTF_RX0IF, 0x00);
+
+			/* check if buffer 1 is already known to be full, no need to re-read */
+			if (!(intf & CANINTF_RX1IF)) {
+				/* intf needs to be read again to avoid a race condition */
+				mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+
+				/* combine flags from both operations for error handling */
+				intf |= intf1;
+				eflag |= eflag1;
+			}
 		}
 
 		/* receive buffer 1 */
@@ -1096,6 +1103,9 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 				clear_intf |= CANINTF_RX1IF;
 		}
 
+		/* mask out flags we don't care about */
+		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
 		/* any error or tx interrupt we need to clear? */
 		if (intf & (CANINTF_ERR | CANINTF_TX))
 			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  8:14             ` [PATCH v4] " Sebastian Würl
@ 2022-08-04  8:28               ` Marc Kleine-Budde
  2022-08-04 18:42               ` Andy Shevchenko
  1 sibling, 0 replies; 13+ messages in thread
From: Marc Kleine-Budde @ 2022-08-04  8:28 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Vincent Mailhol, Andy Shevchenko,
	Oliver Hartkopp, Stefan Mätje, Uwe Kleine-König,
	Sebastian Andrzej Siewior, Christian Pellegrin, linux-can,
	netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1753 bytes --]

On 04.08.2022 10:14:11, Sebastian Würl wrote:
> The mcp251x driver uses both receiving mailboxes of the CAN controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled and will retrieve the
> messages accordingly.
> 
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
> 
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
> 
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair
> 
> Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>

I've reduced the scope of intf1, eflag1 while applying the patch to
can-next/master.

Thanks,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4] can: mcp251x: Fix race condition on receive interrupt
  2022-08-04  8:14             ` [PATCH v4] " Sebastian Würl
  2022-08-04  8:28               ` Marc Kleine-Budde
@ 2022-08-04 18:42               ` Andy Shevchenko
  1 sibling, 0 replies; 13+ messages in thread
From: Andy Shevchenko @ 2022-08-04 18:42 UTC (permalink / raw)
  To: Sebastian Würl
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vincent Mailhol,
	Andy Shevchenko, Oliver Hartkopp, Stefan Mätje,
	Uwe Kleine-König, Sebastian Andrzej Siewior,
	Christian Pellegrin, linux-can, netdev,
	Linux Kernel Mailing List

On Thu, Aug 4, 2022 at 10:42 AM Sebastian Würl
<sebastian.wuerl@ororatech.com> wrote:
>
> The mcp251x driver uses both receiving mailboxes of the CAN controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled and will retrieve the
> messages accordingly.
>
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
>
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
>
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair

For future submissions...

> Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
> ---

When you send the new version of a single patch, use --annotate to put
a changelog here (after the cutter '---' line) so people will know how
v4 differes to v3 to v2 and to v1.


Also you mentioned some issues with the mail program. Use `git
send-email` for patch submission. Also you may consider my "smart"
(no) script [1] for that.

[1]: https://github.com/andy-shev/home-bin-tools/blob/master/ge2maintainer.sh


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-08-04 18:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03 15:32 [PATCH] drivers/net/can/spi/mcp251x.c: Fix race condition on receive interrupt Sebastian Würl
2022-08-03 16:48 ` Andy Shevchenko
2022-08-03 16:50   ` Andy Shevchenko
2022-08-03 19:08   ` Marc Kleine-Budde
2022-08-03 18:59 ` Marc Kleine-Budde
2022-08-04  6:48   ` [PATCH] can: mcp251x: " Sebastian Würl
2022-08-04  7:06     ` Marc Kleine-Budde
2022-08-04  7:45       ` Sebastian Würl
2022-08-04  7:51         ` Marc Kleine-Budde
2022-08-04  7:59           ` [PATCH v3] " Sebastian Würl
2022-08-04  8:14             ` [PATCH v4] " Sebastian Würl
2022-08-04  8:28               ` Marc Kleine-Budde
2022-08-04 18:42               ` Andy Shevchenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).