All of lore.kernel.org
 help / color / mirror / Atom feed
* Improving TX for m_can peripherals
@ 2021-05-05 10:30 Torin Cooper-Bennun
  2021-05-05 11:56 ` Marc Kleine-Budde
  0 siblings, 1 reply; 3+ messages in thread
From: Torin Cooper-Bennun @ 2021-05-05 10:30 UTC (permalink / raw)
  To: linux-can; +Cc: Marc Kleine-Budde

Good morning candevs,

I've been testing the TCAN4550 recently with proper kit (no more jumper
wires, hooray!) and I'm happy to say the RX path is fixed in v5.12 with
the latest patches, and even with heavy load I see no missed frames or
errors.

However, TX still needs work. It's easy to break the driver due to the
following logic in m_can_start_xmit():

| 	if (cdev->tx_skb) {
| 		netdev_err(dev, "hard_xmit called while tx busy\n");
| 		return NETDEV_TX_BUSY;
| 	}

Regardless of your netif TX queue length or the number of TX buffers
allocated in the M_CAN core, if you try to transmit too quickly, you
will hit this. For the application I'm working on, I run into this very
quickly with real-world scenarios.

Also, the queue is always stopped before the tx work is queued in
m_can_start_xmit(), which seems wrong and clearly doesn't solve the
problem:

| 	/* Need to stop the queue to avoid numerous requests
| 	 * from being sent.  Suggested improvement is to create
| 	 * a queueing mechanism that will queue the skbs and
| 	 * process them in order.
| 	 */
| 	cdev->tx_skb = skb;
| 	netif_stop_queue(cdev->net);
| 	queue_work(cdev->tx_wq, &cdev->tx_work);


So - I'd like to fix this. The comment in the snippet above suggests a
queueing mechanism. It would be good to hear your take on this, Marc -
AFAIU you have written a similar mechanism for mcp251xfd. :)

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Improving TX for m_can peripherals
  2021-05-05 10:30 Improving TX for m_can peripherals Torin Cooper-Bennun
@ 2021-05-05 11:56 ` Marc Kleine-Budde
  2021-05-05 12:21   ` Torin Cooper-Bennun
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Kleine-Budde @ 2021-05-05 11:56 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 3741 bytes --]

On 05.05.2021 11:30:49, Torin Cooper-Bennun wrote:
> I've been testing the TCAN4550 recently with proper kit (no more jumper
> wires, hooray!) and I'm happy to say the RX path is fixed in v5.12 with
> the latest patches, and even with heavy load I see no missed frames or
> errors.

\o/

> However, TX still needs work. It's easy to break the driver due to the
> following logic in m_can_start_xmit():
> 
> | 	if (cdev->tx_skb) {
> | 		netdev_err(dev, "hard_xmit called while tx busy\n");
> | 		return NETDEV_TX_BUSY;
> | 	}
> 
> Regardless of your netif TX queue length or the number of TX buffers
> allocated in the M_CAN core, if you try to transmit too quickly, you
> will hit this. For the application I'm working on, I run into this very
> quickly with real-world scenarios.
> 
> Also, the queue is always stopped before the tx work is queued in
> m_can_start_xmit(), which seems wrong and clearly doesn't solve the
> problem:

The simple approach is to stop the tx queue in ndo_start_xmit() and to
restart it if the skb has been send. If the above error message is
triggered I suppose there's a race condition in the driver, cdev->tx_skb
is cleared _after_ the queue has been restarted.

I think it's somewhere here:

| static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
| {
[...]
| 		/* Enable TX FIFO element to start transfer  */
| 		m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));

TX complete IRQ can come after this...

| 		/* stop network queue if fifo full */
| 		if (m_can_tx_fifo_full(cdev) ||

This is a SPI transfer and may take a lot of time....

| 		    m_can_next_echo_skb_occupied(dev, putidx))
| 			netif_stop_queue(dev);
[...]
| }

| static void m_can_tx_work_queue(struct work_struct *ws)
| {
| 	struct m_can_classdev *cdev = container_of(ws, struct m_can_classdev,
| 						   tx_work);
| 
| 	m_can_tx_handler(cdev);

... here the tx_skb is set to NULL

| 	cdev->tx_skb = NULL;
| }

I've send a patch (compile time tested, only!) to fix the probem.

> | 	/* Need to stop the queue to avoid numerous requests
> | 	 * from being sent.  Suggested improvement is to create
> | 	 * a queueing mechanism that will queue the skbs and
> | 	 * process them in order.
> | 	 */
> | 	cdev->tx_skb = skb;
> | 	netif_stop_queue(cdev->net);
> | 	queue_work(cdev->tx_wq, &cdev->tx_work);
> 
> 
> So - I'd like to fix this. The comment in the snippet above suggests a
> queueing mechanism. It would be good to hear your take on this, Marc -
> AFAIU you have written a similar mechanism for mcp251xfd. :)

For easier queuing, get rid of the worker. Directly send the can_frame
from the xmit handler. The problem is, you cannot sleep inside it. This
means, you cannot use regmap(), the only thing that works is
spi_async(). And as it's async, you can only write data via SPI, reading
doesn't make sense.

Have a look at the mcp251xfd_start_xmit() function. All data structures
for the SPI messages are prepared in beforehand in
mcp251xfd_tx_ring_init_tx_obj(). The xmit function looks up the correct
data structure, converts the skb to the chip's format
mcp251xfd_tx_obj_from_skb() and then sends the data over spi. Special
care is taken of the handling of the head and tail pointers and the
stopping of the queue to avoid race conditions - see
mcp251xfd_get_tx_free(), mcp251xfd_tx_busy(), netif_stop_queue(),
netif_wake_queue().

Hope that helps,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Improving TX for m_can peripherals
  2021-05-05 11:56 ` Marc Kleine-Budde
@ 2021-05-05 12:21   ` Torin Cooper-Bennun
  0 siblings, 0 replies; 3+ messages in thread
From: Torin Cooper-Bennun @ 2021-05-05 12:21 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can

On Wed, May 05, 2021 at 01:56:24PM +0200, Marc Kleine-Budde wrote:
> For easier queuing, get rid of the worker. Directly send the can_frame
> from the xmit handler. The problem is, you cannot sleep inside it. This
> means, you cannot use regmap(), the only thing that works is
> spi_async(). And as it's async, you can only write data via SPI, reading
> doesn't make sense.

Yeah, this makes things pretty tough, given that we currently have to
read the next available TX buffer index before doing the writes!

> Have a look at the mcp251xfd_start_xmit() function. All data structures
> for the SPI messages are prepared in beforehand in
> mcp251xfd_tx_ring_init_tx_obj(). The xmit function looks up the correct
> data structure, converts the skb to the chip's format
> mcp251xfd_tx_obj_from_skb() and then sends the data over spi. Special
> care is taken of the handling of the head and tail pointers and the
> stopping of the queue to avoid race conditions - see
> mcp251xfd_get_tx_free(), mcp251xfd_tx_busy(), netif_stop_queue(),
> netif_wake_queue().
> 
> Hope that helps,
> Marc

I'll be sure to look into it in more detail, thanks. In the meantime,
your patch is working very well to mostly solve this issue :)

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-05-05 12:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 10:30 Improving TX for m_can peripherals Torin Cooper-Bennun
2021-05-05 11:56 ` Marc Kleine-Budde
2021-05-05 12:21   ` Torin Cooper-Bennun

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.