* Re: Improving TX for m_can peripherals
2021-05-05 10:30 Improving TX for m_can peripherals Torin Cooper-Bennun
@ 2021-05-05 11:56 ` Marc Kleine-Budde
2021-05-05 12:21 ` Torin Cooper-Bennun
0 siblings, 1 reply; 3+ messages in thread
From: Marc Kleine-Budde @ 2021-05-05 11:56 UTC (permalink / raw)
To: Torin Cooper-Bennun; +Cc: linux-can
On 05.05.2021 11:30:49, Torin Cooper-Bennun wrote:
> I've been testing the TCAN4550 recently with proper kit (no more jumper
> wires, hooray!) and I'm happy to say the RX path is fixed in v5.12 with
> the latest patches, and even with heavy load I see no missed frames or
> errors.
\o/
> However, TX still needs work. It's easy to break the driver due to the
> following logic in m_can_start_xmit():
>
> | if (cdev->tx_skb) {
> | netdev_err(dev, "hard_xmit called while tx busy\n");
> | return NETDEV_TX_BUSY;
> | }
>
> Regardless of your netif TX queue length or the number of TX buffers
> allocated in the M_CAN core, if you try to transmit too quickly, you
> will hit this. For the application I'm working on, I run into this very
> quickly with real-world scenarios.
>
> Also, the queue is always stopped before the tx work is queued in
> m_can_start_xmit(), which seems wrong and clearly doesn't solve the
> problem:
The simple approach is to stop the tx queue in ndo_start_xmit() and to
restart it once the skb has been sent. If the above error message is
triggered, I suspect there's a race condition in the driver:
cdev->tx_skb is cleared _after_ the queue has been restarted.
I think it's somewhere here:
| static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
| {
[...]
| /* Enable TX FIFO element to start transfer */
| m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));
TX complete IRQ can come after this...
| /* stop network queue if fifo full */
| if (m_can_tx_fifo_full(cdev) ||
This is a SPI transfer and may take a lot of time....
| m_can_next_echo_skb_occupied(dev, putidx))
| netif_stop_queue(dev);
[...]
| }
| static void m_can_tx_work_queue(struct work_struct *ws)
| {
| struct m_can_classdev *cdev = container_of(ws, struct m_can_classdev,
| tx_work);
|
| m_can_tx_handler(cdev);
... here the tx_skb is set to NULL
| cdev->tx_skb = NULL;
| }
I've sent a patch (compile-time tested only!) to fix the problem.
> | /* Need to stop the queue to avoid numerous requests
> | * from being sent. Suggested improvement is to create
> | * a queueing mechanism that will queue the skbs and
> | * process them in order.
> | */
> | cdev->tx_skb = skb;
> | netif_stop_queue(cdev->net);
> | queue_work(cdev->tx_wq, &cdev->tx_work);
>
>
> So - I'd like to fix this. The comment in the snippet above suggests a
> queueing mechanism. It would be good to hear your take on this, Marc -
> AFAIU you have written a similar mechanism for mcp251xfd. :)
For easier queuing, get rid of the worker and send the can_frame
directly from the xmit handler. The problem is that you cannot sleep
inside it. This means you cannot use regmap(); the only thing that
works is spi_async(). And as it's async, you can only write data via
SPI; reading doesn't make sense.
Have a look at the mcp251xfd_start_xmit() function. All data
structures for the SPI messages are prepared beforehand in
mcp251xfd_tx_ring_init_tx_obj(). The xmit function looks up the
correct data structure, converts the skb to the chip's format in
mcp251xfd_tx_obj_from_skb(), and then sends the data over SPI. Special
care is taken with the handling of the head and tail pointers and the
stopping of the queue to avoid race conditions - see
mcp251xfd_get_tx_free(), mcp251xfd_tx_busy(), netif_stop_queue(),
netif_wake_queue().
Hope that helps,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |