All of lore.kernel.org
 help / color / mirror / Atom feed
* m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
@ 2021-02-24 14:27 Mariusz Madej
  2021-02-26 13:37 ` Torin Cooper-Bennun
  0 siblings, 1 reply; 6+ messages in thread
From: Mariusz Madej @ 2021-02-24 14:27 UTC (permalink / raw)
  To: linux-can; +Cc: dmurphy


Hi,

I have a problem with m_can controller in my sama5d2 processor.
Under heavy can traffic it happens that my device starts to report (dmesg):

[   77.610000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.620000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.650000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost

what causes large load problem in my system.

I think I have a clue what is going on but my kernel knowledge is low so i want
You to tell me if I am right or not. So:

The only place in m_can.c file, where interrupt register is cleared is function
called when interrupt arrives

static irqreturn_t m_can_isr(int irq, void *dev_id)
{
.
.
        /* ACK all irqs */
        if (ir & IR_ALL_INT)
                m_can_write(cdev, M_CAN_IR, ir);
.
.
}

But when we enter 'NAPI mode' in heavy load we are never get to this function
until load gets lower and interrupts are enabled again. In this situation,
this code:

static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
        struct m_can_classdev *cdev = netdev_priv(dev);
        u32 pkts = 0;
        u32 rxfs;

        rxfs = m_can_read(cdev, M_CAN_RXF0S);
        if (!(rxfs & RXFS_FFL_MASK)) {
                netdev_dbg(dev, "no messages in fifo0\n");
                return 0;
        }

        while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
                if (rxfs & RXFS_RFL)
                        netdev_warn(dev, "Rx FIFO 0 Message Lost\n");

                m_can_read_fifo(dev, rxfs);

                quota--;
                pkts++;
                rxfs = m_can_read(cdev, M_CAN_RXF0S);
        }

        if (pkts)
                can_led_event(dev, CAN_LED_EVENT_RX);

        return pkts;
}

will always have (rxfs & RXFS_RFL) == true until interrupt are enabled again.
That is why we got so many messages in a row for so long time. So clearing
RXFS_RFL bit after warning is issued could be a solution.

Can You tell me if I am right?

Regards
Mariusz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
  2021-02-24 14:27 m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg Mariusz Madej
@ 2021-02-26 13:37 ` Torin Cooper-Bennun
  2021-02-27  4:03   ` Mariusz Madej
  0 siblings, 1 reply; 6+ messages in thread
From: Torin Cooper-Bennun @ 2021-02-26 13:37 UTC (permalink / raw)
  To: Mariusz Madej; +Cc: linux-can, dmurphy

On Wed, Feb 24, 2021 at 02:27:28PM +0000, Mariusz Madej wrote:
> 
> Hi,
> 
> I have a problem with m_can controller in my sama5d2 processor.
> Under heavy can traffic it happens that my device starts to report (dmesg):
> 
> [   77.610000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.620000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.650000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> [   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
> 
> what causes large load problem in my system.

How heavy is this traffic? Is the bus operating at a very high bitrate?
Are there any other useful lines in dmesg?

> The only place in m_can.c file, where interrupt register is cleared is function
> called when interrupt arrives
> 
> static irqreturn_t m_can_isr(int irq, void *dev_id)
> {
> .
> .
>         /* ACK all irqs */
>         if (ir & IR_ALL_INT)
>                 m_can_write(cdev, M_CAN_IR, ir);
> .
> .
> }
> 
> But when we enter 'NAPI mode' in heavy load we are never get to this function
> until load gets lower and interrupts are enabled again. In this situation,
> this code:

The m_can driver handles the IRQ by offloading the RX to a NAPI queue,
so the RX procedure is deferred, and is scheduled to happen at a
(slightly) later time. As far as I understand it, interrupts are not
disabled at any point.

> That is why we got so many messages in a row for so long time. So clearing
> RXFS_RFL bit after warning is issued could be a solution.

RXFS_RFL is a flag in a status register, not an interrupt flag. There is
a corresponding interrupt flag, but that is cleared along with the rest,
at the top of m_can_isr.

I think you are losing messages because the traffic is too heavy for
your system to read out the messages fast enough. That is the usual
reason for seeing "Rx FIFO 0 Message Lost".

--
Regards,

Torin Cooper-Bennun
Software Engineer  | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
  2021-02-26 13:37 ` Torin Cooper-Bennun
@ 2021-02-27  4:03   ` Mariusz Madej
  2021-03-01 14:14     ` Torin Cooper-Bennun
  0 siblings, 1 reply; 6+ messages in thread
From: Mariusz Madej @ 2021-02-27  4:03 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can

On 2/26/21 2:37 PM, Torin Cooper-Bennun wrote

>> The only place in m_can.c file, where interrupt register is cleared is function
>> called when interrupt arrives
>>
>> static irqreturn_t m_can_isr(int irq, void *dev_id)
>> {
>> .
>> .
>>          /* ACK all irqs */
>>          if (ir & IR_ALL_INT)
>>                  m_can_write(cdev, M_CAN_IR, ir);
>> .
>> .
>> }
>>
>> But when we enter 'NAPI mode' in heavy load we are never get to this function
>> until load gets lower and interrupts are enabled again. In this situation,
>> this code:
> The m_can driver handles the IRQ by offloading the RX to a NAPI queue,
> so the RX procedure is deferred, and is scheduled to happen at a
> (slightly) later time. As far as I understand it, interrupts are not
> disabled at any point.

Interupts are disabled in m_can_isr function:

if ((ir & IR_RF0N) || (ir & IR_ERR_ALL_30X)) {
         cdev->irqstatus = ir;
         m_can_disable_all_interrupts(cdev);    <--------HERE
         if (!cdev->is_peripheral)
                 napi_schedule(&cdev->napi);
         else
                 m_can_rx_peripheral(dev);

}

and they are enabled conditionaly in function:

static int m_can_poll(struct napi_struct *napi, int quota)
{
         struct net_device *dev = napi->dev;
         struct m_can_classdev *cdev = netdev_priv(dev);
         int work_done;

         work_done = m_can_rx_handler(dev, quota);
         if (work_done < quota) {
                 napi_complete_done(napi, work_done);
                 m_can_enable_all_interrupts(cdev); <---- HERE
         }

         return work_done;
}

so if work_done==quota(64) napi will schedule next receiving instead
of enabling interrupts. That is why i wrote that in my condition i dont get
to m_can_isr function and message lost interrupt is not cleared. As a result
my device enters to this function:

static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
         struct m_can_classdev *cdev = netdev_priv(dev);
         u32 pkts = 0;
         u32 rxfs;

         rxfs = m_can_read(cdev, M_CAN_RXF0S);
         if (!(rxfs & RXFS_FFL_MASK)) {
                 netdev_dbg(dev, "no messages in fifo0\n");
                 return 0;
         }

         while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
                 if (rxfs & RXFS_RFL)
                         netdev_warn(dev, "Rx FIFO 0 Message Lost\n");

                 m_can_read_fifo(dev, rxfs);

                 quota--;
                 pkts++;
                 rxfs = m_can_read(cdev, M_CAN_RXF0S);
         }

         if (pkts)
                 can_led_event(dev, CAN_LED_EVENT_RX);

         return pkts;
}

With RXFS_RFL==true and 64 messages to be read, that is why i have 64 warnings
in a row.
Those warnings take cpu time, and in this time fifo is full again so
function m_can_poll does not enable interrupts again, and so on...

>> That is why we got so many messages in a row for so long time. So clearing
>> RXFS_RFL bit after warning is issued could be a solution.
> RXFS_RFL is a flag in a status register, not an interrupt flag. There is
> a corresponding interrupt flag, but that is cleared along with the rest,
> at the top of m_can_isr.

I agree, sorry for not being specific, the problem is cpu can not get into
m_can_isr for a long time in my case.

>
> I think you are losing messages because the traffic is too heavy for
> your system to read out the messages fast enough. That is the usual
> reason for seeing "Rx FIFO 0 Message Lost".

Seeing "Rx FIFO 0 Message Lost" is not my biggest problem. The problem is
my system is not responsive along this messages.

I changed m_can_do_rx_poll:

static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
         struct m_can_classdev *cdev = netdev_priv(dev);
         u32 pkts = 0;
         u32 rxfs;

         rxfs = m_can_read(cdev, M_CAN_RXF0S);
         if (!(rxfs & RXFS_FFL_MASK)) {
                 netdev_dbg(dev, "no messages in fifo0\n");
                 return 0;
         }

         while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
                 if (rxfs & RXFS_RFL) {
                         netdev_warn(dev, "Rx FIFO 0 Message Lost\n");
                         m_can_write(cdev, M_CAN_IR, IR_RF0L);
                 }

                 m_can_read_fifo(dev, rxfs);

                 quota--;
                 pkts++;
                 rxfs = m_can_read(cdev, M_CAN_RXF0S);
         }

         if (pkts)
                 can_led_event(dev, CAN_LED_EVENT_RX);

         return pkts;
}

And now my system is responsive - i sometimes get "Rx FIFO 0 Message Lost"
but one at a time - not 100k and this is not a big problem for me.
CAN works OK
So IMO it is a bug.

>
> --
> Regards,
>
> Torin Cooper-Bennun
> Software Engineer  | maxiluxsystems.com
>
Regards,

Mariusz


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
  2021-02-27  4:03   ` Mariusz Madej
@ 2021-03-01 14:14     ` Torin Cooper-Bennun
  2021-03-01 21:31       ` Mariusz Madej
  0 siblings, 1 reply; 6+ messages in thread
From: Torin Cooper-Bennun @ 2021-03-01 14:14 UTC (permalink / raw)
  To: Mariusz Madej; +Cc: linux-can

Thank you Mariusz, you've helped me understand this driver better - I
had missed some of the logic. I agree that this is a bug.

On Sat, Feb 27, 2021 at 05:03:14AM +0100, Mariusz Madej wrote:
> I changed m_can_do_rx_poll:
> 
> static int m_can_do_rx_poll(struct net_device *dev, int quota)
> {
>         struct m_can_classdev *cdev = netdev_priv(dev);
>         u32 pkts = 0;
>         u32 rxfs;
> 
>         rxfs = m_can_read(cdev, M_CAN_RXF0S);
>         if (!(rxfs & RXFS_FFL_MASK)) {
>                 netdev_dbg(dev, "no messages in fifo0\n");
>                 return 0;
>         }
> 
>         while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
>                 if (rxfs & RXFS_RFL) {
>                         netdev_warn(dev, "Rx FIFO 0 Message Lost\n");
>                         m_can_write(cdev, M_CAN_IR, IR_RF0L);
>                 }
> 
>                 m_can_read_fifo(dev, rxfs);
> 
>                 quota--;
>                 pkts++;
>                 rxfs = m_can_read(cdev, M_CAN_RXF0S);
>         }
> 
>         if (pkts)
>                 can_led_event(dev, CAN_LED_EVENT_RX);
> 
>         return pkts;
> }

Your fix makes sense to me. If you submit a patch to the linux-can list,
it will probably be reviewed quickly. (Don't bother to CC Dan Murphy
though, as his address listed in MAINTAINERS bounces mail.)

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
  2021-03-01 14:14     ` Torin Cooper-Bennun
@ 2021-03-01 21:31       ` Mariusz Madej
  0 siblings, 0 replies; 6+ messages in thread
From: Mariusz Madej @ 2021-03-01 21:31 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can

On 3/1/21 3:14 PM, Torin Cooper-Bennun wrote:

> Thank you Mariusz, you've helped me understand this driver better - I
> had missed some of the logic. I agree that this is a bug.

Thank You for your feedback. I am glad we could help each other.

> Your fix makes sense to me. If you submit a patch to the linux-can list,
> it will probably be reviewed quickly. (Don't bother to CC Dan Murphy
> though, as his address listed in MAINTAINERS bounces mail.)
>
I just send a patch. We will see.

Regards,
Mariusz


^ permalink raw reply	[flat|nested] 6+ messages in thread

* m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
@ 2021-02-24 11:24 Mariusz Madej
  0 siblings, 0 replies; 6+ messages in thread
From: Mariusz Madej @ 2021-02-24 11:24 UTC (permalink / raw)
  To: linux-can

Hi,

I have a problem with m_can controller in my sama5d2 processor.
Under heavy can traffic it happens that my device starts to report (dmesg):

[   77.610000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.620000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.630000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.640000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.650000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost
[   77.660000] m_can_platform f8054000.can can0: Rx FIFO 0 Message Lost

what causes large load problem in my system.

I think I have a clue what is going on but my kernel knowledge is low so 
i want
You to tell me if I am right or not. So:

The only place in m_can.c file, where interrupt register is cleared is 
function
called when interrupt arrives

static irqreturn_t m_can_isr(int irq, void *dev_id)
{
.
.
         /* ACK all irqs */
         if (ir & IR_ALL_INT)
                 m_can_write(cdev, M_CAN_IR, ir);
.
.
}

But when we enter 'NAPI mode' in heavy load we are never get to this 
function
until load gets lower and interrupts are enabled again. In this situation,
this code:

static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
         struct m_can_classdev *cdev = netdev_priv(dev);
         u32 pkts = 0;
         u32 rxfs;

         rxfs = m_can_read(cdev, M_CAN_RXF0S);
         if (!(rxfs & RXFS_FFL_MASK)) {
                 netdev_dbg(dev, "no messages in fifo0\n");
                 return 0;
         }

         while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
                 if (rxfs & RXFS_RFL)
                         netdev_warn(dev, "Rx FIFO 0 Message Lost\n");

                 m_can_read_fifo(dev, rxfs);

                 quota--;
                 pkts++;
                 rxfs = m_can_read(cdev, M_CAN_RXF0S);
         }

         if (pkts)
                 can_led_event(dev, CAN_LED_EVENT_RX);

         return pkts;
}

will always have (rxfs & RXFS_RFL) == true until interrupt are enabled 
again.
That is why we got so many messages in a row for so long time. So clearing
RXFS_RFL bit after warning is issued could be a solution.

Can You tell me if I am right?

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-01 21:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-24 14:27 m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg Mariusz Madej
2021-02-26 13:37 ` Torin Cooper-Bennun
2021-02-27  4:03   ` Mariusz Madej
2021-03-01 14:14     ` Torin Cooper-Bennun
2021-03-01 21:31       ` Mariusz Madej
  -- strict thread matches above, loose matches on Subject: below --
2021-02-24 11:24 Mariusz Madej

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.