From: Mariusz Madej <mariusz.madej@xtrack.com>
To: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Cc: "linux-can@vger.kernel.org" <linux-can@vger.kernel.org>
Subject: Re: m_can: a lot of 'Rx FIFO 0 Message Lost' in dmesg
Date: Sat, 27 Feb 2021 05:03:14 +0100
Message-ID: <d0ebed81-3f7a-1e82-e16b-85e242d1ddca@xtrack.com>
In-Reply-To: <20210226133702.echxlob5z4pj5ptc@bigthink>
On 2/26/21 2:37 PM, Torin Cooper-Bennun wrote:
>> The only place in m_can.c file, where interrupt register is cleared is function
>> called when interrupt arrives
>>
>> static irqreturn_t m_can_isr(int irq, void *dev_id)
>> {
>> .
>> .
>> /* ACK all irqs */
>> if (ir & IR_ALL_INT)
>> m_can_write(cdev, M_CAN_IR, ir);
>> .
>> .
>> }
>>
>> But when we enter 'NAPI mode' under heavy load, we never get to this function
>> until the load drops and interrupts are enabled again. In this situation,
>> this code:
> The m_can driver handles the IRQ by offloading the RX to a NAPI queue,
> so the RX procedure is deferred, and is scheduled to happen at a
> (slightly) later time. As far as I understand it, interrupts are not
> disabled at any point.
Interrupts are disabled in the m_can_isr function:
	if ((ir & IR_RF0N) || (ir & IR_ERR_ALL_30X)) {
		cdev->irqstatus = ir;
		m_can_disable_all_interrupts(cdev);     <-------- HERE
		if (!cdev->is_peripheral)
			napi_schedule(&cdev->napi);
		else
			m_can_rx_peripheral(dev);
	}
and they are re-enabled conditionally in this function:
static int m_can_poll(struct napi_struct *napi, int quota)
{
	struct net_device *dev = napi->dev;
	struct m_can_classdev *cdev = netdev_priv(dev);
	int work_done;

	work_done = m_can_rx_handler(dev, quota);
	if (work_done < quota) {
		napi_complete_done(napi, work_done);
		m_can_enable_all_interrupts(cdev);      <---- HERE
	}

	return work_done;
}
So if work_done == quota (64), NAPI schedules another poll instead of
enabling interrupts. That is why I wrote that, in my situation, I don't get
to the m_can_isr function and the message-lost interrupt is not cleared. As a
result, my device enters this function:
static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
	struct m_can_classdev *cdev = netdev_priv(dev);
	u32 pkts = 0;
	u32 rxfs;

	rxfs = m_can_read(cdev, M_CAN_RXF0S);
	if (!(rxfs & RXFS_FFL_MASK)) {
		netdev_dbg(dev, "no messages in fifo0\n");
		return 0;
	}

	while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
		if (rxfs & RXFS_RFL)
			netdev_warn(dev, "Rx FIFO 0 Message Lost\n");

		m_can_read_fifo(dev, rxfs);

		quota--;
		pkts++;
		rxfs = m_can_read(cdev, M_CAN_RXF0S);
	}

	if (pkts)
		can_led_event(dev, CAN_LED_EVENT_RX);

	return pkts;
}
It does so with RXFS_RFL set and 64 messages to be read, which is why I get
64 warnings in a row.
Those warnings take CPU time, and in that time the FIFO fills up again, so
m_can_poll does not re-enable interrupts, and so on...
>> That is why we got so many messages in a row for so long time. So clearing
>> RXFS_RFL bit after warning is issued could be a solution.
> RXFS_RFL is a flag in a status register, not an interrupt flag. There is
> a corresponding interrupt flag, but that is cleared along with the rest,
> at the top of m_can_isr.
I agree, sorry for not being specific. The problem is that, in my case, the
CPU cannot get into m_can_isr for a long time.
>
> I think you are losing messages because the traffic is too heavy for
> your system to read out the messages fast enough. That is the usual
> reason for seeing "Rx FIFO 0 Message Lost".
Seeing "Rx FIFO 0 Message Lost" is not my biggest problem. The problem is
that my system is not responsive while these messages are being printed.
I changed m_can_do_rx_poll:
static int m_can_do_rx_poll(struct net_device *dev, int quota)
{
	struct m_can_classdev *cdev = netdev_priv(dev);
	u32 pkts = 0;
	u32 rxfs;

	rxfs = m_can_read(cdev, M_CAN_RXF0S);
	if (!(rxfs & RXFS_FFL_MASK)) {
		netdev_dbg(dev, "no messages in fifo0\n");
		return 0;
	}

	while ((rxfs & RXFS_FFL_MASK) && (quota > 0)) {
		if (rxfs & RXFS_RFL) {
			netdev_warn(dev, "Rx FIFO 0 Message Lost\n");
			m_can_write(cdev, M_CAN_IR, IR_RF0L);
		}

		m_can_read_fifo(dev, rxfs);

		quota--;
		pkts++;
		rxfs = m_can_read(cdev, M_CAN_RXF0S);
	}

	if (pkts)
		can_led_event(dev, CAN_LED_EVENT_RX);

	return pkts;
}
And now my system is responsive. I still sometimes get "Rx FIFO 0 Message
Lost", but one at a time, not 100k, and that is not a big problem for me.
CAN works OK.
So IMO it is a bug.
>
> --
> Regards,
>
> Torin Cooper-Bennun
> Software Engineer | maxiluxsystems.com
>
Regards,
Mariusz