From: Tom Evans <tom_usenet@optusnet.com.au>
To: Marc Kleine-Budde <mkl@pengutronix.de>,
	Sharma Bhupesh <bhupesh.sharma@freescale.com>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"linux-can@vger.kernel.org" <linux-can@vger.kernel.org>
Cc: "bhupesh.linux@gmail.com" <bhupesh.linux@gmail.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Arora Sakar <Sakar.Arora@freescale.com>
Subject: Re: can: flexcan: Ancient Freescale Reference FlexCAN Driver and bug fixes so it works.
Date: Thu, 17 Dec 2015 15:22:01 +1100
Message-ID: <56723869.5060900@optusnet.com.au>
In-Reply-To: <566A0267.2080809@optusnet.com.au>

On 11/12/15 09:53, Tom Evans wrote:
> On 10/12/2015 11:44 PM, Marc Kleine-Budde wrote:
>> On 12/10/2015 12:44 PM, Tom Evans wrote:
>>> I've just had to delve back into the deep murky past and try to get
>>> Freescale's Kernel 2.6.35 FlexCAN driver working...
>
> On an i.MX53. That's the last supported kernel Freescale ever released for
> that series of parts.

> The simplest path was to fix the simple and stupidly obvious bugs in the
> existing driver rather than add new complicated bugs.

There seems to be no end of bugs in this driver. It is a lesson in something.

What's the dumbest driver bug ever? Interrupt hazard. Yes, this one has that too.

The mainline transmit routine sends a Message Buffer (or tries to) and
DISABLES the netif queue if it can't send [1]. The transmit interrupt service
routine ENABLES the netif queue. And when the interrupt routine runs in the
middle of the mainline transmit routine? The mainline stops the queue after
the interrupt has already woken it, nothing is left to wake it again, and it
locks forever. You have to power-cycle to get it back. Normally taking the
interface "down" and "up" should fix it, as the "stop" and "start" functions
normally stop and start the netif queue. Not these ones. They do now.
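
To make the interleaving concrete, here is a minimal userspace sketch of the hazard with a single transmit MB. It is not the driver code and not the patch linked below: the struct, the function names and the "stop first, then re-check" remedy are all assumptions made for illustration, and the real driver would also need proper locking or barriers.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the hazard; none of these names
 * come from the Freescale driver. */
struct model {
    bool mb_busy;        /* the single transmit MB is still in flight */
    bool queue_stopped;  /* stands in for the netif queue state */
};

/* Models the transmit-complete interrupt: frees the MB, wakes the queue. */
static void tx_isr(struct model *m)
{
    m->mb_busy = false;
    m->queue_stopped = false;
}

/* Racy transmit path: decide "full" first, stop the queue afterwards.
 * irq_in_window models the interrupt landing between the two steps. */
static void xmit_racy(struct model *m, bool irq_in_window)
{
    bool full = m->mb_busy;        /* step 1: look for a free MB */
    if (irq_in_window)
        tx_isr(m);                 /* the interrupt wakes the queue here */
    if (full)
        m->queue_stopped = true;   /* step 2: stop the queue, too late */
    else
        m->mb_busy = true;         /* MB was free: start a transmission */
}

/* One common remedy (an assumption, not the linked patch): stop the
 * queue, then look again and undo the stop if an MB has become free. */
static void xmit_recheck(struct model *m, bool irq_in_window)
{
    bool full = m->mb_busy;
    if (irq_in_window)
        tx_isr(m);
    if (full) {
        m->queue_stopped = true;
        if (!m->mb_busy)           /* the interrupt already ran */
            m->queue_stopped = false;
    } else {
        m->mb_busy = true;
    }
}

/* Stuck: queue stopped and nothing in flight, so no interrupt will
 * ever arrive to wake it. */
static bool stuck(const struct model *m)
{
    return m->queue_stopped && !m->mb_busy;
}
```

Starting from a busy MB and a running queue, xmit_racy() with the interrupt in the window ends up stuck, while xmit_recheck() does not. Without the interrupt in the window both leave the queue stopped but with a transmission still in flight, so the later interrupt wakes it as intended.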

Note 1: This is the case where the transmit queues and MBs are full. The
transmit function is always called twice. The first time it is called it sends
the MB, leaves the queue running and returns "OK". The second time it finds
out it is full, stops the queue and returns "BUSY". Even better, with the
default 32 transmit MBs it has to scan through all 32 MBs to find out it is
full, and it may also have had to scan the same 32 to find an empty one. The
FlexCAN registers are on a slow bus and take 180ns to read or write. So it can
take the driver up to 12us (64 accesses at 180ns each) to send one buffer. Or
proportionally longer if you have more Transmit MBs (you might have 56).
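
The 12us figure can be reproduced with a small helper. This is only a sketch of the arithmetic, assuming one register access per MB per scan and two full scans in the worst case; the function name is made up for illustration.

```c
/* Worst-case time spent scanning transmit MBs, in nanoseconds.
 * Assumes one register access per MB per scan and two full scans:
 * one looking for a free MB, one discovering that all are full. */
static unsigned worst_case_scan_ns(unsigned tx_mbs, unsigned access_ns)
{
    return 2u * tx_mbs * access_ns;
}
```

With the numbers above, worst_case_scan_ns(32, 180) gives 11520ns, which rounds to the 12us quoted, and worst_case_scan_ns(56, 180) gives 20160ns, or about 20us with 56 Transmit MBs.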

If you never send enough data to flow-control the driver this won't happen. If 
you have more than one Transmit MB (and can handle out of order transmissions) 
then the subsequent transmit interrupts will repair the damage and you won't 
see it either. That's probably why it passed whatever testing it had.

Bug detailed here:

https://community.freescale.com/message/597952

Patch to fix here:

https://community.freescale.com/message/597951#597951

Tom



