On 11/05/2012 07:15 PM, Marc Kleine-Budde wrote:
> On 11/05/2012 12:34 PM, Marc Kleine-Budde wrote:
>> Hello,
>>
>> On 10/08/2012 08:40 AM, AnilKumar Ch wrote:
>>> Fix "received wrong sequence count. expected: x, got: y" errors
>>> reported by cansequence in a stress test (--verbose).
>>>
>>> While processing the receive packets in mailboxes, upper mailboxes
>>> need to enable while processing 12th (RX_BUFFER mailbox) mailbox.
>>> This requires a status check to identify whether the receiving of
>>> packets in progress or not. If the mailboxes are enabled before the
>>> receive packet status is cleared then there is a possibility of
>>> receiving out of order packets.
>>>
>>> With this patch mailboxes are enabled once the receive status is
>>> cleared.
>>>
> [..]
> 
>> Then I had a closer look at AnilKumar Ch's patch, the canme register is
>> not changed if a the CAN core currently receives a CAN frame. I added
>> AnilKumar Ch's busy loop before modifying the canme register, and it
>> works now. So I conclude that the "pick next free mailbox" algorithm in
>> the CAN core is racy. You are not allowed to do something that changes a
>> mailbox's status from disabled/full to enabled/free when a CAN frame is
>> received. As you cannot control CAN frame reception, this is a nice
>> hardware race condition. Please add this to the manual and errata sheets.
> 
> I really want to know where the race window starts and ends, in order to
> specify proper timeouts.
> 
> This is a hunk from AnilKumar Ch's patch:
> 
>>  			/* enable high bank mailboxes */
>> +			timeout = jiffies + usecs_to_jiffies(RM_TIMEOUT_US);
>> +			while (hecc_get_bit(priv, HECC_CANES, HECC_CANES_RM)) {
>> +				cpu_relax();
>> +				if (time_after(jiffies, timeout)) {
>> +					netdev_warn(ndev, "receiving pkt\n");
>> +					break;
>> +				}
>> +			}
>> +
> 
> You first wait for reception to finish, but then might block on this
> spin_lock. This introduces a software race condition.
> 
>>  			spin_lock_irqsave(&priv->mbx_lock, flags);
>>  			mbx_mask = hecc_read(priv, HECC_CANME);
>>  			mbx_mask |= HECC_RX_HIGH_MBOX_MASK;
> 
> We have to use the spin_lock because the CANME register is shared
> between the rx and tx path. I think it's better to make use of
> CANRPM+CANMIM, because I think the critical path is the modification of
> the CANRPM register, which is not shared, so it can be written without
> the need for the spin_lock.

I've implemented both solutions and both are working. I've instrumented
how long the CAN driver does stay in the while loop. I've loaded the CAN
bus with:

# 1 byte frame
cansequence -p

# 8 byte frame, all 1s
cansend can0 -i 0xffffffff 0xff 0xff 0xff 0xff 0xff 0xaff 0xff 0xff
--loop=-1 -e

Both solution show about the same number of loops, depending on the bit
rate of the bus:

500 kbit/s
hecc_set_bit_canme: looped 867 times

50 kbit/s
hecc_set_bit_canme: looped 4807 times

AnilKumar, can you get in contact with the hardware people to figure out
a better solution than to cycle 4k times in a loop?

Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |