Hello, On 10/08/2012 08:40 AM, AnilKumar Ch wrote: > Fix "received wrong sequence count. expected: x, got: y" errors > reported by cansequence in a stress test (--verbose). > > While processing the receive packets in mailboxes, upper mailboxes > need to enable while processing 12th (RX_BUFFER mailbox) mailbox. > This requires a status check to identify whether the receiving of > packets in progress or not. If the mailboxes are enabled before the > receive packet status is cleared then there is a possibility of > receiving out of order packets. > > With this patch mailboxes are enabled once the receive status is > cleared. > > Signed-off-by: AnilKumar Ch AnilKumar Ch, can you add some TI people with access to the vhdl CAN core design to Cc? I'm debugging this issue for a week, fixed several other problems with the driver and now and it seems there is a race condition in the hardware. The algorithm in the driver relies on the ability to re-enable a bunch of mailboxes with a single write command. In the ti_hecc hardware the driver uses the CANME (mailbox enable) register to do so. Under certain circumstances I've noticed that the CAN core receives the next message into an undefined mailbox. According to the data sheet CAN messages are always received into the highest free mailbox (with a matching filter). But after enabling the mailboxes the next CAN frame doesn't go into the highest mailbox. This is a log from my test application. The ti_hecc receives 1 byte long frames at 500 kbit/s. In the first data byte a number is increased with every CAN frame (the "d=" column). The enabled mailboxes are in the "me" column and the pending messages in "rmp". > [ 57.390747] ti_hecc ti_hecc: can0: i=39502 m=14 n=14 rmp=00004000 me=00007ff0 d= 78 > [ 57.390777] ti_hecc ti_hecc: can0: i=39503 m=13 n=13 rmp=00002000 me=00003ff0 d= 79 > [ 57.390777] ti_hecc ti_hecc: can0: i=39504 m=12 n=12 rmp=00001000 me=00001ff0 d= 80 > [ 57.390808] ti_hecc ti_hecc: can0: i=39505 m=11 n=11 rmp=00000800 me=00000ff0 d= 81 > [ 57.390838] ti_hecc ti_hecc: can0: i=39506 m=10 n=10 rmp=00000400 me=000007f0 d= 82 > [ 57.390838] ti_hecc ti_hecc: can0: i=39507 m= 9 n= 9 rmp=00000200 me=000003f0 d= 83 > [ 57.390869] ti_hecc ti_hecc: can0: i=39508 m= 8 n= 8 rmp=00000100 me=000001f0 d= 84 Here the driver enables the high priority mailboxes (0xfffff000) with a single write. Actually the driver writes 0xfffff0f0, as there a still four free low priority mailboxes. > [ 57.390899] ti_hecc ti_hecc: can0: i=39509 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 ^ The next CAN frame should be received, either into 0x00000800 (if the reception happens before the driver enables the high prio mailboxes) or into 0x80000000 (if the reception takes place after the driver has written the canme register). > [ 57.390930] ti_hecc ti_hecc: can0: i=39510 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.390930] ti_hecc ti_hecc: can0: i=39511 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.390960] ti_hecc ti_hecc: can0: i=39512 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.390991] ti_hecc ti_hecc: can0: i=39513 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.391021] ti_hecc ti_hecc: can0: i=39514 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.391021] ti_hecc ti_hecc: can0: i=39515 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.391052] ti_hecc ti_hecc: can0: i=39516 m= 0 n= 7 rmp=00010000 me=fffff0f0 d= 0 > [ 57.391082] ti_hecc ti_hecc: can0: i=39517 m=31 n=31 rmp=80010000 me=fffff0f0 d= 86 ^ We see the next frame goes into 0x80000000 as expected. (There are so many d=0 lines, because the driver is confused, as the next frame is not expected in the 0x00010000 mailbox.) There isn't any tx going on at the moment, the rx-path is the only modifier of the canme register. I thought maybe there is a problem in the hardware with the canme register, so I've reworked the algorithm not to change the canme register at all. canme stays 0xfffffff0 all the time (all rx mailboxes active), but received frames are not acknowledged (I don't write 1s into canrmp after message reception). The pending rx interrupts are masked with canmim. To "enable" the mailboxes again, the driver clears bits in the canrmp register with a single write and then unmasks the interrupt again. This however results in _the_same_ buggy behaviour. Then I had a closer look at AnilKumar Ch's patch, the canme register is not changed if a the CAN core currently receives a CAN frame. I added AnilKumar Ch's busy loop before modifying the canme register, and it works now. So I conclude that the "pick next free mailbox" algorithm in the CAN core is racy. You are not allowed to do something that changes a mailbox's status from disabled/full to enabled/free when a CAN frame is received. As you cannot control CAN frame reception, this is a nice hardware race condition. Please add this to the manual and errata sheets. I'll post a series which fixes the problems I've found soon. regards, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |